RubyGems - combine_pdf - Versions diffs - 1.0.6 → 1.0.22 - Mend

combine_pdf 1.0.6 → 1.0.22

Files changed (16)

checksums.yaml +5 -5
data/CHANGELOG.md +84 -0
data/README.md +40 -1
data/combine_pdf.gemspec +4 -2
data/lib/combine_pdf/api.rb +6 -6
data/lib/combine_pdf/fonts.rb +13 -4
data/lib/combine_pdf/page_methods.rb +9 -10
data/lib/combine_pdf/parser.rb +145 -60
data/lib/combine_pdf/pdf_protected.rb +44 -11
data/lib/combine_pdf/pdf_public.rb +20 -12
data/lib/combine_pdf/renderer.rb +22 -15
data/lib/combine_pdf/version.rb +1 -1
data/lib/combine_pdf.rb +1 -0
data/test/automated +2 -0
data/test/combine_pdf/renderer_test.rb +22 -0
metadata +32 -17

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
-SHA1:
-  metadata.gz: b455928d3e64892b983f743f94243bd3850f1324
-  data.tar.gz: '0925ba5c1a7754bd54311ba499329771b0bc36b4'
+SHA256:
+  metadata.gz: 96825d0aa74bd673883c4d7dbf3884459ff27c2a3d7bd0c60875e0499c7b9aeb
+  data.tar.gz: 985c39883f343bb5182344ccc31353103fbac89494000362973f08cdd379d2ac
 SHA512:
-  metadata.gz: 1a382e673cf8c042ed95638d44cb9cd804a52190bc06130ba3d9ae5a5aac564184c13c41b6a44ecbf265590a1ed3fdd97a011c252dc647502295255051fb473c
-  data.tar.gz: 0a77bd1af453712bcff6de7e3fbae6a745c0b3137351c8b147a614aac8d0062feee0044e66cba8e4924213ab7c9020ec5c8087fb9cc5d1258edb7303a23a919b
+  metadata.gz: 8575b612e1eb31775833faba8f310d84680d6ce27512d6a9c182e7598a743956da34e0321f0280d032e3e46b861dd2abdd88b297e65a652ec8e3e416ed9fb0a0
+  data.tar.gz: 2026d924120f1798681842fee7a2eb0de78be6ac493dcbd4ffbb934c1c0135161ccbf29283fb0eec42b4ebab66f84b7fa3ac354a970fad9bc0ad302f64da7c7a
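
The checksum file switched from SHA1 to SHA256 digests (the SHA512 digests are kept). Below is a minimal sketch of checking one of these entries with Ruby's standard `digest` library. It assumes you have unpacked the `.gem` archive (a plain tar file) and have `data.tar.gz` on disk. The helper name `sha256_matches?` is hypothetical.

```ruby
require 'digest'

# Hypothetical helper: compare a file's SHA-256 digest with a hex value
# copied from checksums.yaml (e.g. the data.tar.gz entry above).
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end
```

For example, `sha256_matches?('data.tar.gz', '985c39883f343bb5182344ccc31353103fbac89494000362973f08cdd379d2ac')` would check the 1.0.22 data archive against the value listed above.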

data/CHANGELOG.md CHANGED

@@ -2,6 +2,90 @@
 ***
+#### Change log v.1.0.22
+**Fix**: fix `fonts` dereferencing issue (#203), credit to @MarcWeber (Marc Weber) for identifying the issue.
+**Fix**: fix `metrix` dependency, credit to @casperisfine (Jean byroot Boussier) for PR #195.
+#### Change log v.1.0.21
+**Fix**: possible fix for issue #184, where nested PDF files within an object stream could break the parser. Credit to Greg Sparrow (@hazelsparrow) for exposng the issue.
+#### Change log v.1.0.20
+**Fix**: merges PR #180, `TypeError: can't dup NilClass`. Credit to Adam Trepanier (@adam-e-trepanier) for the merge.
+#### Change log v.1.0.19
+**Fix**: fixes font height and width detection issue. Issue #179. Credit to @5anchezzz for opening the issue.
+**Fix**: fixes an indentation warning. Issue #173. Credit to @rubyFeedback for exposing this issue.
+#### Change log v.1.0.18
+**Fix**: fixed issue with the 1.0.17 release where `ProcSet` PDF Arrays should have been expected but where ignored and a PDF Object was assumed instead (issue #171) - credit to @chuchiperriman (Jesús Barbero Rodríguez).
+#### Change log v.1.0.17
+NB: yanked from RubyGems.org.
+**Fix**: fixed issue where nested structure equality tests might provide false positives, resulting in lost data (issue #166) - credit to @cschilbe (Conrad Schilbe).
+#### Change log v.1.0.16
+**Fix**: some documentation typos were fixed (PR #147) - credit to @djhopper01 (Derek Hopper).
+#### Change log v.1.0.15
+**Fix**: An attempt to fix JRuby compatibility concerns (issue #127).
+#### Change log v.1.0.14
+**Fix**: Fixed an issue related to PDF XRef table data, where a malformed EOL marker would cause the parser to fail. Credit to @dangerous (David Rainsford) for exposing this issue in a comment to issue #140.
+#### Change log v.1.0.13
+**Fix**: Fixed an issue related to PDF object streams (version 1.6) where a numerical object at the beginning of the stream might be mis-parsed as an object reference number rather than an object. Credit to @Defoncesko for reporting issue #141.
+#### Change log v.1.0.12
+**Fix**: Fixed an issue introduced in version 1.0.11, where a fragmented XREF table might cause the CombinePDF::Parser to fail. Credit to @solasdev for reporting issue #140.
+#### Change log v.1.0.11
+**Fix**: Fixed an issue where small floating point numbers would produce invalid PDF rendering (where exponent notation was used instead of decimal notation). Credit to @avit (Andrew Vit) for PR #139.
+#### Change log v.1.0.10
+**Fix**: Fixed an issue related to issue #131 where parsing would fail if the `xref` section appears to be misplaced within the PDF. Credit to @bharat303 (Bharat Godhani) for exposing this issue.
+#### Change log v.1.0.9
+**Fix**: Fixed issue #136 where the `#fix_rotation` function would rotate the page to the wrong direction. Credit to @dmkash for exposing this issue.
+#### Change log v.1.0.8
+**Fix**: Fixed an issue with octal representation in escaped string data. The issue would (usually) go unnoticed (altering internal labels in a non-disruptive manner), however the issue did effect `ColorSpace` data in the rare use of `ICCBased` color maps, causing color distortion and transparency loss. Credit to @react-rails and @bedaronco for exposing the issue (issue #130).
+**Fix**: Fixed an issue with non English alphabet in PDF literal strings. This issue went undetected since PDF literal strings aren't used by CombinePDF except for the date stamping...
+**Fix**: Improbable, but possibly a fix for issue #127, where the JRuby interpreter would fail to pass the correct arguments to the Hash update Proc. Since I'm trying to author a workaround, I have my doubts... but an attempt is better than nothing.
+**Update**: Improved parsing error handling, courtesy of Evgeny Garlukovich (@evgenygarl).
+**Update**: Added reader methods for the `names` and `outlines` PDF objects in response to issue #133. Use with care.
+#### Change log v.1.0.7
+**Fix**: Fix an issue where page property inheritance might break PDF structure if there's a conflict between property types (inheritance using properties by reference vs. nested properties), fixing issue #124. Credit to @erikaxel for exposing the issue.
+#### Change log v.1.0.6
+**Fix**: Fix warnings, issue #120. Credit to @lloeki for exposing the issue.
+**Fix**: Fix / add adjustable nesting protection, fixing issue #117. Credit to @emmanuelmillionaer for exposing the issue.
 #### Change log v.1.0.5
 **Fix**: Fix issue #116 where some PDF objects (the page catalog and some root information data) were written twice to the saved PDF file (or String). Credit to @albertsaave  for exposing the issue using GhostScript.
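
The v1.0.11 entry concerns small floats. Ruby's `Float#to_s` prints `0.00001` as `1.0e-05`, but PDF number syntax has no exponent form, so that output produces invalid PDF content. Below is a sketch of decimal-only formatting. It assumes six fractional digits are enough. `pdf_real` is a hypothetical helper, not the code from PR #139.

```ruby
# Hypothetical helper: render a number using plain decimal notation only.
def pdf_real(num, precision = 6)
  return num.to_s if num.is_a?(Integer)
  str = format("%.#{precision}f", num)   # never uses exponent notation
  str = str.sub(/0+\z/, '').sub(/\.\z/, '') # trim trailing zeros and a bare dot
  str == '-0' ? '0' : str                 # avoid a negative zero
end
```

So `pdf_real(0.00001)` yields `"0.00001"` and `pdf_real(2.0)` yields `"2"`. Values smaller than the chosen precision collapse to `"0"`.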

data/README.md CHANGED

@@ -1,8 +1,10 @@
 # CombinePDF - the ruby way for merging PDF files
 [![Gem Version](https://badge.fury.io/rb/combine_pdf.svg)](http://badge.fury.io/rb/combine_pdf)
 [![GitHub](https://img.shields.io/badge/GitHub-Open%20Source-blue.svg)](https://github.com/boazsegev/combine_pdf)
+[![Documentation](http://inch-ci.org/github/boazsegev/combine_pdf.svg?branch=master)](https://www.rubydoc.info/github/boazsegev/combine_pdf)
 [![Maintainers Wanted](https://img.shields.io/badge/maintainers-wanted-red.svg)](https://github.com/pickhardt/maintainers-wanted)
 CombinePDF is a nifty model, written in pure Ruby, to parse PDF files and combine (merge) them with other PDF files, watermark them or stamp them (all using the PDF file format and pure Ruby code).
 ## Install
@@ -41,6 +43,8 @@ Quick rundown:
 * Sometimes the CombinePDF will raise an exception even if the PDF could be parsed (i.e., when PDF optional content exists)... I find it better to err on the side of caution, although for optional content PDFs an exception is avoidable using `CombinePDF.load(pdf_file, allow_optional_content: true)`.
+* The CombinePDF gem runs recursive code to both parse and format the PDF files. Hence, PDF files that have heavily nested objects, as well as those that where combined in a way that results in cyclic nesting, might explode the stack - resulting in an exception or program failure.
 CombinePDF is written natively in Ruby and should (presumably) work on all Ruby platforms that follow Ruby 2.0 compatibility.
 However, PDF files are quite complex creatures and no guaranty is provided.
@@ -112,7 +116,42 @@ pdf.number_pages
 pdf.save "file_with_numbering.pdf"
 ```
-Numbering can be done with many different options, with different formating, with or without a box object, and even with opacity values - see documentation.
+Numbering can be done with many different options, with different formating, with or without a box object, and even with opacity values - [see documentation](https://www.rubydoc.info/github/boazsegev/combine_pdf/CombinePDF/PDF#number_pages-instance_method).
+For example, should you prefer to place the page number on the bottom right side of all PDF pages, do:
+```ruby
+pdf.number_pages(location: [:bottom_right])
+```
+As another example, the dashes around the number are removed and a box is placed around it. The numbering is semi-transparent and the first 3 pages are numbered using letters (a,b,c) rather than numbers:
+```ruby
+# number first 3 pages as "a", "b", "c"
+pdf.number_pages(number_format: " %s ",
+                 location: [:top, :bottom, :top_left, :top_right, :bottom_left, :bottom_right],
+                 start_at: "a",
+                 page_range: (0..2),
+                 box_color: [0.8,0.8,0.8],
+                 border_color: [0.4, 0.4, 0.4],
+                 border_width: 1,
+                 box_radius: 6,
+                 opacity: 0.75)
+# number the rest of the pages as 4, 5, ... etc'
+pdf.number_pages(number_format: " %s ",
+                 location: [:top, :bottom, :top_left, :top_right, :bottom_left, :bottom_right],
+                 start_at: 4,
+                 page_range: (3..-1),
+                 box_color: [0.8,0.8,0.8],
+                 border_color: [0.4, 0.4, 0.4],
+                 border_width: 1,
+                 box_radius: 6,
+                 opacity: 0.75)
+```
+    pdf.number_pages(number_format: " %s ", location: :bottom_right, font_size: 44)
 ## Loading and Parsing PDF data
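
The new README caveat notes that parsing and formatting are recursive, so heavily nested or cyclic object graphs can exhaust the stack. For code that walks such structures itself, one common alternative is to use an explicit stack together with an identity-based "visited" set. The sketch below applies this to plain Ruby Hashes and Arrays. `count_containers` is hypothetical and is not part of CombinePDF.

```ruby
# Hypothetical helper: count the distinct Hash/Array containers reachable
# from root, iteratively (no recursion) and safely for cyclic nesting.
def count_containers(root)
  seen = {}.compare_by_identity # identity lookup: never hashes cyclic contents
  stack = [root]
  count = 0
  until stack.empty?
    node = stack.pop
    next unless node.is_a?(Hash) || node.is_a?(Array)
    next if seen[node] # already visited: this is what breaks cycles
    seen[node] = true
    count += 1
    stack.concat(node.is_a?(Hash) ? node.values : node)
  end
  count
end
```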

data/combine_pdf.gemspec CHANGED

@@ -19,7 +19,9 @@ Gem::Specification.new do |spec|
   spec.require_paths = ["lib"]
   spec.add_runtime_dependency 'ruby-rc4', '>= 0.1.5'
+  spec.add_runtime_dependency 'matrix'
-  spec.add_development_dependency "bundler", "~> 1.7"
-  spec.add_development_dependency "rake", "~> 10.0"
+  # spec.add_development_dependency "bundler", ">= 1.7"
+  spec.add_development_dependency "rake", ">= 12.3.3"
+  spec.add_development_dependency "minitest"
 end
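
Starting with Ruby 3.1, `matrix` is a bundled gem rather than a default gem. As a result, code that calls `require 'matrix'` must declare it as a dependency to load under Bundler, which is presumably what the v1.0.22 "`metrix` dependency" entry refers to. The diff does not show where CombinePDF uses the library. As a generic illustration, page transformations such as rotation can be expressed with `Matrix`, using PDF's row-vector convention `[x y 1] × M`. Both helper names are hypothetical.

```ruby
require 'matrix'

# Build the 3x3 matrix for a PDF transformation [a b c d e f].
def pdf_matrix(a, b, c, d, e, f)
  Matrix[[a, b, 0], [c, d, 0], [e, f, 1]]
end

# Apply a transformation to a point (row vector times matrix).
def transform_point(m, x, y)
  v = Matrix.row_vector([x, y, 1]) * m
  [v[0, 0], v[0, 1]]
end
```

Composition is plain matrix multiplication: `transform_point(rotation * translation, x, y)` rotates first, then translates.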

data/lib/combine_pdf/api.rb CHANGED

@@ -24,11 +24,11 @@ module CombinePDF
     raise TypeError, "couldn't create PDF object, expecting type String" unless string.is_a?(String) || string.is_a?(Pathname)
     begin
       (begin
-     File.file? string
-   rescue
-     false
-   end) ? load(string) : parse(string)
-   rescue => _e
+        File.file? string
+      rescue
+        false
+      end) ? load(string) : parse(string)
+    rescue => _e
       raise 'General PDF error - Use CombinePDF.load or CombinePDF.parse for a non-general error message (the requested file was not found OR the string received is not a valid PDF stream OR the file was found but not valid).'
     end
   end
@@ -140,7 +140,7 @@ module CombinePDF
   # this function enables plug-ins to expend the font functionality of CombinePDF.
   #
   # font_name:: a Symbol with the name of the font. if the fonts exists in the library, it will be overwritten!
-  # font_metrics:: a Hash of font metrics, of the format char => {wx: char_width, boundingbox: [left_x, buttom_y, right_x, top_y]} where char == character itself (i.e. " " for space). The Hash should contain a special value :missing for the metrics of missing characters. an optional :wy might be supported in the future, for up to down fonts.
+  # font_metrics:: a Hash of font metrics, of the format char => {wx: char_width, boundingbox: [left_x, bottom_y, right_x, top_y]} where char == character itself (i.e. " " for space). The Hash should contain a special value :missing for the metrics of missing characters. an optional :wy might be supported in the future, for up to down fonts.
   # font_pdf_object:: a Hash in the internal format recognized by CombinePDF, that represents the font object.
   # font_cmap:: a CMap dictionary Hash) which maps unicode characters to the hex CID for the font (i.e. {"a" => "61", "z" => "7a" }).
   def register_font(font_name, font_metrics, font_pdf_object, font_cmap = nil)
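
According to the comment above, `font_metrics` maps each character to its advance width (`wx`) and bounding box `[left_x, bottom_y, right_x, top_y]`, plus a `:missing` entry. The example below shows a metrics Hash in that shape, with a lookup that falls back to `:missing`. The values are invented for illustration and use the 1/1000-of-font-size units that `fonts.rb` divides by. `EXAMPLE_METRICS` and `glyph_metrics` are hypothetical.

```ruby
# Invented values: char => { wx: advance width,
# boundingbox: [left_x, bottom_y, right_x, top_y] }, plus :missing.
EXAMPLE_METRICS = {
  'a' => { wx: 500, boundingbox: [30, -10, 470, 460] },
  ' ' => { wx: 250, boundingbox: [0, 0, 0, 0] },
  missing: { wx: 600, boundingbox: [0, 0, 600, 700] }
}.freeze

# Hypothetical lookup: the character's own entry, else the :missing entry.
def glyph_metrics(metrics, char)
  metrics.fetch(char) { metrics[:missing] }
end
```

Such a Hash would be passed as the second argument to the `register_font` method shown above, together with a font object in CombinePDF's internal format (not shown in this diff).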

data/lib/combine_pdf/fonts.rb CHANGED

@@ -100,7 +100,7 @@ module CombinePDF
     # adds a correctly formatted font object to the font library.
     # font_name:: a Symbol with the name of the font. if the fonts name exists, the font will be overwritten!
-    # font_metrics:: a Hash of ont metrics, of the format char => {wx: char_width, boundingbox: [left_x, buttom_y, right_x, top_y]} where i == character code (i.e. 32 for space). The Hash should contain a special value :missing for the metrics of missing characters. an optional :wy will be supported in the future, for up to down fonts.
+    # font_metrics:: a Hash of ont metrics, of the format char => {wx: char_width, boundingbox: [left_x, bottom_y, right_x, top_y]} where i == character code (i.e. 32 for space). The Hash should contain a special value :missing for the metrics of missing characters. an optional :wy will be supported in the future, for up to down fonts.
     # font_pdf_object:: a Hash in the internal format recognized by CombinePDF, that represents the font object.
     # font_cmap:: a CMap dictionary Hash) which maps unicode characters to the hex CID for the font (i.e. {"a" => "61", "z" => "7a" }).
     def register_font(font_name, font_metrics, font_pdf_object, font_cmap = nil)
@@ -138,12 +138,21 @@ module CombinePDF
       text.each_char do |c|
         metrics_array << (merged_metrics[c] || { wx: 0, boundingbox: [0, 0, 0, 0] })
       end
-      height = metrics_array.map { |m| m ? m[:boundingbox][3] : 0 } .max
-      height -= (metrics_array.map { |m| m ? m[:boundingbox][1] : 0 }).min
+      metrics_array_mapped_top = [].dup
+      metrics_array_mapped_bottom = [].dup
       width = 0.0
       metrics_array.each do |m|
-        width += (m[:wx] || m[:wy])
+        if (m && m[:boundingbox])
+          metrics_array_mapped_top << m[:boundingbox][3]
+          metrics_array_mapped_bottom << m[:boundingbox][1]
+        else
+          metrics_array_mapped_top << 0
+          metrics_array_mapped_bottom << 0
+        end
+        width += (m[:wx] || m[:wy] || 0) if m
       end
+      height = metrics_array_mapped_top.max
+      height -=metrics_array_mapped_bottom.min
       return [height.to_f / 1000 * size, width.to_f / 1000 * size] if metrics_array[0][:wy]
       [width.to_f / 1000 * size, height.to_f / 1000 * size]
     end
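
The reworked measurement (v1.0.19, issue #179) makes the measurement nil-safe. Width is the sum of advance widths, and height spans from the lowest `bottom_y` to the highest `top_y`. A character without a bounding box contributes 0 to both instead of raising. Below is a simplified standalone re-implementation, not the gem's code. It omits the `wy` (vertical font) swap, and `text_dimensions` is a hypothetical name.

```ruby
# Returns [width, height] in the same units as `size`, assuming metrics are
# given in 1/1000 of the font size (as in fonts.rb).
def text_dimensions(text, metrics, size)
  tops = []
  bottoms = []
  width = 0.0
  text.each_char do |c|
    m = metrics[c]
    box = m && m[:boundingbox]
    tops << (box ? box[3] : 0)     # top_y, or 0 when unknown
    bottoms << (box ? box[1] : 0)  # bottom_y, or 0 when unknown
    width += (m[:wx] || m[:wy] || 0) if m
  end
  height = (tops.max || 0) - (bottoms.min || 0)
  [width / 1000 * size, height.to_f / 1000 * size]
end
```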

data/lib/combine_pdf/page_methods.rb CHANGED

@@ -94,7 +94,7 @@ module CombinePDF
       # end
       # set ProcSet to recommended value
-      resources[:ProcSet] = [:PDF, :Text, :ImageB, :ImageC, :ImageI] # this was recommended by the ISO. 32000-1:2008
+      resources[:ProcSet] ||= [:PDF, :Text, :ImageB, :ImageC, :ImageI] # this was recommended by the ISO. 32000-1:2008
       if top # if this is a stamp (overlay)
         insert_content CONTENT_CONTAINER_START, 0
@@ -147,15 +147,15 @@ module CombinePDF
     # This method adds a simple text box to the Page represented by the PDFWriter class.
     # This function takes two values:
-    # text:: the text to potin the box.
+    # text:: the text to write in the box.
     # properties:: a Hash of box properties.
     # the symbols and values in the properties Hash could be any or all of the following:
     # x:: the left position of the box.
-    # y:: the BUTTOM position of the box.
+    # y:: the BOTTOM position of the box.
     # width:: the width/length of the box. negative values will be computed from edge of page. defaults to 0 (end of page).
     # height:: the height of the box. negative values will be computed from edge of page. defaults to 0 (end of page).
     # text_align:: symbol for horizontal text alignment, can be ":center" (default), ":right", ":left"
-    # text_valign:: symbol for vertical text alignment, can be ":center" (default), ":top", ":buttom"
+    # text_valign:: symbol for vertical text alignment, can be ":center" (default), ":top", ":bottom"
     # text_padding:: a Float between 0 and 1, setting the padding for the text. defaults to 0.05 (5%).
     # font:: a registered font name or an Array of names. defaults to ":Helvetica". The 14 standard fonts names are:
     # - :"Times-Roman"
@@ -244,8 +244,8 @@ module CombinePDF
         half_radius = (radius.to_f / 2).round 4
         ## set starting point
         box_stream << "#{options[:x] + radius} #{options[:y]} m\n"
-        ## buttom and right corner - first line and first corner
-        box_stream << "#{options[:x] + options[:width] - radius} #{options[:y]} l\n" # buttom
+        ## bottom and right corner - first line and first corner
+        box_stream << "#{options[:x] + options[:width] - radius} #{options[:y]} l\n" # bottom
         if options[:box_radius] != 0 # make first corner, if not straight.
           box_stream << "#{options[:x] + options[:width] - half_radius} #{options[:y]} "
           box_stream << "#{options[:x] + options[:width]} #{options[:y] + half_radius} "
@@ -265,7 +265,7 @@ module CombinePDF
           box_stream << "#{options[:x]} #{options[:y] + options[:height] - half_radius} "
           box_stream << "#{options[:x]} #{options[:y] + options[:height] - radius} c\n"
         end
-        ## left and buttom-left corner
+        ## left and bottom-left corner
         box_stream << "#{options[:x]} #{options[:y] + radius} l\n"
         if options[:box_radius] != 0
           box_stream << "#{options[:x]} #{options[:y] + half_radius} "
@@ -287,7 +287,7 @@ module CombinePDF
       end
       contents << box_stream
-      # reset x,y by text alignment - x,y are calculated from the buttom left
+      # reset x,y by text alignment - x,y are calculated from the bottom left
       # each unit (1) is 1/72 Inch
       # create text stream
       text_stream = ''
@@ -403,7 +403,7 @@ module CombinePDF
     def fix_rotation
       return self if self[:Rotate].to_f == 0.0 || mediabox.nil?
       # calculate the rotation
-      r = self[:Rotate].to_f * Math::PI / 180
+      r = (360.0 - self[:Rotate].to_f) * Math::PI / 180
       s = Math.sin(r).round 6
       c = Math.cos(r).round 6
       ctm = [c, s, -s, c]
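
A page's `/Rotate` value is a clockwise angle, whereas the matrix `[cos sin -sin cos]` rotates counter-clockwise in PDF space. This mismatch is presumably why v1.0.9 (issue #136) switched to `360 - Rotate`. Below is a standalone sketch of the corrected angle math. `rotation_ctm` is a hypothetical name.

```ruby
# Compute the [a b c d] part of the CTM used to undo a clockwise /Rotate.
def rotation_ctm(rotate_degrees)
  r = (360.0 - rotate_degrees.to_f) * Math::PI / 180 # clockwise -> counter-clockwise
  s = Math.sin(r).round(6) # rounding removes tiny floating-point residue
  c = Math.cos(r).round(6)
  [c, s, -s, c]
end
```

For `/Rotate 90`, this gives `[0.0, -1.0, 1.0, 0.0]`, a quarter turn clockwise. The old formula produced the counter-clockwise turn.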
@@ -649,7 +649,6 @@ module CombinePDF
         page_res.each do |k, v|
           v = page_res[k] = v.dup if v.is_a?(Array) || v.is_a?(Hash)
           v = v[:referenced_object] = v[:referenced_object].dup if v.is_a?(Hash) && v[:referenced_object]
-          v = v[:referenced_object] = v[:referenced_object].dup if v.is_a?(Hash) && v[:referenced_object]
         end
       end
       page_copy.instance_exec(secure || @secure_injection) { |s| secure_for_copy if s; init_contents; self }

data/lib/combine_pdf/parser.rb CHANGED

@@ -6,6 +6,8 @@
 ########################################################
 module CombinePDF
+  ParsingError = Class.new(StandardError)
   # @!visibility private
   # @private
   #:nodoc: all
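
The new `ParsingError = Class.new(StandardError)` lets callers rescue parser failures specifically, rather than every `RuntimeError`. The generic entry point in the `api.rb` hunk still rescues all errors and re-raises a general message, so the specific class is presumably only visible through `CombinePDF.load` or `CombinePDF.parse`. Below is a self-contained stand-in for the pattern. `PdfSketch` and `safe_parse` are hypothetical.

```ruby
# Hypothetical stand-in module mirroring the error-class pattern above.
module PdfSketch
  ParsingError = Class.new(StandardError)

  def self.parse(data)
    raise ParsingError, 'root is unknown' unless data.start_with?('%PDF-')
    :parsed
  end
end

# Rescue only parser failures; other bugs still propagate.
def safe_parse(data)
  PdfSketch.parse(data)
rescue PdfSketch::ParsingError => e
  warn "could not parse PDF: #{e.message}"
  nil
end
```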
@@ -77,7 +79,10 @@ module CombinePDF
       @parsed = _parse_
       # puts @parsed
-      raise 'Unknown PDF parsing error - malformed PDF file?' unless (@parsed.select { |i| !i.is_a?(Hash) }).empty?
+      unless (@parsed.select { |i| !i.is_a?(Hash) }).empty?
+        # p @parsed.select
+        raise ParsingError, 'Unknown PDF parsing error - malformed PDF file?'
+      end
       if @root_object == {}.freeze
         xref_streams = @parsed.select { |obj| obj.is_a?(Hash) && obj[:Type] == :XRef }
@@ -86,7 +91,9 @@ module CombinePDF
         end
       end
-      raise 'root is unknown - cannot determine if file is Encrypted' if @root_object == {}.freeze
+      if @root_object == {}.freeze
+        raise ParsingError, 'root is unknown - cannot determine if file is Encrypted'
+      end
       if @root_object[:Encrypt]
         # change_references_to_actual_values @root_object
@@ -105,12 +112,13 @@ module CombinePDF
           next unless o.is_a?(Hash) && o[:Type] == :ObjStm
           ## un-encode (using the correct filter) the object streams
           PDFFilter.inflate_object o
+          # puts "Object Stream Found:", o[:raw_stream_content]
           ## extract objects from stream
           @scanner = StringScanner.new o[:raw_stream_content]
           stream_data = _parse_
           id_array = []
           collection = [nil]
-          while stream_data[0].is_a? (Numeric)
+          while (stream_data[0].is_a?(Numeric) && stream_data[1].is_a?(Numeric))
             id_array << stream_data.shift
             stream_data.shift
           end
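
An object stream (`/Type /ObjStm`) begins with pairs of integers, each giving an object number followed by a byte offset. The old loop kept consuming tokens while the first one was numeric. As a result, if the stream's first object was itself a number, that object could be swallowed into the ID list (issue #141, v1.0.13). Below is a simplified sketch of the pair-wise check over already-tokenized values. `split_object_stream` is hypothetical.

```ruby
# Returns [object_ids, remaining_tokens]; offsets are discarded.
def split_object_stream(tokens)
  tokens = tokens.dup
  ids = []
  # Only consume a header pair when BOTH tokens are numbers.
  while tokens[0].is_a?(Numeric) && tokens[1].is_a?(Numeric)
    ids << tokens.shift
    tokens.shift # drop the byte offset
  end
  [ids, tokens]
end
```

This remains a heuristic: two consecutive numeric objects right after the header would still look like a pair. Reading exactly the `/N` pairs declared in the stream dictionary avoids that ambiguity.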
@@ -225,16 +233,18 @@ module CombinePDF
         # all characters that aren't white space or special: /[^\x00\x09\x0a\x0c\x0d\x20\x28\x29\x3c\x3e\x5b\x5d\x7b\x7d\x2f\x25]+
         elsif str = @scanner.scan(/\/[^\x00\x09\x0a\x0c\x0d\x20\x28\x29\x3c\x3e\x5b\x5d\x7b\x7d\x2f\x25]*/)
           out << (str[1..-1].gsub(/\#[0-9a-fA-F]{2}/) { |a| a[1..2].hex.chr }).to_sym
+          # warn "CombinePDF detected name: #{out.last.to_s}"
         ##########################################
         ## Parse a Number
         ##########################################
         elsif str = @scanner.scan(/[\+\-\.\d]+/)
           str =~ /\./ ? (out << str.to_f) : (out << str.to_i)
+          # warn "CombinePDF detected number: #{out.last.to_s}"
         ##########################################
         ## parse a Hex String
         ##########################################
         elsif str = @scanner.scan(/\<[0-9a-fA-F]*\>/)
-          # warn "Found a hex string"
+          # warn "Found a hex string #{str}"
           str = str.slice(1..-2).force_encoding(Encoding::ASCII_8BIT)
           # str = "0#{str}" if str.length.odd?
           out << unify_string([str].pack('H*').force_encoding(Encoding::ASCII_8BIT))
@@ -310,10 +320,10 @@ module CombinePDF
               when 102 # f, form-feed
                 str << 12
               when 48..57 # octal notation for byte?
-                rep = rep.chr
-                rep += str_bytes.shift.chr if str_bytes[0].between?(48, 57)
-                rep += str_bytes.shift.chr if str_bytes[0].between?(48, 57) && ((rep + str_bytes[0].chr).to_i <= 255)
-                str << rep.to_i
+                rep -= 48
+                rep = (rep << 3) + (str_bytes.shift-48) if str_bytes[0].between?(48, 57)
+                rep = (rep << 3) + (str_bytes.shift-48) if str_bytes[0].between?(48, 57) && (((rep << 3) + (str_bytes[0] - 48)) <= 255)
+                str << rep
               when 10 # new line, ignore
                 str_bytes.shift if str_bytes[0] == 13
                 true
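
This hunk contains the v1.0.8 octal fix. The old branch concatenated digit characters and called `to_i`, which reads them as decimal, so `\101` became byte 101 (`e`) instead of 65 (`A`). The new code accumulates the digits in base 8 and takes a third digit only if the value still fits in a byte. Below is a standalone sketch that takes the bytes following the backslash. `decode_octal_escape` is hypothetical. Unlike the gem's `between?(48, 57)` checks, it accepts only the digits 0-7 that the PDF specification defines for `\ddd` escapes.

```ruby
# Consumes one to three octal digits from `bytes` (an Array of byte values,
# starting at the first digit) and returns the decoded byte value.
def decode_octal_escape(bytes)
  value = bytes.shift - 48
  value = (value << 3) + (bytes.shift - 48) if bytes[0]&.between?(48, 55)
  # A third digit is used only if the result still fits in one byte.
  if bytes[0]&.between?(48, 55) && ((value << 3) + (bytes[0] - 48)) <= 255
    value = (value << 3) + (bytes.shift - 48)
  end
  value
end
```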
@@ -328,6 +338,7 @@ module CombinePDF
             end
           end
           out << unify_string(str.pack('C*').force_encoding(Encoding::ASCII_8BIT))
+          # warn "Found Literal String: #{out.last}"
         ##########################################
         ## parse a Dictionary
         ##########################################
@@ -340,25 +351,42 @@ module CombinePDF
         ## return content of array or dictionary
         ##########################################
         elsif @scanner.scan(/\]/) || @scanner.scan(/>>/)
+          # warn "Dictionary / Array ended with #{@scanner.peek(5)}"
           return out
         ##########################################
         ## parse a Stream
         ##########################################
         elsif @scanner.scan(/stream[ \t]*[\r\n]/)
           @scanner.pos += 1 if @scanner.peek(1) == "\n".freeze && @scanner.matched[-1] != "\n".freeze
+          # advance by the publshed stream length (if any)
+          old_pos = @scanner.pos
+          if(out.last.is_a?(Hash) && out.last[:Length].is_a?(Integer) && out.last[:Length] > 2)
+            @scanner.pos += out.last[:Length] - 2
+          end
           # the following was dicarded because some PDF files didn't have an EOL marker as required
           # str = @scanner.scan_until(/(\r\n|\r|\n)endstream/)
           # instead, a non-strict RegExp is used:
-          str = @scanner.scan_until(/endstream/)
           # raise error if the stream doesn't end.
-          raise "Parsing Error: PDF file error - a stream object wasn't properly closed using 'endstream'!" unless str
+          unless @scanner.skip_until(/endstream/)
+            raise ParsingError, "Parsing Error: PDF file error - a stream object wasn't properly closed using 'endstream'!"
+          end
+          length = @scanner.pos - (old_pos + 9)
+          length = 0 if(length < 0)
+          length -= 1 if(@scanner.string[old_pos + length - 1] == "\n")
+          length -= 1 if(@scanner.string[old_pos + length - 1] == "\r")
+          str = (length > 0) ? @scanner.string.slice(old_pos, length) : ''
+          # warn "CombinePDF parser: detected Stream #{str.length} bytes long #{str[0..3]}...#{str[-4..-1]}"
           # need to remove end of stream
           if out.last.is_a? Hash
-            # out.last[:raw_stream_content] = str[0...-10] #cuts only one EON char (\n or \r)
-            out.last[:raw_stream_content] = unify_string str.sub(/(\r\n|\n|\r)?endstream\z/, '').force_encoding(Encoding::ASCII_8BIT)
+            out.last[:raw_stream_content] = unify_string str.force_encoding(Encoding::ASCII_8BIT)
           else
             warn 'Stream not attached to dictionary!'
-            out << str.sub(/(\r\n|\n|\r)?endstream\z/, '').force_encoding(Encoding::ASCII_8BIT)
+            out << str.force_encoding(Encoding::ASCII_8BIT)
           end
         ##########################################
         ## parse an Object after finished
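
The new stream branch first jumps ahead by the declared `/Length` minus a 2-byte margin. It then searches for `endstream` and trims one EOL marker. The jump matters because binary stream data can itself contain the bytes `endstream`, and a search starting from the beginning of the data would stop at that false match. Below is a simplified sketch using Ruby's `StringScanner`. It assumes the scanner sits at the first byte of stream data. `read_stream` is hypothetical.

```ruby
require 'strscan'

# Returns the raw stream bytes and leaves the scanner after "endstream".
def read_stream(scanner, declared_length = nil)
  start = scanner.pos
  data = scanner.string
  # Use /Length as a hint, keeping a small margin in case it is slightly off.
  if declared_length.is_a?(Integer) && declared_length > 2
    scanner.pos = [start + declared_length - 2, data.bytesize].min
  end
  raise ArgumentError, "stream not closed with 'endstream'" unless scanner.skip_until(/endstream/)
  length = scanner.pos - start - 'endstream'.bytesize
  # Drop a single EOL marker (\n, \r or \r\n) preceding the keyword.
  length -= 1 if length > 0 && data.getbyte(start + length - 1) == 10
  length -= 1 if length > 0 && data.getbyte(start + length - 1) == 13
  length > 0 ? data.byteslice(start, length) : ''
end
```

In this sketch, a `/Length` that overshoots the real end makes the search fail, since the margin only tolerates small errors.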
@@ -375,17 +403,6 @@ module CombinePDF
           out.last[:Dest] = unify_string(out.last[:Dest].to_s) if out.last[:Dest] && out.last[:Dest].is_a?(Symbol)
         # puts "!!!!!!!!! Error with :indirect_reference_id\n\nObject #{out.last}  :indirect_reference_id = #{out.last[:indirect_reference_id]}" unless out.last[:indirect_reference_id].is_a?(Numeric)
         ##########################################
-        ## Parse a comment
-        ##########################################
-        elsif str = @scanner.scan(/\%/)
-          # is a comment, skip until new line
-          loop do
-            # break unless @scanner.scan(/[^\d\r\n]+/)
-            break if @scanner.check(/([\d]+[\s]+[\d]+[\s]+obj[\s]+\<\<)|([\n\r]+)/) || @scanner.eos? # || @scanner.scan(/[^\d]+[\r\n]+/) ||
-            @scanner.scan(/[^\d\r\n]+/) || @scanner.pos += 1
-          end
-        # puts "AFTER COMMENT: #{@scanner.peek 8}"
-        ##########################################
         ## Parse an Object Reference
         ##########################################
         elsif @scanner.scan(/R/)
@@ -404,32 +421,55 @@ module CombinePDF
         elsif @scanner.scan(/null/)
           out << nil
         ##########################################
+        ## Parse file trailer
+        ##########################################
+        elsif @scanner.scan(/trailer/)
+          if @scanner.skip_until(/<</)
+            data = _parse_
+            (@root_object ||= {}).clear
+            @root_object[data.shift] = data.shift while data[0]
+          end
+        ##########################################
         ## XREF - check for encryption... anything else?
         ##########################################
-        elsif @scanner.scan(/(startxref)|(xref)/)
-          ##########
-          ## get root object to check for encryption
-          @scanner.scan_until(/(trailer)|(\%EOF)/)
-          fresh = true
-          if @scanner.matched[-1] == 'r'
-            if @scanner.skip_until(/<</)
-              data = _parse_
-              (@root_object ||= {}).clear
-              @root_object[data.shift] = data.shift while data[0]
-            end
-            ##########
-            ## skip untill end of segment, maked by %%EOF
-            @scanner.skip_until(/\%\%EOF/)
-            ##########
-            ## If this was the last valid segment, ignore any trailing garbage
-            ## (issue #49 resolution)
-            break unless @scanner.exist?(/\%\%EOF/)
+        elsif @scanner.scan(/xref/)
+          # skip list indetifier lines or list lines ([\d] [\d][\r\n]) ot ([\d] [\d] [nf][\r\n])
+          while @scanner.scan(/[\s]*[\d]+[ \t]+[\d]+[ \t]*[\n\r]+/) || @scanner.scan(/[ \t]*[\d]+[ \t]+[\d]+[ \t]+[nf][\s]*/)
+            nil
           end
+        ##########################################
+        ## XREF location can be ignored
+        ##########################################
+        elsif @scanner.scan(/startxref/)
+          @scanner.scan(/[\s]+[\d]+[\s]+/)
+        ##########################################
+        ## Skip Whitespace
+        ##########################################
         elsif @scanner.scan(/[\s]+/)
           # Generally, do nothing
           nil
+        ##########################################
+        ## EOF?
+        ##########################################
+        elsif @scanner.scan(/\%\%EOF/)
+          ##########
+          ## If this was the last valid segment, ignore any trailing garbage
+          ## (issue #49 resolution)
+          break unless @scanner.exist?(/\%\%EOF/)
+        ##########################################
+        ## Parse a comment
+        ##########################################
+        elsif str = @scanner.scan(/\%/)
+          # is a comment, skip until new line
+          loop do
+            # break unless @scanner.scan(/[^\d\r\n]+/)
+            break if @scanner.check(/([\d]+[\s]+[\d]+[\s]+obj[\s]+\<\<)|([\n\r]+)/) || @scanner.eos? # || @scanner.scan(/[^\d]+[\r\n]+/) ||
+            @scanner.scan(/[^\d\r\n]+/) || @scanner.pos += 1
+          end
+        # puts "AFTER COMMENT: #{@scanner.peek 8}"
+        ##########################################
+        ## Fix wkhtmltopdf - missing 'endobj' keywords
+        ##########################################
         elsif @scanner.scan(/obj[\s]*/)
           # Fix wkhtmltopdf PDF authoring issue - missing 'endobj' keywords
           unless fresh || (out[-4].nil? || out[-4].is_a?(Hash))
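
In the reworked branches, `trailer`, `xref`, `startxref` and `%%EOF` are each handled as separate tokens. After the `xref` keyword, the parser simply skips subsection headers (`0 6`) and entry lines (`0000000000 65535 f`), which is presumably more tolerant of the malformed EOL markers mentioned for v1.0.14. Below is a standalone sketch reusing the two patterns from the diff. `skip_xref_table` and the constant names are hypothetical.

```ruby
require 'strscan'

XREF_SUBSECTION = /[\s]*[\d]+[ \t]+[\d]+[ \t]*[\n\r]+/ # e.g. "0 6\n"
XREF_ENTRY = /[ \t]*[\d]+[ \t]+[\d]+[ \t]+[nf][\s]*/    # e.g. "0000000017 00000 n \n"

# Skips an xref table so the scanner resumes at whatever follows (usually "trailer").
def skip_xref_table(scanner)
  return false unless scanner.scan(/xref/)
  while scanner.scan(XREF_SUBSECTION) || scanner.scan(XREF_ENTRY)
    # each iteration consumes one header or entry line
  end
  true
end
```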
@@ -450,6 +490,9 @@ module CombinePDF
             out << keep.pop
           end
           fresh = false
+        ##########################################
+        ## Unknown, warn and advance
+        ##########################################
         else
           # always advance
           # warn "Advancing for unknown reason... #{@scanner.string[@scanner.pos - 4, 8]} ... #{@scanner.peek(4)}" unless @scanner.peek(1) =~ /[\s\n]/
@@ -475,7 +518,9 @@ module CombinePDF
         @parsed.delete_if { |obj| obj.nil? || obj[:Type] == :Catalog }
         @parsed << catalogs
-        raise "Unknown error - parsed data doesn't contain a cataloged object!" unless catalogs
+        unless catalogs
+          raise ParsingError, "Unknown error - parsed data doesn't contain a cataloged object!"
+        end
       end
       if catalogs.is_a?(Array)
         catalogs.each { |c| catalog_pages(c, inheritance_hash) unless c.nil? }
@@ -488,20 +533,31 @@ module CombinePDF
           end
         else
           unless catalogs[:Type] == :Page
-            raise "Optional Content PDF files aren't supported and their pages cannot be safely extracted." if (catalogs[:AS] || catalogs[:OCProperties]) && !@allow_optional_content
+            if (catalogs[:AS] || catalogs[:OCProperties]) && !@allow_optional_content
+              raise ParsingError, "Optional Content PDF files aren't supported and their pages cannot be safely extracted."
+            end
             inheritance_hash[:MediaBox] = catalogs[:MediaBox] if catalogs[:MediaBox]
             inheritance_hash[:CropBox] = catalogs[:CropBox] if catalogs[:CropBox]
             inheritance_hash[:Rotate] = catalogs[:Rotate] if catalogs[:Rotate]
             if catalogs[:Resources]
               inheritance_hash[:Resources] ||= { referenced_object: {}, is_reference_only: true }.dup
-              (inheritance_hash[:Resources][:referenced_object] || inheritance_hash[:Resources]).update((catalogs[:Resources][:referenced_object] || catalogs[:Resources]), &self.class.method(:hash_update_proc_for_old))
+              (inheritance_hash[:Resources][:referenced_object] || inheritance_hash[:Resources]).update((catalogs[:Resources][:referenced_object] || catalogs[:Resources]), &HASH_UPDATE_PROC_FOR_OLD)
+            end
+            if catalogs[:ProcSet].is_a?(Array)
+              if(inheritance_hash[:ProcSet])
+                inheritance_hash[:ProcSet][:referenced_object].concat(catalogs[:ProcSet])
+                inheritance_hash[:ProcSet][:referenced_object].uniq!
+              else
+                inheritance_hash[:ProcSet] ||= { referenced_object: catalogs[:ProcSet], is_reference_only: true }.dup
+              end
             end
             if catalogs[:ColorSpace]
               inheritance_hash[:ColorSpace] ||= { referenced_object: {}, is_reference_only: true }.dup
-              (inheritance_hash[:ColorSpace][:referenced_object] || inheritance_hash[:ColorSpace]).update((catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]), &self.class.method(:hash_update_proc_for_old))
+              (inheritance_hash[:ColorSpace][:referenced_object] || inheritance_hash[:ColorSpace]).update((catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]), &HASH_UPDATE_PROC_FOR_OLD)
             end
-            # (inheritance_hash[:Resources] ||= {}).update((catalogs[:Resources][:referenced_object] || catalogs[:Resources]), &self.class.method(:hash_update_proc_for_new)) if catalogs[:Resources]
-            # (inheritance_hash[:ColorSpace] ||= {}).update((catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]), &self.class.method(:hash_update_proc_for_new)) if catalogs[:ColorSpace]
+            # (inheritance_hash[:Resources] ||= {}).update((catalogs[:Resources][:referenced_object] || catalogs[:Resources]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:Resources]
+            # (inheritance_hash[:ColorSpace] ||= {}).update((catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:ColorSpace]
             # inheritance_hash[:Order] = catalogs[:Order] if catalogs[:Order]
             # inheritance_hash[:OCProperties] = catalogs[:OCProperties] if catalogs[:OCProperties]
@@ -516,13 +572,27 @@ module CombinePDF
             catalogs[:Rotate] ||= inheritance_hash[:Rotate] if inheritance_hash[:Rotate]
             if inheritance_hash[:Resources]
               catalogs[:Resources] ||= { referenced_object: {}, is_reference_only: true }.dup
-              (catalogs[:Resources][:referenced_object] || catalogs[:Resources]).update((inheritance_hash[:Resources][:referenced_object] || inheritance_hash[:Resources]), &self.class.method(:hash_update_proc_for_old))
+              catalogs[:Resources] = { referenced_object: catalogs[:Resources], is_reference_only: true } unless catalogs[:Resources][:referenced_object]
+              catalogs[:Resources][:referenced_object].update((inheritance_hash[:Resources][:referenced_object] || inheritance_hash[:Resources]), &HASH_UPDATE_PROC_FOR_OLD)
             end
             if inheritance_hash[:ColorSpace]
               catalogs[:ColorSpace] ||= { referenced_object: {}, is_reference_only: true }.dup
-              (catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]).update((inheritance_hash[:ColorSpace][:referenced_object] || inheritance_hash[:ColorSpace]), &self.class.method(:hash_update_proc_for_old))
+              catalogs[:ColorSpace] = { referenced_object: catalogs[:ColorSpace], is_reference_only: true } unless catalogs[:ColorSpace][:referenced_object]
+              catalogs[:ColorSpace][:referenced_object].update((inheritance_hash[:ColorSpace][:referenced_object] || inheritance_hash[:ColorSpace]), &HASH_UPDATE_PROC_FOR_OLD)
             end
-            # (catalogs[:ColorSpace] ||= {}).update(inheritance_hash[:ColorSpace], &self.class.method(:hash_update_proc_for_old)) if inheritance_hash[:ColorSpace]
+            if inheritance_hash[:ProcSet]
+              if(catalogs[:ProcSet])
+                if catalogs[:ProcSet].is_a?(Array)
+                  catalogs[:ProcSet] = { referenced_object: catalogs[:ProcSet], is_reference_only: true }
+                end
+                catalogs[:ProcSet][:referenced_object].concat(inheritance_hash[:ProcSet][:referenced_object])
+                catalogs[:ProcSet][:referenced_object].uniq!
+              else
+                catalogs[:ProcSet] = { is_reference_only: true }.dup
+                catalogs[:ProcSet][:referenced_object] = []
+              end
+            end
+            # (catalogs[:ColorSpace] ||= {}).update(inheritance_hash[:ColorSpace], &HASH_UPDATE_PROC_FOR_OLD) if inheritance_hash[:ColorSpace]
             # catalogs[:Order] ||= inheritance_hash[:Order] if inheritance_hash[:Order]
             # catalogs[:AS] ||= inheritance_hash[:AS] if inheritance_hash[:AS]
             # catalogs[:OCProperties] ||= inheritance_hash[:OCProperties] if inheritance_hash[:OCProperties]
@@ -536,9 +606,9 @@ module CombinePDF
           when :Pages
             catalog_pages(catalogs[:Kids], inheritance_hash.dup) unless catalogs[:Kids].nil?
           when :Catalog
-            @forms_object.update((catalogs[:AcroForm][:referenced_object] || catalogs[:AcroForm]), &self.class.method(:hash_update_proc_for_new)) if catalogs[:AcroForm]
-            @names_object.update((catalogs[:Names][:referenced_object] || catalogs[:Names]), &self.class.method(:hash_update_proc_for_new)) if catalogs[:Names]
-            @outlines_object.update((catalogs[:Outlines][:referenced_object] || catalogs[:Outlines]), &self.class.method(:hash_update_proc_for_new)) if catalogs[:Outlines]
+            @forms_object.update((catalogs[:AcroForm][:referenced_object] || catalogs[:AcroForm]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:AcroForm]
+            @names_object.update((catalogs[:Names][:referenced_object] || catalogs[:Names]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:Names]
+            @outlines_object.update((catalogs[:Outlines][:referenced_object] || catalogs[:Outlines]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:Outlines]
             if catalogs[:Dests] # convert PDF 1.1 Dests to PDF 1.2+ Dests
               dests_arry = (@names_object[:Dests] ||= {})
               dests_arry = ((dests_arry[:referenced_object] || dests_arry)[:Names] ||= [])
@@ -652,30 +722,45 @@ module CombinePDF
     # All Strings are one String
     def unify_string(str)
+      str.force_encoding(Encoding::ASCII_8BIT)
       @strings_dictionary[str] ||= str
     end
     # @private
     # this method reviews a Hash and updates it by merging Hash data,
     # preffering the old over the new.
-    def self.hash_update_proc_for_old(_key, old_data, new_data)
+    HASH_UPDATE_PROC_FOR_OLD = Proc.new do |_key, old_data, new_data|
       if old_data.is_a? Hash
-        old_data.merge(new_data, &method(:hash_update_proc_for_old))
+        old_data.merge(new_data, &HASH_UPDATE_PROC_FOR_OLD)
       else
         old_data
       end
     end
+    # def self.hash_update_proc_for_old(_key, old_data, new_data)
+    #   if old_data.is_a? Hash
+    #     old_data.merge(new_data, &method(:hash_update_proc_for_old))
+    #   else
+    #     old_data
+    #   end
+    # end
     # @private
     # this method reviews a Hash an updates it by merging Hash data,
     # preffering the new over the old.
-    def self.hash_update_proc_for_new(_key, old_data, new_data)
+    HASH_UPDATE_PROC_FOR_NEW = Proc.new do |_key, old_data, new_data|
       if old_data.is_a? Hash
-        old_data.merge(new_data, &method(:hash_update_proc_for_new))
+        old_data.merge(new_data, &HASH_UPDATE_PROC_FOR_NEW)
       else
         new_data
       end
     end
+    # def self.hash_update_proc_for_new(_key, old_data, new_data)
+    #   if old_data.is_a? Hash
+    #     old_data.merge(new_data, &method(:hash_update_proc_for_new))
+    #   else
+    #     new_data
+    #   end
+    # end
     # # run block of code on evey PDF object (PDF objects are class Hash)
     # def each_object(object, limit_references = true, already_visited = {}, &block)
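The parser changes above replace the `self.hash_update_proc_for_old` / `self.hash_update_proc_for_new` class methods with `Proc` constants, which are passed to `Hash#update` and `Hash#merge` with `&`. The standalone sketch below reproduces that pattern outside the gem to show the two conflict policies: "prefer old" keeps a page's own entries when inherited resources are merged in, and "prefer new" lets incoming data win. `PREFER_OLD` and `PREFER_NEW` are hypothetical names for this illustration, not CombinePDF constants.

```ruby
# Recursive merge procs in the style of HASH_UPDATE_PROC_FOR_OLD / _NEW.
# PREFER_OLD and PREFER_NEW are hypothetical names used only in this sketch.

# On a key conflict, recurse into nested Hashes; for any other value keep the old one.
PREFER_OLD = proc do |_key, old_data, new_data|
  if old_data.is_a?(Hash)
    old_data.merge(new_data, &PREFER_OLD)
  else
    old_data
  end
end

# Same recursion, but a conflicting non-Hash value is replaced by the new one.
PREFER_NEW = proc do |_key, old_data, new_data|
  if old_data.is_a?(Hash)
    old_data.merge(new_data, &PREFER_NEW)
  else
    new_data
  end
end

# A page's own resources (old) are completed from its parent's (new).
own       = { Font: { F1: :page_f1 } }
inherited = { Font: { F1: :inherited_f1, F2: :inherited_f2 }, ProcSet: [:PDF] }
page_resources = own.dup.update(inherited, &PREFER_OLD)

p page_resources[:Font][:F1]  # :page_f1 (the page's own entry is kept)
p page_resources[:Font][:F2]  # :inherited_f2 (filled in from the parent)

# Incoming data (new) overrides existing leaves but keeps untouched keys.
merged = { Outlines: { Count: 1, Title: 'old' } }.merge({ Outlines: { Count: 3 } }, &PREFER_NEW)
p merged[:Outlines][:Count]   # 3
p merged[:Outlines][:Title]   # "old"
```

Holding the recursion in a constant means the same `Proc` object is reused on every call, instead of building a fresh `Method` object with `self.class.method(...)` each time.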

data/lib/combine_pdf/pdf_protected.rb CHANGED

@@ -137,11 +137,14 @@ module CombinePDF
       catalog_object
     end
+    # Deprecation Notice
     def names_object
+      puts "CombinePDF Deprecation Notice: the protected method `names_object` will be deprecated in the upcoming version. Use `names` instead."
       @names
     end
     def outlines_object
+      puts "CombinePDF Deprecation Notice: the protected method `outlines_object` will be deprecated in the upcoming version. Use `oulines` instead."
       @outlines
     end
     # def forms_data
@@ -229,15 +232,42 @@ module CombinePDF
     # @private
     # this method reviews a Hash and updates it by merging Hash data,
     # preffering the new over the old.
-    def self.hash_merge_new_no_page(_key, old_data, new_data)
-      return old_data unless new_data
-      return new_data unless old_data
-      if old_data.is_a?(Hash) && new_data.is_a?(Hash)
-        return old_data if (old_data[:Type] == :Page)
-        old_data.merge(new_data, &(@hash_merge_new_no_page_proc ||= method(:hash_merge_new_no_page)))
+    # def self.hash_merge_new_no_page(_key = nil, old_data = nil, new_data = nil)
+    #   return old_data unless new_data
+    #   return new_data unless old_data
+    #   if old_data.is_a?(Hash) && new_data.is_a?(Hash)
+    #     return old_data if (old_data[:Type] == :Page)
+    #     old_data.merge(new_data, &(@hash_merge_new_no_page_proc ||= method(:hash_merge_new_no_page)))
+    #   elsif old_data.is_a? Array
+    #     return old_data + new_data if new_data.is_a?(Array)
+    #     return old_data.dup << new_data
+    #   elsif new_data.is_a? Array
+    #     new_data + [old_data]
+    #   else
+    #     new_data
+    #   end
+    # end
+    # @private
+    # JRuby Alternative this method reviews a Hash and updates it by merging Hash data,
+    # preffering the new over the old.
+    HASH_MERGE_NEW_NO_PAGE = Proc.new do |_key = nil, old_data = nil, new_data = nil|
+      if !new_data
+        old_data
+      elsif !old_data
+        new_data
+      elsif old_data.is_a?(Hash) && new_data.is_a?(Hash)
+        if (old_data[:Type] == :Page)
+          old_data
+        else
+          old_data.merge(new_data, &HASH_MERGE_NEW_NO_PAGE)
+        end
       elsif old_data.is_a? Array
-        return old_data + new_data if new_data.is_a?(Array)
-        return old_data.dup << new_data
+        if new_data.is_a?(Array)
+          old_data + new_data
+        else
+          old_data.dup << new_data
+        end
       elsif new_data.is_a? Array
         new_data + [old_data]
       else
@@ -343,16 +373,19 @@ module CombinePDF
     private
     def equal_layers obj1, obj2, layer = CombinePDF.eq_depth_limit
-      return true   if(layer == 0)
       return true if obj1.object_id == obj2.object_id
       if obj1.is_a? Hash
         return false unless obj2.is_a? Hash
+        return false unless obj1.length == obj2.length
         keys = obj1.keys;
-        return false if (keys - obj2.keys).any?
+        keys2 = obj2.keys;
+        return false if (keys - keys2).any? || (keys2 - keys).any?
+        return (warn("CombinePDF nesting limit reached") || true) if(layer == 0)
         keys.each {|k| return false unless equal_layers( obj1[k], obj2[k], layer-1) }
       elsif obj1.is_a? Array
         return false unless obj2.is_a? Array
-        (obj1-obj2).any?
+        return false unless obj1.length == obj2.length
+        (obj1-obj2).any? || (obj2-obj1).any?
       else
         obj1 == obj2
       end
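The reworked `equal_layers` now compares sizes and key sets before recursing, and applies the depth limit (with a warning) only after those cheap checks pass. The sketch below is a standalone approximation of that idea, not the gem's private method: `deep_equal?`, `depth_limit_reached` and the default depth of 10 are assumptions, and arrays are compared element by element in order rather than with the set-difference check shown in the diff.

```ruby
# Hypothetical depth-limited structural comparison, loosely modelled on equal_layers.
def deep_equal?(a, b, depth = 10)
  return true if a.equal?(b) # same object: trivially equal

  case a
  when Hash
    return false unless b.is_a?(Hash) && a.size == b.size
    # With equal sizes, a's keys being a subset of b's means the key sets match.
    return false unless (a.keys - b.keys).empty?
    return depth_limit_reached if depth.zero?
    a.all? { |k, v| deep_equal?(v, b[k], depth - 1) }
  when Array
    return false unless b.is_a?(Array) && a.size == b.size
    return depth_limit_reached if depth.zero?
    a.each_with_index.all? { |v, i| deep_equal?(v, b[i], depth - 1) }
  else
    a == b
  end
end

# At the limit, warn and treat the remaining levels as equal.
def depth_limit_reached
  warn 'nesting limit reached; treating deeper levels as equal'
  true
end

x = { Type: :Font, Widths: [1, 2, 3] }
y = { Type: :Font, Widths: [1, 2, 3] }
z = { Type: :Font, Widths: [1, 2] }
puts deep_equal?(x, y) # true
puts deep_equal?(x, z) # false
```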

data/lib/combine_pdf/pdf_public.rb CHANGED

@@ -82,6 +82,10 @@ module CombinePDF
     # use, for example:
     #   pdf.viewer_preferences[:HideMenubar] = true
     attr_reader :viewer_preferences
+    # Access the Outlines PDF object Hash (or reference). Use with care.
+    attr_reader :outlines
+    # Access the Names PDF object Hash (or reference). Use with care.
+    attr_reader :names
     def initialize(parser = nil)
       # default before setting
@@ -207,7 +211,7 @@ module CombinePDF
       # when finished, remove the numbering system and keep only pointers
       remove_old_ids
       # output the pdf stream
-      out.join("\n").force_encoding(Encoding::ASCII_8BIT)
+      out.join("\n".force_encoding(Encoding::ASCII_8BIT)).force_encoding(Encoding::ASCII_8BIT)
     end
     # this method returns all the pages cataloged in the catalog.
@@ -253,12 +257,16 @@ module CombinePDF
     def fonts(limit_to_type0 = false)
       fonts_array = []
       pages.each do |pg|
-        if pg[:Resources][:Font]
-          pg[:Resources][:Font].values.each do |f|
-            f = f[:referenced_object] if f[:referenced_object]
-            if (limit_to_type0 || f[:Subtype] == :Type0) && f[:Type] == :Font && !fonts_array.include?(f)
-              fonts_array << f
-            end
+        r = pg[:Resources]
+        next if !r
+        r = r[:referenced_object] if r[:referenced_object]
+        r = r[:Font]
+        next if !r
+        r = r[:referenced_object] if r[:referenced_object]
+        r.values.each do |f|
+          f = f[:referenced_object] if f[:referenced_object]
+          if (limit_to_type0 || f[:Subtype] == :Type0) && f[:Type] == :Font && !fonts_array.include?(f)
+            fonts_array << f
           end
         end
       end
@@ -302,10 +310,10 @@ module CombinePDF
       if data.is_a? PDF
         @version = [@version, data.version].max
         pages_to_add = data.pages
-        actual_value(@names ||= {}.dup).update actual_value(data.names_object), &self.class.method(:hash_merge_new_no_page)
-        merge_outlines((@outlines ||= {}.dup), data.outlines_object, location) unless actual_value(data.outlines_object).empty?
+        actual_value(@names ||= {}.dup).update data.names, &HASH_MERGE_NEW_NO_PAGE
+        merge_outlines((@outlines ||= {}.dup), actual_value(data.outlines), location) unless actual_value(data.outlines).empty?
         if actual_value(@forms_data)
-          actual_value(@forms_data).update actual_value(data.forms_data), &self.class.method(:hash_merge_new_no_page) if data.forms_data
+          actual_value(@forms_data).update actual_value(data.forms_data), &HASH_MERGE_NEW_NO_PAGE if data.forms_data
         else
           @forms_data = data.forms_data
         end
@@ -354,9 +362,9 @@ module CombinePDF
     #
     # options:: a Hash of options setting the behavior and format of the page numbers:
     # - :number_format a string representing the format for page number. defaults to ' - %s - ' (allows for letter numbering as well, such as "a", "b"...).
-    # - :location an Array containing the location for the page numbers, can be :top, :buttom, :top_left, :top_right, :bottom_left, :bottom_right or :center (:center == full page). defaults to [:top, :buttom].
+    # - :location an Array containing the location for the page numbers, can be :top, :bottom, :top_left, :top_right, :bottom_left, :bottom_right or :center (:center == full page). defaults to [:top, :bottom].
     # - :start_at an Integer that sets the number for first page number. also accepts a letter ("a") for letter numbering. defaults to 1.
-    # - :margin_from_height a number (PDF points) for the top and buttom margins. defaults to 45.
+    # - :margin_from_height a number (PDF points) for the top and bottom margins. defaults to 45.
     # - :margin_from_side a number (PDF points) for the left and right margins. defaults to 15.
     # - :page_range a range of pages to be numbered (i.e. (2..-1) ) defaults to all the pages (nil). Remember to set the :start_at to the correct value.
     # the options Hash can also take all the options for {Page_Methods#textbox}.
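The `fonts` fix in `pdf_public.rb` comes down to unwrapping `{ referenced_object: ... }` indirections at every level: the page's `:Resources`, the `:Font` dictionary inside it, and each font entry. Pages without resources are now skipped instead of raising. The sketch below isolates that walk over plain Ruby hashes; `deref` and `collect_fonts` are hypothetical helpers, and the page hashes only imitate the nesting that the new code handles.

```ruby
# Unwrap one level of CombinePDF-style indirection, if present (hypothetical helper).
def deref(obj)
  obj.is_a?(Hash) && obj[:referenced_object] ? obj[:referenced_object] : obj
end

# Collect unique font dictionaries from a list of page hashes (hypothetical helper).
def collect_fonts(pages)
  fonts = []
  pages.each do |page|
    resources = deref(page[:Resources])
    next unless resources # page without resources: skip it
    font_dict = deref(resources[:Font])
    next unless font_dict # resources without fonts: skip it
    font_dict.each_value do |entry|
      font = deref(entry)
      next unless font.is_a?(Hash) && font[:Type] == :Font
      fonts << font unless fonts.include?(font)
    end
  end
  fonts
end

helvetica = { Type: :Font, Subtype: :Type1, BaseFont: :Helvetica }
pages = [
  # Resources and fonts stored inline
  { Resources: { Font: { F1: helvetica } } },
  # Resources, font dictionary and font all behind references
  { Resources: { referenced_object: { Font: { referenced_object: { F1: { referenced_object: helvetica } } } } } },
  # a page with no resources at all
  {}
]
p collect_fonts(pages).map { |f| f[:BaseFont] } # [:Helvetica]
```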

data/lib/combine_pdf/renderer.rb CHANGED

@@ -20,8 +20,10 @@ module CombinePDF
         return format_name_to_pdf object
       elsif object.is_a?(Array)
         return format_array_to_pdf object
-      elsif object.is_a?(Numeric) || object.is_a?(TrueClass) || object.is_a?(FalseClass)
+      elsif object.is_a?(Integer) || object.is_a?(TrueClass) || object.is_a?(FalseClass)
         return object.to_s
+      elsif object.is_a?(Numeric) # Float or other non-integer
+        return sprintf('%f', object)
       elsif object.is_a?(Hash)
         return format_hash_to_pdf object
       else
@@ -29,25 +31,30 @@ module CombinePDF
       end
     end
-    STRING_REPLACEMENT_HASH = { "\x0A" => '\\n',
-                                "\x0D" => '\\r',
-                                "\x09" => '\\t',
-                                "\x08" => '\\b',
-                                "\x0C" => '\\f', # form-feed (\f) == 0x0C
-                                "\x28" => '\\(',
-                                "\x29" => '\\)',
-                                "\x5C" => '\\\\' }.dup
-    32.times { |i| STRING_REPLACEMENT_HASH[i.chr] ||= "\\#{i}" }
-    (256 - 127).times { |i| STRING_REPLACEMENT_HASH[(i + 127).chr] ||= "\\#{i + 127}" }
+    STRING_REPLACEMENT_ARRAY = []
+    256.times {|i| STRING_REPLACEMENT_ARRAY[i] = [i]}
+    8.times { |i| STRING_REPLACEMENT_ARRAY[i] =  "\\00#{i.to_s(8)}".bytes.to_a }
+    24.times { |i| STRING_REPLACEMENT_ARRAY[i + 7] =  "\\0#{i.to_s(8)}".bytes.to_a }
+    (256 - 127).times { |i| STRING_REPLACEMENT_ARRAY[(i + 127)] ||= "\\#{(i + 127).to_s(8)}".bytes.to_a }
+    STRING_REPLACEMENT_ARRAY[0x0A] = '\\n'.bytes.to_a
+    STRING_REPLACEMENT_ARRAY[0x0D] = '\\r'.bytes.to_a
+    STRING_REPLACEMENT_ARRAY[0x09] = '\\t'.bytes.to_a
+    STRING_REPLACEMENT_ARRAY[0x08] = '\\b'.bytes.to_a
+    STRING_REPLACEMENT_ARRAY[0x0C] = '\\f'.bytes.to_a # form-feed (\f) == 0x0C
+    STRING_REPLACEMENT_ARRAY[0x28] = '\\('.bytes.to_a
+    STRING_REPLACEMENT_ARRAY[0x29] = '\\)'.bytes.to_a
+    STRING_REPLACEMENT_ARRAY[0x5C] = '\\\\'.bytes.to_a
     def format_string_to_pdf(object)
+      obj_bytes = object.bytes.to_a
       # object.force_encoding(Encoding::ASCII_8BIT)
-      if !object.match(/[^D\:\d\+\-Z\']/) # if format is set to Literal and string isn't a date
-        ('(' + ([].tap { |out| object.bytes.to_a.each { |byte| STRING_REPLACEMENT_HASH[byte.chr] ? (STRING_REPLACEMENT_HASH[byte.chr].bytes.each { |b| out << b }) : out << byte } }).pack('C*') + ')').force_encoding(Encoding::ASCII_8BIT)
-      else
-        # A hexadecimal string shall be written as a sequence of hexadecimal digits (0–9 and either A–F or a–f)
+      if object.length == 0 || obj_bytes.min <= 31 || obj_bytes.max >= 127 # || (obj_bytes[0] != 68  object.match(/[^D\:\d\+\-Z\']/))
+        # A hexadecimal string shall be written as a sequence of hexadecimal digits (0-9 and either A-F or a-f)
         # encoded as ASCII characters and enclosed within angle brackets (using LESS-THAN SIGN (3Ch) and GREATER- THAN SIGN (3Eh)).
         "<#{object.unpack('H*')[0]}>".force_encoding(Encoding::ASCII_8BIT)
+      else
+        # a good fit for a Literal String or the string is a date (MUST be literal)
+        ('(' + ([].tap { |out| obj_bytes.each { |byte| out.concat(STRING_REPLACEMENT_ARRAY[byte]) } } ).pack('C*') + ')').force_encoding(Encoding::ASCII_8BIT)
       end
     end
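The new `format_string_to_pdf` emits a hexadecimal string whenever the input is empty or contains any byte outside printable ASCII (32..126), and otherwise emits a literal string, which also keeps date strings such as `D:...` literal. A standalone sketch of that decision follows; `pdf_string` is a hypothetical name, and it escapes only `(`, `)` and `\`, since those are the only characters needing escapes once control and high bytes have been routed to the hex branch.

```ruby
# Hypothetical standalone version of the literal-vs-hex choice made by format_string_to_pdf.
def pdf_string(str)
  bytes = str.b.bytes
  if bytes.empty? || bytes.min <= 31 || bytes.max >= 127
    # Hex string: every byte as two hex digits inside angle brackets.
    "<#{str.b.unpack1('H*')}>"
  else
    # Literal string: only '(' ')' and '\' need a backslash escape at this point,
    # because control and high bytes were already sent to the hex branch.
    '(' + str.b.gsub(/[()\\]/) { |c| "\\#{c}" } + ')'
  end
end

puts pdf_string('Hello (world)')     # (Hello \(world\))
puts pdf_string('D:20211127000000Z') # (D:20211127000000Z)
puts pdf_string("caf\u00e9")         # <636166c3a9>
puts pdf_string('')                  # <>
```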

data/lib/combine_pdf/version.rb CHANGED

@@ -1,3 +1,3 @@
 module CombinePDF
-  VERSION = '1.0.6'.freeze
+  VERSION = '1.0.22'.freeze
 end

data/lib/combine_pdf.rb CHANGED

@@ -5,6 +5,7 @@ require 'securerandom'
 require 'strscan'
 require 'matrix'
 require 'set'
+require 'digest'
 # require the RC4 Gem
 require 'rc4'

data/test/automated CHANGED

@@ -95,6 +95,8 @@ pdf.save('07_named destinations_numbered.pdf')
 CombinePDF.load("./Ruby/test\ pdfs/Scribus-unknown_err.pdf").save '08_1-unknown-err-empty-str.pdf'
 CombinePDF.load("./Ruby/test\ pdfs/Scribus-unknown_err2.pdf").save '08_2-unknown-err-empty-str.pdf'
 CombinePDF.load("./Ruby/test\ pdfs/Scribus-unknown_err3.pdf").save '08_3-unknown-err-empty-str.pdf'
+CombinePDF.load("./Ruby/test\ pdfs/xref_in_middle.pdf").save '08_4-xref-in-middle.pdf'
+CombinePDF.load("./Ruby/test\ pdfs/xref_split.pdf").save '08_5-xref-fragmented.pdf'
 CombinePDF.load("/Users/2Be/Ruby/test\ pdfs/nil_object.pdf").save('09_nil_in_parsed_array.pdf')

data/test/combine_pdf/renderer_test.rb ADDED

@@ -0,0 +1,22 @@
+require 'bundler/setup'
+require 'minitest/autorun'
+require 'combine_pdf/renderer'
+class CombinePDFRendererTest < Minitest::Test
+  class TestRenderer
+    include CombinePDF::Renderer
+    def test_object(object)
+      object_to_pdf(object)
+    end
+  end
+  def test_numeric_array_to_pdf
+    input = [1.234567, 0.000054, 5, -0.000099]
+    expected = "[1.234567 0.000054 5 -0.000099]".force_encoding('BINARY')
+    actual = TestRenderer.new.test_object(input)
+    assert_equal(expected, actual)
+  end
+end
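The renderer change routes `Integer` values through `to_s` and every other `Numeric` through `sprintf('%f', ...)`, which the new test checks. The reason shows up in plain Ruby: `Float#to_s` switches to exponent notation for small magnitudes, and PDF number syntax does not allow exponents, while `%f` always prints a fixed-point decimal with six fractional digits. `pdf_number` below is a hypothetical helper that mirrors that branch.

```ruby
# Hypothetical helper mirroring the Integer / other-Numeric split in object_to_pdf.
def pdf_number(n)
  n.is_a?(Integer) ? n.to_s : format('%f', n)
end

values = [1.234567, 0.000054, 5, -0.000099]
puts values.map(&:to_s).join(' ')               # 1.234567 5.4e-05 5 -9.9e-05
puts values.map { |v| pdf_number(v) }.join(' ') # 1.234567 0.000054 5 -0.000099
```

The trade-off is a fixed precision of six decimal places: a value such as `4.0e-7` is written as `0.000000`.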

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: combine_pdf
 version: !ruby/object:Gem::Version
-  version: 1.0.6
+  version: 1.0.22
 platform: ruby
 authors:
 - Boaz Segev
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-08-02 00:00:00.000000000 Z
+date: 2021-11-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ruby-rc4
@@ -25,33 +25,47 @@ dependencies:
       - !ruby/object:Gem::Version
         version: 0.1.5
 - !ruby/object:Gem::Dependency
-  name: bundler
+  name: matrix
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '1.7'
-  type: :development
+        version: '0'
+  type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '1.7'
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '10.0'
+        version: 12.3.3
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 12.3.3
+- !ruby/object:Gem::Dependency
+  name: minitest
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '10.0'
+        version: '0'
 description: A nifty gem, in pure Ruby, to parse PDF files and combine (merge) them
   with other PDF files, number the pages, watermark them or stamp them, create tables,
   add basic text objects etc` (all using the PDF file format).
@@ -82,13 +96,14 @@ files:
 - lib/combine_pdf/renderer.rb
 - lib/combine_pdf/version.rb
 - test/automated
+- test/combine_pdf/renderer_test.rb
 - test/console
 - test/named_dest
 homepage: https://github.com/boazsegev/combine_pdf
 licenses:
 - MIT
 metadata: {}
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -103,12 +118,12 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubyforge_project:
-rubygems_version: 2.6.11
-signing_key:
+rubygems_version: 3.2.3
+signing_key:
 specification_version: 4
 summary: Combine, stamp and watermark PDF files in pure Ruby.
 test_files:
 - test/automated
+- test/combine_pdf/renderer_test.rb
 - test/console
 - test/named_dest
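The metadata diff replaces the `bundler` development dependency with a runtime dependency on `matrix` (required by `lib/combine_pdf.rb`; it moved from a default gem to a bundled gem in Ruby 3.1, so it has to be declared explicitly). It also raises `rake` to `>= 12.3.3` and adds `minitest` for the new renderer test. Expressed as `Gem::Specification` calls, that dependency set would look roughly like the sketch below. The real `combine_pdf.gemspec` is not shown in this section, so treat the block as an approximation; the unchanged `ruby-rc4` dependency is left out because its full requirement is not visible here.

```ruby
require 'rubygems'

# Approximate dependency declarations matching the 1.0.22 metadata above (sketch only).
spec = Gem::Specification.new do |s|
  s.name    = 'combine_pdf'
  s.version = '1.0.22'
  s.summary = 'Combine, stamp and watermark PDF files in pure Ruby.'
  s.authors = ['Boaz Segev']
  s.files   = []

  s.add_runtime_dependency 'matrix'                 # ">= 0", type :runtime
  s.add_development_dependency 'rake', '>= 12.3.3'  # was "~> 10.0"
  s.add_development_dependency 'minitest'           # new, for renderer_test.rb
end

spec.dependencies.each { |d| puts "#{d.name} (#{d.type}): #{d.requirement}" }
```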