combine_pdf 1.0.6 → 1.0.22

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: b455928d3e64892b983f743f94243bd3850f1324
4
- data.tar.gz: '0925ba5c1a7754bd54311ba499329771b0bc36b4'
2
+ SHA256:
3
+ metadata.gz: 96825d0aa74bd673883c4d7dbf3884459ff27c2a3d7bd0c60875e0499c7b9aeb
4
+ data.tar.gz: 985c39883f343bb5182344ccc31353103fbac89494000362973f08cdd379d2ac
5
5
  SHA512:
6
- metadata.gz: 1a382e673cf8c042ed95638d44cb9cd804a52190bc06130ba3d9ae5a5aac564184c13c41b6a44ecbf265590a1ed3fdd97a011c252dc647502295255051fb473c
7
- data.tar.gz: 0a77bd1af453712bcff6de7e3fbae6a745c0b3137351c8b147a614aac8d0062feee0044e66cba8e4924213ab7c9020ec5c8087fb9cc5d1258edb7303a23a919b
6
+ metadata.gz: 8575b612e1eb31775833faba8f310d84680d6ce27512d6a9c182e7598a743956da34e0321f0280d032e3e46b861dd2abdd88b297e65a652ec8e3e416ed9fb0a0
7
+ data.tar.gz: 2026d924120f1798681842fee7a2eb0de78be6ac493dcbd4ffbb934c1c0135161ccbf29283fb0eec42b4ebab66f84b7fa3ac354a970fad9bc0ad302f64da7c7a
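The hunk above replaces the SHA1 entries in `checksums.yaml` with SHA256 ones and refreshes the SHA512 values. As a minimal sketch of how such a checksum could be checked by hand, the helper below compares a hex digest against an expected value. The name `checksum_matches?` is hypothetical and is not part of RubyGems or CombinePDF.

```ruby
require 'digest'
require 'digest/sha2'

# Hypothetical helper: true when the hex digest of `data` equals `expected_hex`.
# `algorithm` may be any Digest class, e.g. Digest::SHA256 or Digest::SHA512.
def checksum_matches?(data, expected_hex, algorithm = Digest::SHA256)
  algorithm.hexdigest(data) == expected_hex.downcase
end

# Well-known SHA256 test vector for the string "abc".
ABC_SHA256 = 'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad'

puts checksum_matches?('abc', ABC_SHA256)
```

For a real archive, something like `checksum_matches?(File.binread('data.tar.gz'), expected)` would apply the same comparison, assuming the file has been extracted from the `.gem` package first.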
data/CHANGELOG.md CHANGED
@@ -2,6 +2,90 @@
2
2
 
3
3
  ***
4
4
 
5
+ #### Change log v.1.0.22
6
+
7
+ **Fix**: fix `fonts` dereferencing issue (#203), credit to @MarcWeber (Marc Weber) for identifying the issue.
8
+
9
+ **Fix**: fix `matrix` dependency, credit to @casperisfine (Jean byroot Boussier) for PR #195.
10
+
11
+ #### Change log v.1.0.21
12
+
13
+ **Fix**: possible fix for issue #184, where nested PDF files within an object stream could break the parser. Credit to Greg Sparrow (@hazelsparrow) for exposing the issue.
14
+
15
+ #### Change log v.1.0.20
16
+
17
+ **Fix**: merges PR #180, `TypeError: can't dup NilClass`. Credit to Adam Trepanier (@adam-e-trepanier) for the merge.
18
+
19
+ #### Change log v.1.0.19
20
+
21
+ **Fix**: fixes font height and width detection issue. Issue #179. Credit to @5anchezzz for opening the issue.
22
+
23
+ **Fix**: fixes an indentation warning. Issue #173. Credit to @rubyFeedback for exposing this issue.
24
+
25
+ #### Change log v.1.0.18
26
+
27
+ **Fix**: fixed issue with the 1.0.17 release where `ProcSet` PDF Arrays should have been expected but were ignored and a PDF Object was assumed instead (issue #171) - credit to @chuchiperriman (Jesús Barbero Rodríguez).
28
+
29
+ #### Change log v.1.0.17
30
+
31
+ NB: yanked from RubyGems.org.
32
+
33
+ **Fix**: fixed issue where nested structure equality tests might provide false positives, resulting in lost data (issue #166) - credit to @cschilbe (Conrad Schilbe).
34
+
35
+ #### Change log v.1.0.16
36
+
37
+ **Fix**: some documentation typos were fixed (PR #147) - credit to @djhopper01 (Derek Hopper).
38
+
39
+ #### Change log v.1.0.15
40
+
41
+ **Fix**: An attempt to fix JRuby compatibility concerns (issue #127).
42
+
43
+ #### Change log v.1.0.14
44
+
45
+ **Fix**: Fixed an issue related to PDF XRef table data, where a malformed EOL marker would cause the parser to fail. Credit to @dangerous (David Rainsford) for exposing this issue in a comment to issue #140.
46
+
47
+ #### Change log v.1.0.13
48
+
49
+ **Fix**: Fixed an issue related to PDF object streams (version 1.6) where a numerical object at the beginning of the stream might be mis-parsed as an object reference number rather than an object. Credit to @Defoncesko for reporting issue #141.
50
+
51
+ #### Change log v.1.0.12
52
+
53
+ **Fix**: Fixed an issue introduced in version 1.0.11, where a fragmented XREF table might cause the CombinePDF::Parser to fail. Credit to @solasdev for reporting issue #140.
54
+
55
+ #### Change log v.1.0.11
56
+
57
+ **Fix**: Fixed an issue where small floating point numbers would produce invalid PDF rendering (where exponent notation was used instead of decimal notation). Credit to @avit (Andrew Vit) for PR #139.
58
+
59
+ #### Change log v.1.0.10
60
+
61
+ **Fix**: Fixed an issue related to issue #131 where parsing would fail if the `xref` section appears to be misplaced within the PDF. Credit to @bharat303 (Bharat Godhani) for exposing this issue.
62
+
63
+ #### Change log v.1.0.9
64
+
65
+ **Fix**: Fixed issue #136 where the `#fix_rotation` function would rotate the page to the wrong direction. Credit to @dmkash for exposing this issue.
66
+
67
+ #### Change log v.1.0.8
68
+
69
+ **Fix**: Fixed an issue with octal representation in escaped string data. The issue would (usually) go unnoticed (altering internal labels in a non-disruptive manner), however the issue did affect `ColorSpace` data in the rare use of `ICCBased` color maps, causing color distortion and transparency loss. Credit to @react-rails and @bedaronco for exposing the issue (issue #130).
70
+
71
+ **Fix**: Fixed an issue with non-English alphabets in PDF literal strings. This issue went undetected since PDF literal strings aren't used by CombinePDF except for the date stamping...
72
+
73
+ **Fix**: Improbable, but possibly a fix for issue #127, where the JRuby interpreter would fail to pass the correct arguments to the Hash update Proc. Since I'm trying to author a workaround, I have my doubts... but an attempt is better than nothing.
74
+
75
+ **Update**: Improved parsing error handling, courtesy of Evgeny Garlukovich (@evgenygarl).
76
+
77
+ **Update**: Added reader methods for the `names` and `outlines` PDF objects in response to issue #133. Use with care.
78
+
79
+ #### Change log v.1.0.7
80
+
81
+ **Fix**: Fix an issue where page property inheritance might break PDF structure if there's a conflict between property types (inheritance using properties by reference vs. nested properties), fixing issue #124. Credit to @erikaxel for exposing the issue.
82
+
83
+ #### Change log v.1.0.6
84
+
85
+ **Fix**: Fix warnings, issue #120. Credit to @lloeki for exposing the issue.
86
+
87
+ **Fix**: Fix / add adjustable nesting protection, fixing issue #117. Credit to @emmanuelmillionaer for exposing the issue.
88
+
5
89
  #### Change log v.1.0.5
6
90
 
7
91
  **Fix**: Fix issue #116 where some PDF objects (the page catalog and some root information data) were written twice to the saved PDF file (or String). Credit to @albertsaave for exposing the issue using GhostScript.
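The v.1.0.11 entry above describes small floating point numbers being written in exponent notation, which PDF number syntax does not accept. The sketch below illustrates the general idea of forcing decimal notation. It is an assumption-laden illustration, not the code from PR #139, and `pdf_number` is a hypothetical name.

```ruby
# Hypothetical helper: render a number in plain decimal notation.
# Ruby's Float#to_s switches to exponent notation for small magnitudes,
# so a fixed-precision format is used and trailing zeros are trimmed.
def pdf_number(value)
  return value.to_s if value.is_a?(Integer)

  text = format('%.10f', value)   # fixed notation, 10 fractional digits (assumed precision)
  text = text.sub(/0+\z/, '')     # drop trailing zeros
  text.sub(/\.\z/, '')            # drop a dangling decimal point
end

puts 0.00001.to_s        # Ruby's default uses an exponent here
puts pdf_number(0.00001) # decimal notation instead
```

The 10-digit precision is an arbitrary choice for this sketch; values smaller than that collapse to `0`.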
data/README.md CHANGED
@@ -1,8 +1,10 @@
1
1
  # CombinePDF - the ruby way for merging PDF files
2
2
  [![Gem Version](https://badge.fury.io/rb/combine_pdf.svg)](http://badge.fury.io/rb/combine_pdf)
3
3
  [![GitHub](https://img.shields.io/badge/GitHub-Open%20Source-blue.svg)](https://github.com/boazsegev/combine_pdf)
4
+ [![Documentation](http://inch-ci.org/github/boazsegev/combine_pdf.svg?branch=master)](https://www.rubydoc.info/github/boazsegev/combine_pdf)
4
5
  [![Maintainers Wanted](https://img.shields.io/badge/maintainers-wanted-red.svg)](https://github.com/pickhardt/maintainers-wanted)
5
6
 
7
+
6
8
  CombinePDF is a nifty model, written in pure Ruby, to parse PDF files and combine (merge) them with other PDF files, watermark them or stamp them (all using the PDF file format and pure Ruby code).
7
9
 
8
10
  ## Install
@@ -41,6 +43,8 @@ Quick rundown:
41
43
 
42
44
  * Sometimes the CombinePDF will raise an exception even if the PDF could be parsed (i.e., when PDF optional content exists)... I find it better to err on the side of caution, although for optional content PDFs an exception is avoidable using `CombinePDF.load(pdf_file, allow_optional_content: true)`.
43
45
 
46
+ * The CombinePDF gem runs recursive code to both parse and format the PDF files. Hence, PDF files that have heavily nested objects, as well as those that were combined in a way that results in cyclic nesting, might explode the stack - resulting in an exception or program failure.
47
+
44
48
  CombinePDF is written natively in Ruby and should (presumably) work on all Ruby platforms that follow Ruby 2.0 compatibility.
45
49
 
46
50
  However, PDF files are quite complex creatures and no guaranty is provided.
@@ -112,7 +116,42 @@ pdf.number_pages
112
116
  pdf.save "file_with_numbering.pdf"
113
117
  ```
114
118
 
115
- Numbering can be done with many different options, with different formating, with or without a box object, and even with opacity values - see documentation.
119
+ Numbering can be done with many different options, with different formatting, with or without a box object, and even with opacity values - [see documentation](https://www.rubydoc.info/github/boazsegev/combine_pdf/CombinePDF/PDF#number_pages-instance_method).
120
+
121
+ For example, should you prefer to place the page number on the bottom right side of all PDF pages, do:
122
+
123
+ ```ruby
124
+ pdf.number_pages(location: [:bottom_right])
125
+ ```
126
+
127
+ As another example, the dashes around the number are removed and a box is placed around it. The numbering is semi-transparent and the first 3 pages are numbered using letters (a,b,c) rather than numbers:
128
+
129
+
130
+ ```ruby
131
+ # number first 3 pages as "a", "b", "c"
132
+ pdf.number_pages(number_format: " %s ",
133
+ location: [:top, :bottom, :top_left, :top_right, :bottom_left, :bottom_right],
134
+ start_at: "a",
135
+ page_range: (0..2),
136
+ box_color: [0.8,0.8,0.8],
137
+ border_color: [0.4, 0.4, 0.4],
138
+ border_width: 1,
139
+ box_radius: 6,
140
+ opacity: 0.75)
141
+ # number the rest of the pages as 4, 5, ... etc'
142
+ pdf.number_pages(number_format: " %s ",
143
+ location: [:top, :bottom, :top_left, :top_right, :bottom_left, :bottom_right],
144
+ start_at: 4,
145
+ page_range: (3..-1),
146
+ box_color: [0.8,0.8,0.8],
147
+ border_color: [0.4, 0.4, 0.4],
148
+ border_width: 1,
149
+ box_radius: 6,
150
+ opacity: 0.75)
151
+ ```
152
+
153
+ pdf.number_pages(number_format: " %s ", location: :bottom_right, font_size: 44)
154
+
116
155
 
117
156
  ## Loading and Parsing PDF data
118
157
 
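The README addition above warns that recursive parsing and formatting can overflow the stack on heavily nested or cyclic structures, and the v.1.0.6 changelog entry mentions adjustable nesting protection. The sketch below shows one generic way a depth limit can guard a recursive walk. `walk_nested` and `NestingTooDeep` are hypothetical names and do not reflect how CombinePDF implements its protection.

```ruby
# Hypothetical error class for this sketch only.
NestingTooDeep = Class.new(StandardError)

# Visit every leaf of a nested Hash/Array structure, refusing to descend
# past `limit` levels so a cyclic structure cannot exhaust the stack.
def walk_nested(object, limit = 100, depth = 0, &block)
  raise NestingTooDeep, "nesting exceeds #{limit} levels" if depth > limit

  case object
  when Hash  then object.each_value { |v| walk_nested(v, limit, depth + 1, &block) }
  when Array then object.each { |v| walk_nested(v, limit, depth + 1, &block) }
  else block.call(object) if block
  end
  nil
end

leaves = []
walk_nested({ a: [1, { b: 2 }], c: 3 }) { |leaf| leaves << leaf }
p leaves
```

A limit trades completeness for safety: a legitimately deep document would be rejected, which is why making the limit adjustable is useful.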
data/combine_pdf.gemspec CHANGED
@@ -19,7 +19,9 @@ Gem::Specification.new do |spec|
19
19
  spec.require_paths = ["lib"]
20
20
 
21
21
  spec.add_runtime_dependency 'ruby-rc4', '>= 0.1.5'
22
+ spec.add_runtime_dependency 'matrix'
22
23
 
23
- spec.add_development_dependency "bundler", "~> 1.7"
24
- spec.add_development_dependency "rake", "~> 10.0"
24
+ # spec.add_development_dependency "bundler", ">= 1.7"
25
+ spec.add_development_dependency "rake", ">= 12.3.3"
26
+ spec.add_development_dependency "minitest"
25
27
  end
@@ -24,11 +24,11 @@ module CombinePDF
24
24
  raise TypeError, "couldn't create PDF object, expecting type String" unless string.is_a?(String) || string.is_a?(Pathname)
25
25
  begin
26
26
  (begin
27
- File.file? string
28
- rescue
29
- false
30
- end) ? load(string) : parse(string)
31
- rescue => _e
27
+ File.file? string
28
+ rescue
29
+ false
30
+ end) ? load(string) : parse(string)
31
+ rescue => _e
32
32
  raise 'General PDF error - Use CombinePDF.load or CombinePDF.parse for a non-general error message (the requested file was not found OR the string received is not a valid PDF stream OR the file was found but not valid).'
33
33
  end
34
34
  end
@@ -140,7 +140,7 @@ module CombinePDF
140
140
  # this function enables plug-ins to expend the font functionality of CombinePDF.
141
141
  #
142
142
  # font_name:: a Symbol with the name of the font. if the fonts exists in the library, it will be overwritten!
143
- # font_metrics:: a Hash of font metrics, of the format char => {wx: char_width, boundingbox: [left_x, buttom_y, right_x, top_y]} where char == character itself (i.e. " " for space). The Hash should contain a special value :missing for the metrics of missing characters. an optional :wy might be supported in the future, for up to down fonts.
143
+ # font_metrics:: a Hash of font metrics, of the format char => {wx: char_width, boundingbox: [left_x, bottom_y, right_x, top_y]} where char == character itself (i.e. " " for space). The Hash should contain a special value :missing for the metrics of missing characters. an optional :wy might be supported in the future, for up to down fonts.
144
144
  # font_pdf_object:: a Hash in the internal format recognized by CombinePDF, that represents the font object.
145
145
  # font_cmap:: a CMap dictionary Hash) which maps unicode characters to the hex CID for the font (i.e. {"a" => "61", "z" => "7a" }).
146
146
  def register_font(font_name, font_metrics, font_pdf_object, font_cmap = nil)
@@ -100,7 +100,7 @@ module CombinePDF
100
100
 
101
101
  # adds a correctly formatted font object to the font library.
102
102
  # font_name:: a Symbol with the name of the font. if the fonts name exists, the font will be overwritten!
103
- # font_metrics:: a Hash of ont metrics, of the format char => {wx: char_width, boundingbox: [left_x, buttom_y, right_x, top_y]} where i == character code (i.e. 32 for space). The Hash should contain a special value :missing for the metrics of missing characters. an optional :wy will be supported in the future, for up to down fonts.
103
+ # font_metrics:: a Hash of ont metrics, of the format char => {wx: char_width, boundingbox: [left_x, bottom_y, right_x, top_y]} where i == character code (i.e. 32 for space). The Hash should contain a special value :missing for the metrics of missing characters. an optional :wy will be supported in the future, for up to down fonts.
104
104
  # font_pdf_object:: a Hash in the internal format recognized by CombinePDF, that represents the font object.
105
105
  # font_cmap:: a CMap dictionary Hash) which maps unicode characters to the hex CID for the font (i.e. {"a" => "61", "z" => "7a" }).
106
106
  def register_font(font_name, font_metrics, font_pdf_object, font_cmap = nil)
@@ -138,12 +138,21 @@ module CombinePDF
138
138
  text.each_char do |c|
139
139
  metrics_array << (merged_metrics[c] || { wx: 0, boundingbox: [0, 0, 0, 0] })
140
140
  end
141
- height = metrics_array.map { |m| m ? m[:boundingbox][3] : 0 } .max
142
- height -= (metrics_array.map { |m| m ? m[:boundingbox][1] : 0 }).min
141
+ metrics_array_mapped_top = [].dup
142
+ metrics_array_mapped_bottom = [].dup
143
143
  width = 0.0
144
144
  metrics_array.each do |m|
145
- width += (m[:wx] || m[:wy])
145
+ if (m && m[:boundingbox])
146
+ metrics_array_mapped_top << m[:boundingbox][3]
147
+ metrics_array_mapped_bottom << m[:boundingbox][1]
148
+ else
149
+ metrics_array_mapped_top << 0
150
+ metrics_array_mapped_bottom << 0
151
+ end
152
+ width += (m[:wx] || m[:wy] || 0) if m
146
153
  end
154
+ height = metrics_array_mapped_top.max
155
+ height -=metrics_array_mapped_bottom.min
147
156
  return [height.to_f / 1000 * size, width.to_f / 1000 * size] if metrics_array[0][:wy]
148
157
  [width.to_f / 1000 * size, height.to_f / 1000 * size]
149
158
  end
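The hunk above makes the text dimension calculation tolerant of metrics entries that lack a `:boundingbox` or a width. A standalone sketch of the same idea follows. The method name `text_dimensions` and the sample metrics are assumptions for illustration, and the vertical (`:wy`) case from the original is omitted.

```ruby
# Hypothetical standalone version of the width/height calculation.
# `metrics` maps a character to { wx: width, boundingbox: [left, bottom, right, top] },
# in units of 1/1000 of the font size, as in the register_font documentation.
def text_dimensions(text, metrics, size = 1000)
  tops = []
  bottoms = []
  width = 0.0
  text.each_char do |c|
    m = metrics[c] || metrics[:missing]
    if m && m[:boundingbox]
      tops << m[:boundingbox][3]
      bottoms << m[:boundingbox][1]
    else
      # no usable bounding box: contribute nothing to the height
      tops << 0
      bottoms << 0
    end
    width += (m[:wx] || 0) if m
  end
  height = (tops.max || 0) - (bottoms.min || 0)
  [width / 1000 * size, height.to_f / 1000 * size]
end

sample_metrics = {
  'a' => { wx: 500, boundingbox: [0, -10, 480, 490] },
  'b' => { wx: 600, boundingbox: [0, 0, 550, 700] }
}
p text_dimensions('ab', sample_metrics, 10)
```

Characters missing from the table simply add zero width and zero height here, which mirrors the defensive intent of the change rather than its exact code.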
@@ -94,7 +94,7 @@ module CombinePDF
94
94
  # end
95
95
 
96
96
  # set ProcSet to recommended value
97
- resources[:ProcSet] = [:PDF, :Text, :ImageB, :ImageC, :ImageI] # this was recommended by the ISO. 32000-1:2008
97
+ resources[:ProcSet] ||= [:PDF, :Text, :ImageB, :ImageC, :ImageI] # this was recommended by the ISO. 32000-1:2008
98
98
 
99
99
  if top # if this is a stamp (overlay)
100
100
  insert_content CONTENT_CONTAINER_START, 0
@@ -147,15 +147,15 @@ module CombinePDF
147
147
 
148
148
  # This method adds a simple text box to the Page represented by the PDFWriter class.
149
149
  # This function takes two values:
150
- # text:: the text to potin the box.
150
+ # text:: the text to write in the box.
151
151
  # properties:: a Hash of box properties.
152
152
  # the symbols and values in the properties Hash could be any or all of the following:
153
153
  # x:: the left position of the box.
154
- # y:: the BUTTOM position of the box.
154
+ # y:: the BOTTOM position of the box.
155
155
  # width:: the width/length of the box. negative values will be computed from edge of page. defaults to 0 (end of page).
156
156
  # height:: the height of the box. negative values will be computed from edge of page. defaults to 0 (end of page).
157
157
  # text_align:: symbol for horizontal text alignment, can be ":center" (default), ":right", ":left"
158
- # text_valign:: symbol for vertical text alignment, can be ":center" (default), ":top", ":buttom"
158
+ # text_valign:: symbol for vertical text alignment, can be ":center" (default), ":top", ":bottom"
159
159
  # text_padding:: a Float between 0 and 1, setting the padding for the text. defaults to 0.05 (5%).
160
160
  # font:: a registered font name or an Array of names. defaults to ":Helvetica". The 14 standard fonts names are:
161
161
  # - :"Times-Roman"
@@ -244,8 +244,8 @@ module CombinePDF
244
244
  half_radius = (radius.to_f / 2).round 4
245
245
  ## set starting point
246
246
  box_stream << "#{options[:x] + radius} #{options[:y]} m\n"
247
- ## buttom and right corner - first line and first corner
248
- box_stream << "#{options[:x] + options[:width] - radius} #{options[:y]} l\n" # buttom
247
+ ## bottom and right corner - first line and first corner
248
+ box_stream << "#{options[:x] + options[:width] - radius} #{options[:y]} l\n" # bottom
249
249
  if options[:box_radius] != 0 # make first corner, if not straight.
250
250
  box_stream << "#{options[:x] + options[:width] - half_radius} #{options[:y]} "
251
251
  box_stream << "#{options[:x] + options[:width]} #{options[:y] + half_radius} "
@@ -265,7 +265,7 @@ module CombinePDF
265
265
  box_stream << "#{options[:x]} #{options[:y] + options[:height] - half_radius} "
266
266
  box_stream << "#{options[:x]} #{options[:y] + options[:height] - radius} c\n"
267
267
  end
268
- ## left and buttom-left corner
268
+ ## left and bottom-left corner
269
269
  box_stream << "#{options[:x]} #{options[:y] + radius} l\n"
270
270
  if options[:box_radius] != 0
271
271
  box_stream << "#{options[:x]} #{options[:y] + half_radius} "
@@ -287,7 +287,7 @@ module CombinePDF
287
287
  end
288
288
  contents << box_stream
289
289
 
290
- # reset x,y by text alignment - x,y are calculated from the buttom left
290
+ # reset x,y by text alignment - x,y are calculated from the bottom left
291
291
  # each unit (1) is 1/72 Inch
292
292
  # create text stream
293
293
  text_stream = ''
@@ -403,7 +403,7 @@ module CombinePDF
403
403
  def fix_rotation
404
404
  return self if self[:Rotate].to_f == 0.0 || mediabox.nil?
405
405
  # calculate the rotation
406
- r = self[:Rotate].to_f * Math::PI / 180
406
+ r = (360.0 - self[:Rotate].to_f) * Math::PI / 180
407
407
  s = Math.sin(r).round 6
408
408
  c = Math.cos(r).round 6
409
409
  ctm = [c, s, -s, c]
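The one-line change above swaps the rotation angle for its complement to 360 degrees, which matches the changelog's description of `#fix_rotation` turning pages the wrong way (issue #136). PDF `/Rotate` values are clockwise, while a positive angle in a rotation matrix turns counterclockwise. The sketch below isolates just the matrix arithmetic; `rotation_ctm` is a hypothetical name.

```ruby
# Hypothetical helper: the four rotation terms of a transformation matrix
# for a page whose /Rotate value is `rotate_degrees` (clockwise).
def rotation_ctm(rotate_degrees)
  r = (360.0 - rotate_degrees.to_f) * Math::PI / 180
  s = Math.sin(r).round 6
  c = Math.cos(r).round 6
  [c, s, -s, c]
end

p rotation_ctm(90)
p rotation_ctm(180)
```

Rounding to 6 digits, as the original does, removes the tiny floating point residue that `Math.sin` and `Math.cos` leave at multiples of 90 degrees.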
@@ -649,7 +649,6 @@ module CombinePDF
649
649
  page_res.each do |k, v|
650
650
  v = page_res[k] = v.dup if v.is_a?(Array) || v.is_a?(Hash)
651
651
  v = v[:referenced_object] = v[:referenced_object].dup if v.is_a?(Hash) && v[:referenced_object]
652
- v = v[:referenced_object] = v[:referenced_object].dup if v.is_a?(Hash) && v[:referenced_object]
653
652
  end
654
653
  end
655
654
  page_copy.instance_exec(secure || @secure_injection) { |s| secure_for_copy if s; init_contents; self }
@@ -6,6 +6,8 @@
6
6
  ########################################################
7
7
 
8
8
  module CombinePDF
9
+ ParsingError = Class.new(StandardError)
10
+
9
11
  # @!visibility private
10
12
  # @private
11
13
  #:nodoc: all
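The hunk above introduces `ParsingError` as a subclass of `StandardError`, and later hunks raise it instead of a bare `RuntimeError`. The sketch below shows why a dedicated class helps callers. `DemoPDF` and `safe_parse` are hypothetical stand-ins so the example runs without the gem, and the header check is not how CombinePDF validates input.

```ruby
# Hypothetical stand-in module for this sketch only.
module DemoPDF
  ParsingError = Class.new(StandardError)

  # Pretend parser: accepts anything that starts with a PDF header.
  def self.parse(data)
    unless data.start_with?('%PDF-')
      raise ParsingError, 'Unknown PDF parsing error - malformed PDF file?'
    end
    :parsed
  end
end

# Callers can rescue parsing failures specifically and let other errors propagate.
def safe_parse(data)
  DemoPDF.parse(data)
rescue DemoPDF::ParsingError => e
  warn "skipping unreadable document: #{e.message}"
  nil
end

p safe_parse('%PDF-1.4 ...')
p safe_parse('not a pdf')
```

Because the class inherits from `StandardError`, existing code that uses a plain `rescue => e` keeps working.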
@@ -77,7 +79,10 @@ module CombinePDF
77
79
  @parsed = _parse_
78
80
  # puts @parsed
79
81
 
80
- raise 'Unknown PDF parsing error - malformed PDF file?' unless (@parsed.select { |i| !i.is_a?(Hash) }).empty?
82
+ unless (@parsed.select { |i| !i.is_a?(Hash) }).empty?
83
+ # p @parsed.select
84
+ raise ParsingError, 'Unknown PDF parsing error - malformed PDF file?'
85
+ end
81
86
 
82
87
  if @root_object == {}.freeze
83
88
  xref_streams = @parsed.select { |obj| obj.is_a?(Hash) && obj[:Type] == :XRef }
@@ -86,7 +91,9 @@ module CombinePDF
86
91
  end
87
92
  end
88
93
 
89
- raise 'root is unknown - cannot determine if file is Encrypted' if @root_object == {}.freeze
94
+ if @root_object == {}.freeze
95
+ raise ParsingError, 'root is unknown - cannot determine if file is Encrypted'
96
+ end
90
97
 
91
98
  if @root_object[:Encrypt]
92
99
  # change_references_to_actual_values @root_object
@@ -105,12 +112,13 @@ module CombinePDF
105
112
  next unless o.is_a?(Hash) && o[:Type] == :ObjStm
106
113
  ## un-encode (using the correct filter) the object streams
107
114
  PDFFilter.inflate_object o
115
+ # puts "Object Stream Found:", o[:raw_stream_content]
108
116
  ## extract objects from stream
109
117
  @scanner = StringScanner.new o[:raw_stream_content]
110
118
  stream_data = _parse_
111
119
  id_array = []
112
120
  collection = [nil]
113
- while stream_data[0].is_a? (Numeric)
121
+ while (stream_data[0].is_a?(Numeric) && stream_data[1].is_a?(Numeric))
114
122
  id_array << stream_data.shift
115
123
  stream_data.shift
116
124
  end
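The loop condition changed above now requires two leading numbers before treating them as an object-number/offset pair from an object stream header. The sketch below reproduces that heuristic on a plain Array. `split_object_stream` is a hypothetical name, and the sample data is invented.

```ruby
# Hypothetical helper: peel (object number, byte offset) pairs off the front
# of already-parsed object stream data and return the ids plus the remainder.
def split_object_stream(stream_data)
  ids = []
  while stream_data[0].is_a?(Numeric) && stream_data[1].is_a?(Numeric)
    ids << stream_data.shift # object number
    stream_data.shift        # byte offset, not needed here
  end
  [ids, stream_data]
end

# Header "10 0 11 5" followed by two objects: the number 42 and a dictionary.
ids, objects = split_object_stream([10, 0, 11, 5, 42, { Type: :Font }])
p ids
p objects
```

With the older single-number check, the leading `42` would have been consumed as an id. The heuristic can still misfire when the first two stored objects are both numbers; the stream dictionary's `/N` entry, which states the object count, would be the stricter source.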
@@ -225,16 +233,18 @@ module CombinePDF
225
233
  # all characters that aren't white space or special: /[^\x00\x09\x0a\x0c\x0d\x20\x28\x29\x3c\x3e\x5b\x5d\x7b\x7d\x2f\x25]+
226
234
  elsif str = @scanner.scan(/\/[^\x00\x09\x0a\x0c\x0d\x20\x28\x29\x3c\x3e\x5b\x5d\x7b\x7d\x2f\x25]*/)
227
235
  out << (str[1..-1].gsub(/\#[0-9a-fA-F]{2}/) { |a| a[1..2].hex.chr }).to_sym
236
+ # warn "CombinePDF detected name: #{out.last.to_s}"
228
237
  ##########################################
229
238
  ## Parse a Number
230
239
  ##########################################
231
240
  elsif str = @scanner.scan(/[\+\-\.\d]+/)
232
241
  str =~ /\./ ? (out << str.to_f) : (out << str.to_i)
242
+ # warn "CombinePDF detected number: #{out.last.to_s}"
233
243
  ##########################################
234
244
  ## parse a Hex String
235
245
  ##########################################
236
246
  elsif str = @scanner.scan(/\<[0-9a-fA-F]*\>/)
237
- # warn "Found a hex string"
247
+ # warn "Found a hex string #{str}"
238
248
  str = str.slice(1..-2).force_encoding(Encoding::ASCII_8BIT)
239
249
  # str = "0#{str}" if str.length.odd?
240
250
  out << unify_string([str].pack('H*').force_encoding(Encoding::ASCII_8BIT))
@@ -310,10 +320,10 @@ module CombinePDF
310
320
  when 102 # f, form-feed
311
321
  str << 12
312
322
  when 48..57 # octal notation for byte?
313
- rep = rep.chr
314
- rep += str_bytes.shift.chr if str_bytes[0].between?(48, 57)
315
- rep += str_bytes.shift.chr if str_bytes[0].between?(48, 57) && ((rep + str_bytes[0].chr).to_i <= 255)
316
- str << rep.to_i
323
+ rep -= 48
324
+ rep = (rep << 3) + (str_bytes.shift-48) if str_bytes[0].between?(48, 57)
325
+ rep = (rep << 3) + (str_bytes.shift-48) if str_bytes[0].between?(48, 57) && (((rep << 3) + (str_bytes[0] - 48)) <= 255)
326
+ str << rep
317
327
  when 10 # new line, ignore
318
328
  str_bytes.shift if str_bytes[0] == 13
319
329
  true
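The replaced lines above concatenated the escape digits and called `to_i`, which reads them as decimal, so `\101` became byte 101 (`e`) rather than octal 101, which is 65 (`A`). The new lines accumulate three bits per digit. A standalone sketch of that accumulation follows. `decode_octal_escape` is a hypothetical name, and unlike the hunk it only accepts the digits 0 to 7.

```ruby
# Hypothetical helper: decode up to three octal digits that follow a backslash.
# `bytes` is an Array of byte values whose first element is a digit;
# consumed digits are removed from the Array.
def decode_octal_escape(bytes)
  rep = bytes.shift - 48 # ASCII '0' is 48
  # second digit, if present
  rep = (rep << 3) + (bytes.shift - 48) if bytes[0] && bytes[0].between?(48, 55)
  # third digit, only if the result still fits in one byte
  if bytes[0] && bytes[0].between?(48, 55) && ((rep << 3) + (bytes[0] - 48)) <= 255
    rep = (rep << 3) + (bytes.shift - 48)
  end
  rep
end

p decode_octal_escape('101'.bytes).chr
p decode_octal_escape('053'.bytes).chr
```

Shifting left by three bits is the same as multiplying by eight, so each new digit extends the octal value by one place.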
@@ -328,6 +338,7 @@ module CombinePDF
328
338
  end
329
339
  end
330
340
  out << unify_string(str.pack('C*').force_encoding(Encoding::ASCII_8BIT))
341
+ # warn "Found Literal String: #{out.last}"
331
342
  ##########################################
332
343
  ## parse a Dictionary
333
344
  ##########################################
@@ -340,25 +351,42 @@ module CombinePDF
340
351
  ## return content of array or dictionary
341
352
  ##########################################
342
353
  elsif @scanner.scan(/\]/) || @scanner.scan(/>>/)
354
+ # warn "Dictionary / Array ended with #{@scanner.peek(5)}"
343
355
  return out
344
356
  ##########################################
345
357
  ## parse a Stream
346
358
  ##########################################
347
359
  elsif @scanner.scan(/stream[ \t]*[\r\n]/)
348
360
  @scanner.pos += 1 if @scanner.peek(1) == "\n".freeze && @scanner.matched[-1] != "\n".freeze
361
+ # advance by the published stream length (if any)
362
+ old_pos = @scanner.pos
363
+ if(out.last.is_a?(Hash) && out.last[:Length].is_a?(Integer) && out.last[:Length] > 2)
364
+ @scanner.pos += out.last[:Length] - 2
365
+ end
366
+
349
367
  # the following was dicarded because some PDF files didn't have an EOL marker as required
350
368
  # str = @scanner.scan_until(/(\r\n|\r|\n)endstream/)
351
369
  # instead, a non-strict RegExp is used:
352
- str = @scanner.scan_until(/endstream/)
370
+
371
+
353
372
  # raise error if the stream doesn't end.
354
- raise "Parsing Error: PDF file error - a stream object wasn't properly closed using 'endstream'!" unless str
373
+ unless @scanner.skip_until(/endstream/)
374
+ raise ParsingError, "Parsing Error: PDF file error - a stream object wasn't properly closed using 'endstream'!"
375
+ end
376
+ length = @scanner.pos - (old_pos + 9)
377
+ length = 0 if(length < 0)
378
+ length -= 1 if(@scanner.string[old_pos + length - 1] == "\n")
379
+ length -= 1 if(@scanner.string[old_pos + length - 1] == "\r")
380
+ str = (length > 0) ? @scanner.string.slice(old_pos, length) : ''
381
+
382
+ # warn "CombinePDF parser: detected Stream #{str.length} bytes long #{str[0..3]}...#{str[-4..-1]}"
383
+
355
384
  # need to remove end of stream
356
385
  if out.last.is_a? Hash
357
- # out.last[:raw_stream_content] = str[0...-10] #cuts only one EON char (\n or \r)
358
- out.last[:raw_stream_content] = unify_string str.sub(/(\r\n|\n|\r)?endstream\z/, '').force_encoding(Encoding::ASCII_8BIT)
386
+ out.last[:raw_stream_content] = unify_string str.force_encoding(Encoding::ASCII_8BIT)
359
387
  else
360
388
  warn 'Stream not attached to dictionary!'
361
- out << str.sub(/(\r\n|\n|\r)?endstream\z/, '').force_encoding(Encoding::ASCII_8BIT)
389
+ out << str.force_encoding(Encoding::ASCII_8BIT)
362
390
  end
363
391
  ##########################################
364
392
  ## parse an Object after finished
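The stream hunk above first jumps ahead by the dictionary's `:Length` (minus two bytes) and only then searches for `endstream`, which appears to protect payloads that themselves contain that keyword, such as the nested PDF case from the v.1.0.21 changelog entry. The sketch below reproduces the approach with `StringScanner`. `read_stream` is a hypothetical name, and the bounds check on the jump is an addition that the hunk does not have.

```ruby
require 'strscan'

# Hypothetical helper: read a stream payload starting at the scanner's position.
# Returns the payload, or nil when no closing 'endstream' is found.
def read_stream(scanner, declared_length = nil)
  start = scanner.pos
  if declared_length.is_a?(Integer) && declared_length > 2
    target = start + declared_length - 2
    # added guard: never jump past the end of the data
    scanner.pos = target if target <= scanner.string.bytesize
  end
  return nil unless scanner.skip_until(/endstream/)

  length = scanner.pos - (start + 9) # 9 == 'endstream'.length
  length = 0 if length < 0
  # trim one optional EOL marker (\r\n, \n or \r) before the keyword
  length -= 1 if length > 0 && scanner.string[start + length - 1] == "\n"
  length -= 1 if length > 0 && scanner.string[start + length - 1] == "\r"
  length > 0 ? scanner.string.byteslice(start, length) : ''
end

data = "stream\nHELLO endstream inside\nendstream\nendobj".b
scanner = StringScanner.new(data)
scanner.scan(/stream[ \t]*[\r\n]/)
p read_stream(scanner, 22)
```

The data is kept as a binary string so that scanner positions, which count bytes, line up with string indexes.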
@@ -375,17 +403,6 @@ module CombinePDF
375
403
  out.last[:Dest] = unify_string(out.last[:Dest].to_s) if out.last[:Dest] && out.last[:Dest].is_a?(Symbol)
376
404
  # puts "!!!!!!!!! Error with :indirect_reference_id\n\nObject #{out.last} :indirect_reference_id = #{out.last[:indirect_reference_id]}" unless out.last[:indirect_reference_id].is_a?(Numeric)
377
405
  ##########################################
378
- ## Parse a comment
379
- ##########################################
380
- elsif str = @scanner.scan(/\%/)
381
- # is a comment, skip until new line
382
- loop do
383
- # break unless @scanner.scan(/[^\d\r\n]+/)
384
- break if @scanner.check(/([\d]+[\s]+[\d]+[\s]+obj[\s]+\<\<)|([\n\r]+)/) || @scanner.eos? # || @scanner.scan(/[^\d]+[\r\n]+/) ||
385
- @scanner.scan(/[^\d\r\n]+/) || @scanner.pos += 1
386
- end
387
- # puts "AFTER COMMENT: #{@scanner.peek 8}"
388
- ##########################################
389
406
  ## Parse an Object Reference
390
407
  ##########################################
391
408
  elsif @scanner.scan(/R/)
@@ -404,32 +421,55 @@ module CombinePDF
404
421
  elsif @scanner.scan(/null/)
405
422
  out << nil
406
423
  ##########################################
424
+ ## Parse file trailer
425
+ ##########################################
426
+ elsif @scanner.scan(/trailer/)
427
+ if @scanner.skip_until(/<</)
428
+ data = _parse_
429
+ (@root_object ||= {}).clear
430
+ @root_object[data.shift] = data.shift while data[0]
431
+ end
432
+ ##########################################
407
433
  ## XREF - check for encryption... anything else?
408
434
  ##########################################
409
- elsif @scanner.scan(/(startxref)|(xref)/)
410
- ##########
411
- ## get root object to check for encryption
412
- @scanner.scan_until(/(trailer)|(\%EOF)/)
413
- fresh = true
414
- if @scanner.matched[-1] == 'r'
415
- if @scanner.skip_until(/<</)
416
- data = _parse_
417
- (@root_object ||= {}).clear
418
- @root_object[data.shift] = data.shift while data[0]
419
- end
420
- ##########
421
- ## skip untill end of segment, maked by %%EOF
422
- @scanner.skip_until(/\%\%EOF/)
423
- ##########
424
- ## If this was the last valid segment, ignore any trailing garbage
425
- ## (issue #49 resolution)
426
- break unless @scanner.exist?(/\%\%EOF/)
427
-
435
+ elsif @scanner.scan(/xref/)
436
+ # skip list identifier lines or list lines ([\d] [\d][\r\n]) or ([\d] [\d] [nf][\r\n])
437
+ while @scanner.scan(/[\s]*[\d]+[ \t]+[\d]+[ \t]*[\n\r]+/) || @scanner.scan(/[ \t]*[\d]+[ \t]+[\d]+[ \t]+[nf][\s]*/)
438
+ nil
428
439
  end
429
-
440
+ ##########################################
441
+ ## XREF location can be ignored
442
+ ##########################################
443
+ elsif @scanner.scan(/startxref/)
444
+ @scanner.scan(/[\s]+[\d]+[\s]+/)
445
+ ##########################################
446
+ ## Skip Whitespace
447
+ ##########################################
430
448
  elsif @scanner.scan(/[\s]+/)
431
449
  # Generally, do nothing
432
450
  nil
451
+ ##########################################
452
+ ## EOF?
453
+ ##########################################
454
+ elsif @scanner.scan(/\%\%EOF/)
455
+ ##########
456
+ ## If this was the last valid segment, ignore any trailing garbage
457
+ ## (issue #49 resolution)
458
+ break unless @scanner.exist?(/\%\%EOF/)
459
+ ##########################################
460
+ ## Parse a comment
461
+ ##########################################
462
+ elsif str = @scanner.scan(/\%/)
463
+ # is a comment, skip until new line
464
+ loop do
465
+ # break unless @scanner.scan(/[^\d\r\n]+/)
466
+ break if @scanner.check(/([\d]+[\s]+[\d]+[\s]+obj[\s]+\<\<)|([\n\r]+)/) || @scanner.eos? # || @scanner.scan(/[^\d]+[\r\n]+/) ||
467
+ @scanner.scan(/[^\d\r\n]+/) || @scanner.pos += 1
468
+ end
469
+ # puts "AFTER COMMENT: #{@scanner.peek 8}"
470
+ ##########################################
471
+ ## Fix wkhtmltopdf - missing 'endobj' keywords
472
+ ##########################################
433
473
  elsif @scanner.scan(/obj[\s]*/)
434
474
  # Fix wkhtmltopdf PDF authoring issue - missing 'endobj' keywords
435
475
  unless fresh || (out[-4].nil? || out[-4].is_a?(Hash))
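The restructured branches above handle `trailer`, `xref`, `startxref` and `%%EOF` as separate tokens, with the `xref` branch skipping subsection headers and entry lines by regular expression. The sketch below exercises those two expressions, copied from the hunk, on a small invented table. `skip_xref_table` is a hypothetical name.

```ruby
require 'strscan'

# Hypothetical helper: consume an xref table so the scanner lands on
# whatever follows it (normally the 'trailer' keyword).
def skip_xref_table(scanner)
  return false unless scanner.scan(/xref/)

  # first pattern: subsection header ("first_object count"),
  # second pattern: entry line ("offset generation n|f")
  while scanner.scan(/[\s]*[\d]+[ \t]+[\d]+[ \t]*[\n\r]+/) ||
        scanner.scan(/[ \t]*[\d]+[ \t]+[\d]+[ \t]+[nf][\s]*/)
    nil
  end
  true
end

table = "xref\n0 2\n0000000000 65535 f \n0000000017 00000 n \ntrailer\n<< /Size 2 >>"
scanner = StringScanner.new(table)
skip_xref_table(scanner)
p scanner.peek(7)
```

Since the table contents are discarded, object positions are evidently recovered from the parsed objects themselves rather than from the recorded offsets.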
@@ -450,6 +490,9 @@ module CombinePDF
450
490
  out << keep.pop
451
491
  end
452
492
  fresh = false
493
+ ##########################################
494
+ ## Unknown, warn and advance
495
+ ##########################################
453
496
  else
454
497
  # always advance
455
498
  # warn "Advancing for unknown reason... #{@scanner.string[@scanner.pos - 4, 8]} ... #{@scanner.peek(4)}" unless @scanner.peek(1) =~ /[\s\n]/
@@ -475,7 +518,9 @@ module CombinePDF
475
518
  @parsed.delete_if { |obj| obj.nil? || obj[:Type] == :Catalog }
476
519
  @parsed << catalogs
477
520
 
478
- raise "Unknown error - parsed data doesn't contain a cataloged object!" unless catalogs
521
+ unless catalogs
522
+ raise ParsingError, "Unknown error - parsed data doesn't contain a cataloged object!"
523
+ end
479
524
  end
480
525
  if catalogs.is_a?(Array)
481
526
  catalogs.each { |c| catalog_pages(c, inheritance_hash) unless c.nil? }
@@ -488,20 +533,31 @@ module CombinePDF
  end
  else
  unless catalogs[:Type] == :Page
- raise "Optional Content PDF files aren't supported and their pages cannot be safely extracted." if (catalogs[:AS] || catalogs[:OCProperties]) && !@allow_optional_content
+ if (catalogs[:AS] || catalogs[:OCProperties]) && !@allow_optional_content
+ raise ParsingError, "Optional Content PDF files aren't supported and their pages cannot be safely extracted."
+ end
+
  inheritance_hash[:MediaBox] = catalogs[:MediaBox] if catalogs[:MediaBox]
  inheritance_hash[:CropBox] = catalogs[:CropBox] if catalogs[:CropBox]
  inheritance_hash[:Rotate] = catalogs[:Rotate] if catalogs[:Rotate]
  if catalogs[:Resources]
  inheritance_hash[:Resources] ||= { referenced_object: {}, is_reference_only: true }.dup
- (inheritance_hash[:Resources][:referenced_object] || inheritance_hash[:Resources]).update((catalogs[:Resources][:referenced_object] || catalogs[:Resources]), &self.class.method(:hash_update_proc_for_old))
+ (inheritance_hash[:Resources][:referenced_object] || inheritance_hash[:Resources]).update((catalogs[:Resources][:referenced_object] || catalogs[:Resources]), &HASH_UPDATE_PROC_FOR_OLD)
+ end
+ if catalogs[:ProcSet].is_a?(Array)
+ if(inheritance_hash[:ProcSet])
+ inheritance_hash[:ProcSet][:referenced_object].concat(catalogs[:ProcSet])
+ inheritance_hash[:ProcSet][:referenced_object].uniq!
+ else
+ inheritance_hash[:ProcSet] ||= { referenced_object: catalogs[:ProcSet], is_reference_only: true }.dup
+ end
  end
  if catalogs[:ColorSpace]
  inheritance_hash[:ColorSpace] ||= { referenced_object: {}, is_reference_only: true }.dup
- (inheritance_hash[:ColorSpace][:referenced_object] || inheritance_hash[:ColorSpace]).update((catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]), &self.class.method(:hash_update_proc_for_old))
+ (inheritance_hash[:ColorSpace][:referenced_object] || inheritance_hash[:ColorSpace]).update((catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]), &HASH_UPDATE_PROC_FOR_OLD)
  end
- # (inheritance_hash[:Resources] ||= {}).update((catalogs[:Resources][:referenced_object] || catalogs[:Resources]), &self.class.method(:hash_update_proc_for_new)) if catalogs[:Resources]
- # (inheritance_hash[:ColorSpace] ||= {}).update((catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]), &self.class.method(:hash_update_proc_for_new)) if catalogs[:ColorSpace]
+ # (inheritance_hash[:Resources] ||= {}).update((catalogs[:Resources][:referenced_object] || catalogs[:Resources]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:Resources]
+ # (inheritance_hash[:ColorSpace] ||= {}).update((catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:ColorSpace]

  # inheritance_hash[:Order] = catalogs[:Order] if catalogs[:Order]
  # inheritance_hash[:OCProperties] = catalogs[:OCProperties] if catalogs[:OCProperties]
@@ -516,13 +572,27 @@ module CombinePDF
  catalogs[:Rotate] ||= inheritance_hash[:Rotate] if inheritance_hash[:Rotate]
  if inheritance_hash[:Resources]
  catalogs[:Resources] ||= { referenced_object: {}, is_reference_only: true }.dup
- (catalogs[:Resources][:referenced_object] || catalogs[:Resources]).update((inheritance_hash[:Resources][:referenced_object] || inheritance_hash[:Resources]), &self.class.method(:hash_update_proc_for_old))
+ catalogs[:Resources] = { referenced_object: catalogs[:Resources], is_reference_only: true } unless catalogs[:Resources][:referenced_object]
+ catalogs[:Resources][:referenced_object].update((inheritance_hash[:Resources][:referenced_object] || inheritance_hash[:Resources]), &HASH_UPDATE_PROC_FOR_OLD)
  end
  if inheritance_hash[:ColorSpace]
  catalogs[:ColorSpace] ||= { referenced_object: {}, is_reference_only: true }.dup
- (catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]).update((inheritance_hash[:ColorSpace][:referenced_object] || inheritance_hash[:ColorSpace]), &self.class.method(:hash_update_proc_for_old))
+ catalogs[:ColorSpace] = { referenced_object: catalogs[:ColorSpace], is_reference_only: true } unless catalogs[:ColorSpace][:referenced_object]
+ catalogs[:ColorSpace][:referenced_object].update((inheritance_hash[:ColorSpace][:referenced_object] || inheritance_hash[:ColorSpace]), &HASH_UPDATE_PROC_FOR_OLD)
  end
- # (catalogs[:ColorSpace] ||= {}).update(inheritance_hash[:ColorSpace], &self.class.method(:hash_update_proc_for_old)) if inheritance_hash[:ColorSpace]
+ if inheritance_hash[:ProcSet]
+ if(catalogs[:ProcSet])
+ if catalogs[:ProcSet].is_a?(Array)
+ catalogs[:ProcSet] = { referenced_object: catalogs[:ProcSet], is_reference_only: true }
+ end
+ catalogs[:ProcSet][:referenced_object].concat(inheritance_hash[:ProcSet][:referenced_object])
+ catalogs[:ProcSet][:referenced_object].uniq!
+ else
+ catalogs[:ProcSet] = { is_reference_only: true }.dup
+ catalogs[:ProcSet][:referenced_object] = []
+ end
+ end
+ # (catalogs[:ColorSpace] ||= {}).update(inheritance_hash[:ColorSpace], &HASH_UPDATE_PROC_FOR_OLD) if inheritance_hash[:ColorSpace]
  # catalogs[:Order] ||= inheritance_hash[:Order] if inheritance_hash[:Order]
  # catalogs[:AS] ||= inheritance_hash[:AS] if inheritance_hash[:AS]
  # catalogs[:OCProperties] ||= inheritance_hash[:OCProperties] if inheritance_hash[:OCProperties]
@@ -536,9 +606,9 @@ module CombinePDF
  when :Pages
  catalog_pages(catalogs[:Kids], inheritance_hash.dup) unless catalogs[:Kids].nil?
  when :Catalog
- @forms_object.update((catalogs[:AcroForm][:referenced_object] || catalogs[:AcroForm]), &self.class.method(:hash_update_proc_for_new)) if catalogs[:AcroForm]
- @names_object.update((catalogs[:Names][:referenced_object] || catalogs[:Names]), &self.class.method(:hash_update_proc_for_new)) if catalogs[:Names]
- @outlines_object.update((catalogs[:Outlines][:referenced_object] || catalogs[:Outlines]), &self.class.method(:hash_update_proc_for_new)) if catalogs[:Outlines]
+ @forms_object.update((catalogs[:AcroForm][:referenced_object] || catalogs[:AcroForm]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:AcroForm]
+ @names_object.update((catalogs[:Names][:referenced_object] || catalogs[:Names]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:Names]
+ @outlines_object.update((catalogs[:Outlines][:referenced_object] || catalogs[:Outlines]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:Outlines]
  if catalogs[:Dests] # convert PDF 1.1 Dests to PDF 1.2+ Dests
  dests_arry = (@names_object[:Dests] ||= {})
  dests_arry = ((dests_arry[:referenced_object] || dests_arry)[:Names] ||= [])
@@ -652,30 +722,45 @@ module CombinePDF

  # All Strings are one String
  def unify_string(str)
+ str.force_encoding(Encoding::ASCII_8BIT)
  @strings_dictionary[str] ||= str
  end

  # @private
  # this method reviews a Hash and updates it by merging Hash data,
  # preffering the old over the new.
- def self.hash_update_proc_for_old(_key, old_data, new_data)
+ HASH_UPDATE_PROC_FOR_OLD = Proc.new do |_key, old_data, new_data|
  if old_data.is_a? Hash
- old_data.merge(new_data, &method(:hash_update_proc_for_old))
+ old_data.merge(new_data, &HASH_UPDATE_PROC_FOR_OLD)
  else
  old_data
  end
  end
+ # def self.hash_update_proc_for_old(_key, old_data, new_data)
+ # if old_data.is_a? Hash
+ # old_data.merge(new_data, &method(:hash_update_proc_for_old))
+ # else
+ # old_data
+ # end
+ # end

  # @private
  # this method reviews a Hash an updates it by merging Hash data,
  # preffering the new over the old.
- def self.hash_update_proc_for_new(_key, old_data, new_data)
+ HASH_UPDATE_PROC_FOR_NEW = Proc.new do |_key, old_data, new_data|
  if old_data.is_a? Hash
- old_data.merge(new_data, &method(:hash_update_proc_for_new))
+ old_data.merge(new_data, &HASH_UPDATE_PROC_FOR_NEW)
  else
  new_data
  end
  end
+ # def self.hash_update_proc_for_new(_key, old_data, new_data)
+ # if old_data.is_a? Hash
+ # old_data.merge(new_data, &method(:hash_update_proc_for_new))
+ # else
+ # new_data
+ # end
+ # end

  # # run block of code on evey PDF object (PDF objects are class Hash)
  # def each_object(object, limit_references = true, already_visited = {}, &block)
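Note on the hunk above: the class methods `hash_update_proc_for_old` and `hash_update_proc_for_new` are replaced by `Proc` constants that are passed directly as the conflict-resolution block of `Hash#update` / `Hash#merge`. The sketch below illustrates that pattern in isolation. The names `KEEP_OLD` and `KEEP_NEW` and the sample hashes are hypothetical, and the extra `new_data.is_a?(Hash)` guard is an assumption added for the sketch; it is not part of the gem's code.

```ruby
# Hypothetical stand-ins for the gem's constants; names are illustrative only.
# Each Proc resolves a key conflict and recurses into nested Hashes by passing
# itself as the merge block. The constant is looked up when the Proc runs, so
# the self-reference inside the body is valid.
KEEP_OLD = proc do |_key, old_data, new_data|
  if old_data.is_a?(Hash) && new_data.is_a?(Hash)
    old_data.merge(new_data, &KEEP_OLD)
  else
    old_data # on conflict, the existing value wins
  end
end

KEEP_NEW = proc do |_key, old_data, new_data|
  if old_data.is_a?(Hash) && new_data.is_a?(Hash)
    old_data.merge(new_data, &KEEP_NEW)
  else
    new_data # on conflict, the incoming value wins
  end
end

inherited = { Font: { F1: :helvetica }, Rotate: 0 }
page      = { Font: { F1: :times, F2: :courier }, Rotate: 90 }

merged_old = inherited.merge(page, &KEEP_OLD)
merged_new = inherited.merge(page, &KEEP_NEW)

p merged_old
p merged_new
```

The block is only invoked for keys present in both hashes, so keys that exist on one side only (`F2` here) are carried over in either mode.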
@@ -137,11 +137,14 @@ module CombinePDF
  catalog_object
  end

+ # Deprecation Notice
  def names_object
+ puts "CombinePDF Deprecation Notice: the protected method `names_object` will be deprecated in the upcoming version. Use `names` instead."
  @names
  end

  def outlines_object
+ puts "CombinePDF Deprecation Notice: the protected method `outlines_object` will be deprecated in the upcoming version. Use `oulines` instead."
  @outlines
  end
  # def forms_data
@@ -229,15 +232,42 @@ module CombinePDF
  # @private
  # this method reviews a Hash and updates it by merging Hash data,
  # preffering the new over the old.
- def self.hash_merge_new_no_page(_key, old_data, new_data)
- return old_data unless new_data
- return new_data unless old_data
- if old_data.is_a?(Hash) && new_data.is_a?(Hash)
- return old_data if (old_data[:Type] == :Page)
- old_data.merge(new_data, &(@hash_merge_new_no_page_proc ||= method(:hash_merge_new_no_page)))
+ # def self.hash_merge_new_no_page(_key = nil, old_data = nil, new_data = nil)
+ # return old_data unless new_data
+ # return new_data unless old_data
+ # if old_data.is_a?(Hash) && new_data.is_a?(Hash)
+ # return old_data if (old_data[:Type] == :Page)
+ # old_data.merge(new_data, &(@hash_merge_new_no_page_proc ||= method(:hash_merge_new_no_page)))
+ # elsif old_data.is_a? Array
+ # return old_data + new_data if new_data.is_a?(Array)
+ # return old_data.dup << new_data
+ # elsif new_data.is_a? Array
+ # new_data + [old_data]
+ # else
+ # new_data
+ # end
+ # end
+
+ # @private
+ # JRuby Alternative this method reviews a Hash and updates it by merging Hash data,
+ # preffering the new over the old.
+ HASH_MERGE_NEW_NO_PAGE = Proc.new do |_key = nil, old_data = nil, new_data = nil|
+ if !new_data
+ old_data
+ elsif !old_data
+ new_data
+ elsif old_data.is_a?(Hash) && new_data.is_a?(Hash)
+ if (old_data[:Type] == :Page)
+ old_data
+ else
+ old_data.merge(new_data, &HASH_MERGE_NEW_NO_PAGE)
+ end
  elsif old_data.is_a? Array
- return old_data + new_data if new_data.is_a?(Array)
- return old_data.dup << new_data
+ if new_data.is_a?(Array)
+ old_data + new_data
+ else
+ old_data.dup << new_data
+ end
  elsif new_data.is_a? Array
  new_data + [old_data]
  else
@@ -343,16 +373,19 @@ module CombinePDF
  private

  def equal_layers obj1, obj2, layer = CombinePDF.eq_depth_limit
- return true if(layer == 0)
  return true if obj1.object_id == obj2.object_id
  if obj1.is_a? Hash
  return false unless obj2.is_a? Hash
+ return false unless obj1.length == obj2.length
  keys = obj1.keys;
- return false if (keys - obj2.keys).any?
+ keys2 = obj2.keys;
+ return false if (keys - keys2).any? || (keys2 - keys).any?
+ return (warn("CombinePDF nesting limit reached") || true) if(layer == 0)
  keys.each {|k| return false unless equal_layers( obj1[k], obj2[k], layer-1) }
  elsif obj1.is_a? Array
  return false unless obj2.is_a? Array
- (obj1-obj2).any?
+ return false unless obj1.length == obj2.length
+ (obj1-obj2).any? || (obj2-obj1).any?
  else
  obj1 == obj2
  end
@@ -82,6 +82,10 @@ module CombinePDF
  # use, for example:
  # pdf.viewer_preferences[:HideMenubar] = true
  attr_reader :viewer_preferences
+ # Access the Outlines PDF object Hash (or reference). Use with care.
+ attr_reader :outlines
+ # Access the Names PDF object Hash (or reference). Use with care.
+ attr_reader :names

  def initialize(parser = nil)
  # default before setting
@@ -207,7 +211,7 @@ module CombinePDF
  # when finished, remove the numbering system and keep only pointers
  remove_old_ids
  # output the pdf stream
- out.join("\n").force_encoding(Encoding::ASCII_8BIT)
+ out.join("\n".force_encoding(Encoding::ASCII_8BIT)).force_encoding(Encoding::ASCII_8BIT)
  end

  # this method returns all the pages cataloged in the catalog.
@@ -253,12 +257,16 @@ module CombinePDF
  def fonts(limit_to_type0 = false)
  fonts_array = []
  pages.each do |pg|
- if pg[:Resources][:Font]
- pg[:Resources][:Font].values.each do |f|
- f = f[:referenced_object] if f[:referenced_object]
- if (limit_to_type0 || f[:Subtype] == :Type0) && f[:Type] == :Font && !fonts_array.include?(f)
- fonts_array << f
- end
+ r = pg[:Resources]
+ next if !r
+ r = r[:referenced_object] if r[:referenced_object]
+ r = r[:Font]
+ next if !r
+ r = r[:referenced_object] if r[:referenced_object]
+ r.values.each do |f|
+ f = f[:referenced_object] if f[:referenced_object]
+ if (limit_to_type0 || f[:Subtype] == :Type0) && f[:Type] == :Font && !fonts_array.include?(f)
+ fonts_array << f
  end
  end
  end
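Note on the hunk above: the reworked `fonts` loop no longer assumes that `pg[:Resources]` and its `:Font` entry are plain Hashes. At each level it follows `:referenced_object` when that key is present, and skips the page when a level is missing. A minimal sketch of that dereferencing pattern follows. The helper names `deref` and `page_fonts` and the sample page Hash are hypothetical; they are not part of the gem.

```ruby
# Hypothetical helper: return the wrapped object when `obj` is a reference
# wrapper (a Hash carrying :referenced_object), otherwise return `obj` itself.
def deref(obj)
  obj.is_a?(Hash) && obj[:referenced_object] ? obj[:referenced_object] : obj
end

# Hypothetical helper: collect the font dictionaries of one page, tolerating
# missing entries and indirect references at every level.
def page_fonts(page)
  resources = deref(page[:Resources])
  return [] unless resources

  fonts = deref(resources[:Font])
  return [] unless fonts

  fonts.values.map { |font| deref(font) }
end

# A page whose Resources, Font table and font entry are all indirect.
indirect_page = {
  Resources: {
    referenced_object: {
      Font: {
        referenced_object: {
          F1: { referenced_object: { Type: :Font, Subtype: :Type0 } }
        }
      }
    }
  }
}

p page_fonts(indirect_page)
p page_fonts({})
```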
@@ -302,10 +310,10 @@ module CombinePDF
  if data.is_a? PDF
  @version = [@version, data.version].max
  pages_to_add = data.pages
- actual_value(@names ||= {}.dup).update actual_value(data.names_object), &self.class.method(:hash_merge_new_no_page)
- merge_outlines((@outlines ||= {}.dup), data.outlines_object, location) unless actual_value(data.outlines_object).empty?
+ actual_value(@names ||= {}.dup).update data.names, &HASH_MERGE_NEW_NO_PAGE
+ merge_outlines((@outlines ||= {}.dup), actual_value(data.outlines), location) unless actual_value(data.outlines).empty?
  if actual_value(@forms_data)
- actual_value(@forms_data).update actual_value(data.forms_data), &self.class.method(:hash_merge_new_no_page) if data.forms_data
+ actual_value(@forms_data).update actual_value(data.forms_data), &HASH_MERGE_NEW_NO_PAGE if data.forms_data
  else
  @forms_data = data.forms_data
  end
@@ -354,9 +362,9 @@ module CombinePDF
  #
  # options:: a Hash of options setting the behavior and format of the page numbers:
  # - :number_format a string representing the format for page number. defaults to ' - %s - ' (allows for letter numbering as well, such as "a", "b"...).
- # - :location an Array containing the location for the page numbers, can be :top, :buttom, :top_left, :top_right, :bottom_left, :bottom_right or :center (:center == full page). defaults to [:top, :buttom].
+ # - :location an Array containing the location for the page numbers, can be :top, :bottom, :top_left, :top_right, :bottom_left, :bottom_right or :center (:center == full page). defaults to [:top, :bottom].
  # - :start_at an Integer that sets the number for first page number. also accepts a letter ("a") for letter numbering. defaults to 1.
- # - :margin_from_height a number (PDF points) for the top and buttom margins. defaults to 45.
+ # - :margin_from_height a number (PDF points) for the top and bottom margins. defaults to 45.
  # - :margin_from_side a number (PDF points) for the left and right margins. defaults to 15.
  # - :page_range a range of pages to be numbered (i.e. (2..-1) ) defaults to all the pages (nil). Remember to set the :start_at to the correct value.
  # the options Hash can also take all the options for {Page_Methods#textbox}.
@@ -20,8 +20,10 @@ module CombinePDF
  return format_name_to_pdf object
  elsif object.is_a?(Array)
  return format_array_to_pdf object
- elsif object.is_a?(Numeric) || object.is_a?(TrueClass) || object.is_a?(FalseClass)
+ elsif object.is_a?(Integer) || object.is_a?(TrueClass) || object.is_a?(FalseClass)
  return object.to_s
+ elsif object.is_a?(Numeric) # Float or other non-integer
+ return sprintf('%f', object)
  elsif object.is_a?(Hash)
  return format_hash_to_pdf object
  else
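Note on the hunk above: splitting `Integer` from the other `Numeric` types matters because `Float#to_s` switches to exponent notation for small magnitudes, and PDF numeric objects do not allow an exponent. The sketch below shows the difference. The method name `format_number` is hypothetical and only mirrors the two numeric branches of the diff; it is not the gem's `object_to_pdf`.

```ruby
# Hypothetical helper mirroring the two numeric branches in the hunk above.
def format_number(value)
  case value
  when Integer then value.to_s            # integers print as-is
  when Numeric then format('%f', value)   # fixed-point, six decimal places
  else raise ArgumentError, "not a number: #{value.inspect}"
  end
end

small = 0.000054

puts small.to_s            # exponent notation, not valid as a PDF number
puts format_number(small)  # fixed-point notation
puts format_number(5)
puts format_number(-0.000099)
```

One trade-off worth noting: `'%f'` always emits six decimal places, so values are rounded to that precision and short floats gain trailing zeros (`0.5` becomes `0.500000`).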
@@ -29,25 +31,30 @@ module CombinePDF
  end
  end

- STRING_REPLACEMENT_HASH = { "\x0A" => '\\n',
- "\x0D" => '\\r',
- "\x09" => '\\t',
- "\x08" => '\\b',
- "\x0C" => '\\f', # form-feed (\f) == 0x0C
- "\x28" => '\\(',
- "\x29" => '\\)',
- "\x5C" => '\\\\' }.dup
- 32.times { |i| STRING_REPLACEMENT_HASH[i.chr] ||= "\\#{i}" }
- (256 - 127).times { |i| STRING_REPLACEMENT_HASH[(i + 127).chr] ||= "\\#{i + 127}" }
+ STRING_REPLACEMENT_ARRAY = []
+ 256.times {|i| STRING_REPLACEMENT_ARRAY[i] = [i]}
+ 8.times { |i| STRING_REPLACEMENT_ARRAY[i] = "\\00#{i.to_s(8)}".bytes.to_a }
+ 24.times { |i| STRING_REPLACEMENT_ARRAY[i + 7] = "\\0#{i.to_s(8)}".bytes.to_a }
+ (256 - 127).times { |i| STRING_REPLACEMENT_ARRAY[(i + 127)] ||= "\\#{(i + 127).to_s(8)}".bytes.to_a }
+ STRING_REPLACEMENT_ARRAY[0x0A] = '\\n'.bytes.to_a
+ STRING_REPLACEMENT_ARRAY[0x0D] = '\\r'.bytes.to_a
+ STRING_REPLACEMENT_ARRAY[0x09] = '\\t'.bytes.to_a
+ STRING_REPLACEMENT_ARRAY[0x08] = '\\b'.bytes.to_a
+ STRING_REPLACEMENT_ARRAY[0x0C] = '\\f'.bytes.to_a # form-feed (\f) == 0x0C
+ STRING_REPLACEMENT_ARRAY[0x28] = '\\('.bytes.to_a
+ STRING_REPLACEMENT_ARRAY[0x29] = '\\)'.bytes.to_a
+ STRING_REPLACEMENT_ARRAY[0x5C] = '\\\\'.bytes.to_a

  def format_string_to_pdf(object)
+ obj_bytes = object.bytes.to_a
  # object.force_encoding(Encoding::ASCII_8BIT)
- if !object.match(/[^D\:\d\+\-Z\']/) # if format is set to Literal and string isn't a date
- ('(' + ([].tap { |out| object.bytes.to_a.each { |byte| STRING_REPLACEMENT_HASH[byte.chr] ? (STRING_REPLACEMENT_HASH[byte.chr].bytes.each { |b| out << b }) : out << byte } }).pack('C*') + ')').force_encoding(Encoding::ASCII_8BIT)
- else
- # A hexadecimal string shall be written as a sequence of hexadecimal digits (0–9 and either A–F or a–f)
+ if object.length == 0 || obj_bytes.min <= 31 || obj_bytes.max >= 127 # || (obj_bytes[0] != 68 object.match(/[^D\:\d\+\-Z\']/))
+ # A hexadecimal string shall be written as a sequence of hexadecimal digits (0-9 and either A-F or a-f)
  # encoded as ASCII characters and enclosed within angle brackets (using LESS-THAN SIGN (3Ch) and GREATER- THAN SIGN (3Eh)).
  "<#{object.unpack('H*')[0]}>".force_encoding(Encoding::ASCII_8BIT)
+ else
+ # a good fit for a Literal String or the string is a date (MUST be literal)
+ ('(' + ([].tap { |out| obj_bytes.each { |byte| out.concat(STRING_REPLACEMENT_ARRAY[byte]) } } ).pack('C*') + ')').force_encoding(Encoding::ASCII_8BIT)
  end
  end

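Note on the hunk above: the new `format_string_to_pdf` chooses the PDF string syntax from the byte range of the input. Empty strings and strings containing control bytes (31 or below) or bytes of 127 and above are written as hexadecimal strings in angle brackets; everything else is written as a literal string in parentheses with the delimiters and the backslash escaped. The sketch below is a simplified illustration of that decision, under the assumption that only `(`, `)` and `\` need escaping once the input is known to be printable ASCII. The method name `pdf_string` is hypothetical and the sketch does not use the gem's lookup table.

```ruby
# Hypothetical, simplified version of the literal-vs-hex decision.
def pdf_string(str)
  bytes = str.bytes
  if bytes.empty? || bytes.min <= 31 || bytes.max >= 127
    # Hexadecimal string: two hex digits per byte, enclosed in < >.
    "<#{str.unpack('H*')[0]}>"
  else
    # Literal string: printable ASCII only, so escaping the delimiters
    # and the backslash is sufficient.
    escaped = str.gsub(/[()\\]/) { |char| "\\#{char}" }
    "(#{escaped})"
  end
end

puts pdf_string('Hello (PDF)')
puts pdf_string("line\nbreak")
puts pdf_string('')
```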
@@ -1,3 +1,3 @@
  module CombinePDF
- VERSION = '1.0.6'.freeze
+ VERSION = '1.0.22'.freeze
  end
data/lib/combine_pdf.rb CHANGED
@@ -5,6 +5,7 @@ require 'securerandom'
  require 'strscan'
  require 'matrix'
  require 'set'
+ require 'digest'

  # require the RC4 Gem
  require 'rc4'
data/test/automated CHANGED
@@ -95,6 +95,8 @@ pdf.save('07_named destinations_numbered.pdf')
  CombinePDF.load("./Ruby/test\ pdfs/Scribus-unknown_err.pdf").save '08_1-unknown-err-empty-str.pdf'
  CombinePDF.load("./Ruby/test\ pdfs/Scribus-unknown_err2.pdf").save '08_2-unknown-err-empty-str.pdf'
  CombinePDF.load("./Ruby/test\ pdfs/Scribus-unknown_err3.pdf").save '08_3-unknown-err-empty-str.pdf'
+ CombinePDF.load("./Ruby/test\ pdfs/xref_in_middle.pdf").save '08_4-xref-in-middle.pdf'
+ CombinePDF.load("./Ruby/test\ pdfs/xref_split.pdf").save '08_5-xref-fragmented.pdf'

  CombinePDF.load("/Users/2Be/Ruby/test\ pdfs/nil_object.pdf").save('09_nil_in_parsed_array.pdf')

@@ -0,0 +1,22 @@
+ require 'bundler/setup'
+ require 'minitest/autorun'
+ require 'combine_pdf/renderer'
+
+ class CombinePDFRendererTest < Minitest::Test
+
+ class TestRenderer
+ include CombinePDF::Renderer
+
+ def test_object(object)
+ object_to_pdf(object)
+ end
+ end
+
+ def test_numeric_array_to_pdf
+ input = [1.234567, 0.000054, 5, -0.000099]
+ expected = "[1.234567 0.000054 5 -0.000099]".force_encoding('BINARY')
+ actual = TestRenderer.new.test_object(input)
+
+ assert_equal(expected, actual)
+ end
+ end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: combine_pdf
  version: !ruby/object:Gem::Version
- version: 1.0.6
+ version: 1.0.22
  platform: ruby
  authors:
  - Boaz Segev
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-08-02 00:00:00.000000000 Z
+ date: 2021-11-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: ruby-rc4
@@ -25,33 +25,47 @@ dependencies:
  - !ruby/object:Gem::Version
  version: 0.1.5
  - !ruby/object:Gem::Dependency
- name: bundler
+ name: matrix
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
- version: '1.7'
- type: :development
+ version: '0'
+ type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
- version: '1.7'
+ version: '0'
  - !ruby/object:Gem::Dependency
  name: rake
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
- version: '10.0'
+ version: 12.3.3
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 12.3.3
+ - !ruby/object:Gem::Dependency
+ name: minitest
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
  - !ruby/object:Gem::Version
- version: '10.0'
+ version: '0'
  description: A nifty gem, in pure Ruby, to parse PDF files and combine (merge) them
  with other PDF files, number the pages, watermark them or stamp them, create tables,
  add basic text objects etc` (all using the PDF file format).
@@ -82,13 +96,14 @@ files:
  - lib/combine_pdf/renderer.rb
  - lib/combine_pdf/version.rb
  - test/automated
+ - test/combine_pdf/renderer_test.rb
  - test/console
  - test/named_dest
  homepage: https://github.com/boazsegev/combine_pdf
  licenses:
  - MIT
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -103,12 +118,12 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubyforge_project:
- rubygems_version: 2.6.11
- signing_key:
+ rubygems_version: 3.2.3
+ signing_key:
  specification_version: 4
  summary: Combine, stamp and watermark PDF files in pure Ruby.
  test_files:
  - test/automated
+ - test/combine_pdf/renderer_test.rb
  - test/console
  - test/named_dest