combine_pdf 1.0.6 → 1.0.22
- checksums.yaml +5 -5
- data/CHANGELOG.md +84 -0
- data/README.md +40 -1
- data/combine_pdf.gemspec +4 -2
- data/lib/combine_pdf/api.rb +6 -6
- data/lib/combine_pdf/fonts.rb +13 -4
- data/lib/combine_pdf/page_methods.rb +9 -10
- data/lib/combine_pdf/parser.rb +145 -60
- data/lib/combine_pdf/pdf_protected.rb +44 -11
- data/lib/combine_pdf/pdf_public.rb +20 -12
- data/lib/combine_pdf/renderer.rb +22 -15
- data/lib/combine_pdf/version.rb +1 -1
- data/lib/combine_pdf.rb +1 -0
- data/test/automated +2 -0
- data/test/combine_pdf/renderer_test.rb +22 -0
- metadata +32 -17
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
-
-metadata.gz:
-data.tar.gz:
+SHA256:
+metadata.gz: 96825d0aa74bd673883c4d7dbf3884459ff27c2a3d7bd0c60875e0499c7b9aeb
+data.tar.gz: 985c39883f343bb5182344ccc31353103fbac89494000362973f08cdd379d2ac
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 8575b612e1eb31775833faba8f310d84680d6ce27512d6a9c182e7598a743956da34e0321f0280d032e3e46b861dd2abdd88b297e65a652ec8e3e416ed9fb0a0
+data.tar.gz: 2026d924120f1798681842fee7a2eb0de78be6ac493dcbd4ffbb934c1c0135161ccbf29283fb0eec42b4ebab66f84b7fa3ac354a970fad9bc0ad302f64da7c7a
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,90 @@
|
|
2
2
|
|
3
3
|
***
|
4
4
|
|
5
|
+
#### Change log v.1.0.22
|
6
|
+
|
7
|
+
**Fix**: fix `fonts` dereferencing issue (#203), credit to @MarcWeber (Marc Weber) for identifying the issue.
|
8
|
+
|
9
|
+
**Fix**: fix `matrix` dependency, credit to @casperisfine (Jean byroot Boussier) for PR #195.
|
10
|
+
|
11
|
+
#### Change log v.1.0.21
|
12
|
+
|
13
|
+
**Fix**: possible fix for issue #184, where nested PDF files within an object stream could break the parser. Credit to Greg Sparrow (@hazelsparrow) for exposing the issue.
|
14
|
+
|
15
|
+
#### Change log v.1.0.20
|
16
|
+
|
17
|
+
**Fix**: merges PR #180, `TypeError: can't dup NilClass`. Credit to Adam Trepanier (@adam-e-trepanier) for the merge.
|
18
|
+
|
19
|
+
#### Change log v.1.0.19
|
20
|
+
|
21
|
+
**Fix**: fixes font height and width detection issue. Issue #179. Credit to @5anchezzz for opening the issue.
|
22
|
+
|
23
|
+
**Fix**: fixes an indentation warning. Issue #173. Credit to @rubyFeedback for exposing this issue.
|
24
|
+
|
25
|
+
#### Change log v.1.0.18
|
26
|
+
|
27
|
+
**Fix**: fixed an issue with the 1.0.17 release where `ProcSet` PDF Arrays should have been expected but were ignored and a PDF Object was assumed instead (issue #171) - credit to @chuchiperriman (Jesús Barbero Rodríguez).
|
28
|
+
|
29
|
+
#### Change log v.1.0.17
|
30
|
+
|
31
|
+
NB: yanked from RubyGems.org.
|
32
|
+
|
33
|
+
**Fix**: fixed issue where nested structure equality tests might provide false positives, resulting in lost data (issue #166) - credit to @cschilbe (Conrad Schilbe).
|
34
|
+
|
35
|
+
#### Change log v.1.0.16
|
36
|
+
|
37
|
+
**Fix**: some documentation typos were fixed (PR #147) - credit to @djhopper01 (Derek Hopper).
|
38
|
+
|
39
|
+
#### Change log v.1.0.15
|
40
|
+
|
41
|
+
**Fix**: An attempt to fix JRuby compatibility concerns (issue #127).
|
42
|
+
|
43
|
+
#### Change log v.1.0.14
|
44
|
+
|
45
|
+
**Fix**: Fixed an issue related to PDF XRef table data, where a malformed EOL marker would cause the parser to fail. Credit to @dangerous (David Rainsford) for exposing this issue in a comment to issue #140.
|
46
|
+
|
47
|
+
#### Change log v.1.0.13
|
48
|
+
|
49
|
+
**Fix**: Fixed an issue related to PDF object streams (version 1.6) where a numerical object at the beginning of the stream might be mis-parsed as an object reference number rather than an object. Credit to @Defoncesko for reporting issue #141.
|
50
|
+
|
51
|
+
#### Change log v.1.0.12
|
52
|
+
|
53
|
+
**Fix**: Fixed an issue introduced in version 1.0.11, where a fragmented XREF table might cause the CombinePDF::Parser to fail. Credit to @solasdev for reporting issue #140.
|
54
|
+
|
55
|
+
#### Change log v.1.0.11
|
56
|
+
|
57
|
+
**Fix**: Fixed an issue where small floating point numbers would produce invalid PDF rendering (where exponent notation was used instead of decimal notation). Credit to @avit (Andrew Vit) for PR #139.
|
58
|
+
|
59
|
+
#### Change log v.1.0.10
|
60
|
+
|
61
|
+
**Fix**: Fixed an issue related to issue #131 where parsing would fail if the `xref` section appears to be misplaced within the PDF. Credit to @bharat303 (Bharat Godhani) for exposing this issue.
|
62
|
+
|
63
|
+
#### Change log v.1.0.9
|
64
|
+
|
65
|
+
**Fix**: Fixed issue #136 where the `#fix_rotation` function would rotate the page to the wrong direction. Credit to @dmkash for exposing this issue.
|
66
|
+
|
67
|
+
#### Change log v.1.0.8
|
68
|
+
|
69
|
+
**Fix**: Fixed an issue with octal representation in escaped string data. The issue would (usually) go unnoticed (altering internal labels in a non-disruptive manner), however the issue did affect `ColorSpace` data in the rare use of `ICCBased` color maps, causing color distortion and transparency loss. Credit to @react-rails and @bedaronco for exposing the issue (issue #130).
|
70
|
+
|
71
|
+
**Fix**: Fixed an issue with non-English alphabets in PDF literal strings. This issue went undetected since PDF literal strings aren't used by CombinePDF except for the date stamping...
|
72
|
+
|
73
|
+
**Fix**: Improbable, but possibly a fix for issue #127, where the JRuby interpreter would fail to pass the correct arguments to the Hash update Proc. Since I'm trying to author a workaround, I have my doubts... but an attempt is better than nothing.
|
74
|
+
|
75
|
+
**Update**: Improved parsing error handling, courtesy of Evgeny Garlukovich (@evgenygarl).
|
76
|
+
|
77
|
+
**Update**: Added reader methods for the `names` and `outlines` PDF objects in response to issue #133. Use with care.
|
78
|
+
|
79
|
+
#### Change log v.1.0.7
|
80
|
+
|
81
|
+
**Fix**: Fix an issue where page property inheritance might break PDF structure if there's a conflict between property types (inheritance using properties by reference vs. nested properties), fixing issue #124. Credit to @erikaxel for exposing the issue.
|
82
|
+
|
83
|
+
#### Change log v.1.0.6
|
84
|
+
|
85
|
+
**Fix**: Fix warnings, issue #120. Credit to @lloeki for exposing the issue.
|
86
|
+
|
87
|
+
**Fix**: Fix / add adjustable nesting protection, fixing issue #117. Credit to @emmanuelmillionaer for exposing the issue.
|
88
|
+
|
5
89
|
#### Change log v.1.0.5
|
6
90
|
|
7
91
|
**Fix**: Fix issue #116 where some PDF objects (the page catalog and some root information data) were written twice to the saved PDF file (or String). Credit to @albertsaave for exposing the issue using GhostScript.
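The v1.0.11 entry above concerns number formatting: Ruby's `Float#to_s` switches to exponent notation for small values, which PDF syntax does not accept. The following is a minimal sketch of one way to avoid that, not the gem's actual renderer code; the helper name `pdf_number` and the six-digit precision are assumptions made for illustration.

```ruby
# Hypothetical helper (not part of CombinePDF): format a number for PDF output.
def pdf_number(value)
  return value.to_s if value.is_a?(Integer)
  # '%.6f' always yields plain decimal notation; trailing zeros are then trimmed.
  format('%.6f', value).sub(/\.?0+\z/, '')
end

puts 0.00001.to_s        # Float#to_s uses exponent notation here
puts pdf_number(0.00001) # plain decimal notation
puts pdf_number(1.5)
```

Values smaller than the chosen precision collapse to `0` in this sketch, which is a trade-off of fixing the number of decimals.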
|
data/README.md
CHANGED
@@ -1,8 +1,10 @@
|
|
1
1
|
# CombinePDF - the ruby way for merging PDF files
|
2
2
|
[![Gem Version](https://badge.fury.io/rb/combine_pdf.svg)](http://badge.fury.io/rb/combine_pdf)
|
3
3
|
[![GitHub](https://img.shields.io/badge/GitHub-Open%20Source-blue.svg)](https://github.com/boazsegev/combine_pdf)
|
4
|
+
[![Documentation](http://inch-ci.org/github/boazsegev/combine_pdf.svg?branch=master)](https://www.rubydoc.info/github/boazsegev/combine_pdf)
|
4
5
|
[![Maintainers Wanted](https://img.shields.io/badge/maintainers-wanted-red.svg)](https://github.com/pickhardt/maintainers-wanted)
|
5
6
|
|
7
|
+
|
6
8
|
CombinePDF is a nifty model, written in pure Ruby, to parse PDF files and combine (merge) them with other PDF files, watermark them or stamp them (all using the PDF file format and pure Ruby code).
|
7
9
|
|
8
10
|
## Install
|
@@ -41,6 +43,8 @@ Quick rundown:
|
|
41
43
|
|
42
44
|
* Sometimes the CombinePDF will raise an exception even if the PDF could be parsed (i.e., when PDF optional content exists)... I find it better to err on the side of caution, although for optional content PDFs an exception is avoidable using `CombinePDF.load(pdf_file, allow_optional_content: true)`.
|
43
45
|
|
46
|
+
* The CombinePDF gem runs recursive code to both parse and format the PDF files. Hence, PDF files that have heavily nested objects, as well as those that were combined in a way that results in cyclic nesting, might explode the stack - resulting in an exception or program failure.
|
47
|
+
|
44
48
|
CombinePDF is written natively in Ruby and should (presumably) work on all Ruby platforms that follow Ruby 2.0 compatibility.
|
45
49
|
|
46
50
|
However, PDF files are quite complex creatures and no guarantee is provided.
|
@@ -112,7 +116,42 @@ pdf.number_pages
|
|
112
116
|
pdf.save "file_with_numbering.pdf"
|
113
117
|
```
|
114
118
|
|
115
|
-
Numbering can be done with many different options, with different formating, with or without a box object, and even with opacity values - see documentation.
|
119
|
+
Numbering can be done with many different options, with different formatting, with or without a box object, and even with opacity values - [see documentation](https://www.rubydoc.info/github/boazsegev/combine_pdf/CombinePDF/PDF#number_pages-instance_method).
|
120
|
+
|
121
|
+
For example, should you prefer to place the page number on the bottom right side of all PDF pages, do:
|
122
|
+
|
123
|
+
```ruby
|
124
|
+
pdf.number_pages(location: [:bottom_right])
|
125
|
+
```
|
126
|
+
|
127
|
+
As another example, the dashes around the number are removed and a box is placed around it. The numbering is semi-transparent and the first 3 pages are numbered using letters (a,b,c) rather than numbers:
|
128
|
+
|
129
|
+
|
130
|
+
```ruby
|
131
|
+
# number first 3 pages as "a", "b", "c"
|
132
|
+
pdf.number_pages(number_format: " %s ",
|
133
|
+
location: [:top, :bottom, :top_left, :top_right, :bottom_left, :bottom_right],
|
134
|
+
start_at: "a",
|
135
|
+
page_range: (0..2),
|
136
|
+
box_color: [0.8,0.8,0.8],
|
137
|
+
border_color: [0.4, 0.4, 0.4],
|
138
|
+
border_width: 1,
|
139
|
+
box_radius: 6,
|
140
|
+
opacity: 0.75)
|
141
|
+
# number the rest of the pages as 4, 5, ... etc'
|
142
|
+
pdf.number_pages(number_format: " %s ",
|
143
|
+
location: [:top, :bottom, :top_left, :top_right, :bottom_left, :bottom_right],
|
144
|
+
start_at: 4,
|
145
|
+
page_range: (3..-1),
|
146
|
+
box_color: [0.8,0.8,0.8],
|
147
|
+
border_color: [0.4, 0.4, 0.4],
|
148
|
+
border_width: 1,
|
149
|
+
box_radius: 6,
|
150
|
+
opacity: 0.75)
|
151
|
+
```
|
152
|
+
|
153
|
+
pdf.number_pages(number_format: " %s ", location: :bottom_right, font_size: 44)
|
154
|
+
|
116
155
|
|
117
156
|
## Loading and Parsing PDF data
|
118
157
|
|
data/combine_pdf.gemspec
CHANGED
@@ -19,7 +19,9 @@ Gem::Specification.new do |spec|
 spec.require_paths = ["lib"]
 
 spec.add_runtime_dependency 'ruby-rc4', '>= 0.1.5'
+spec.add_runtime_dependency 'matrix'
 
-spec.add_development_dependency "bundler", "
-spec.add_development_dependency "rake", "
+# spec.add_development_dependency "bundler", ">= 1.7"
+spec.add_development_dependency "rake", ">= 12.3.3"
+spec.add_development_dependency "minitest"
 end
data/lib/combine_pdf/api.rb
CHANGED
@@ -24,11 +24,11 @@ module CombinePDF
|
|
24
24
|
raise TypeError, "couldn't create PDF object, expecting type String" unless string.is_a?(String) || string.is_a?(Pathname)
|
25
25
|
begin
|
26
26
|
(begin
|
27
|
-
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
|
27
|
+
File.file? string
|
28
|
+
rescue
|
29
|
+
false
|
30
|
+
end) ? load(string) : parse(string)
|
31
|
+
rescue => _e
|
32
32
|
raise 'General PDF error - Use CombinePDF.load or CombinePDF.parse for a non-general error message (the requested file was not found OR the string received is not a valid PDF stream OR the file was found but not valid).'
|
33
33
|
end
|
34
34
|
end
|
@@ -140,7 +140,7 @@ module CombinePDF
|
|
140
140
|
# this function enables plug-ins to extend the font functionality of CombinePDF.
|
141
141
|
#
|
142
142
|
# font_name:: a Symbol with the name of the font. if the fonts exists in the library, it will be overwritten!
|
143
|
-
# font_metrics:: a Hash of font metrics, of the format char => {wx: char_width, boundingbox: [left_x,
|
143
|
+
# font_metrics:: a Hash of font metrics, of the format char => {wx: char_width, boundingbox: [left_x, bottom_y, right_x, top_y]} where char == character itself (i.e. " " for space). The Hash should contain a special value :missing for the metrics of missing characters. an optional :wy might be supported in the future, for up to down fonts.
|
144
144
|
# font_pdf_object:: a Hash in the internal format recognized by CombinePDF, that represents the font object.
|
145
145
|
# font_cmap:: a CMap dictionary Hash) which maps unicode characters to the hex CID for the font (i.e. {"a" => "61", "z" => "7a" }).
|
146
146
|
def register_font(font_name, font_metrics, font_pdf_object, font_cmap = nil)
|
data/lib/combine_pdf/fonts.rb
CHANGED
@@ -100,7 +100,7 @@ module CombinePDF
|
|
100
100
|
|
101
101
|
# adds a correctly formatted font object to the font library.
|
102
102
|
# font_name:: a Symbol with the name of the font. if the fonts name exists, the font will be overwritten!
|
103
|
-
# font_metrics:: a Hash of ont metrics, of the format char => {wx: char_width, boundingbox: [left_x,
|
103
|
+
# font_metrics:: a Hash of font metrics, of the format char => {wx: char_width, boundingbox: [left_x, bottom_y, right_x, top_y]} where i == character code (i.e. 32 for space). The Hash should contain a special value :missing for the metrics of missing characters. an optional :wy will be supported in the future, for up to down fonts.
|
104
104
|
# font_pdf_object:: a Hash in the internal format recognized by CombinePDF, that represents the font object.
|
105
105
|
# font_cmap:: a CMap dictionary Hash) which maps unicode characters to the hex CID for the font (i.e. {"a" => "61", "z" => "7a" }).
|
106
106
|
def register_font(font_name, font_metrics, font_pdf_object, font_cmap = nil)
|
@@ -138,12 +138,21 @@ module CombinePDF
|
|
138
138
|
text.each_char do |c|
|
139
139
|
metrics_array << (merged_metrics[c] || { wx: 0, boundingbox: [0, 0, 0, 0] })
|
140
140
|
end
|
141
|
-
|
142
|
-
|
141
|
+
metrics_array_mapped_top = [].dup
|
142
|
+
metrics_array_mapped_bottom = [].dup
|
143
143
|
width = 0.0
|
144
144
|
metrics_array.each do |m|
|
145
|
-
|
145
|
+
if (m && m[:boundingbox])
|
146
|
+
metrics_array_mapped_top << m[:boundingbox][3]
|
147
|
+
metrics_array_mapped_bottom << m[:boundingbox][1]
|
148
|
+
else
|
149
|
+
metrics_array_mapped_top << 0
|
150
|
+
metrics_array_mapped_bottom << 0
|
151
|
+
end
|
152
|
+
width += (m[:wx] || m[:wy] || 0) if m
|
146
153
|
end
|
154
|
+
height = metrics_array_mapped_top.max
|
155
|
+
height -=metrics_array_mapped_bottom.min
|
147
156
|
return [height.to_f / 1000 * size, width.to_f / 1000 * size] if metrics_array[0][:wy]
|
148
157
|
[width.to_f / 1000 * size, height.to_f / 1000 * size]
|
149
158
|
end
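The height and width calculation in this hunk can be tried without a PDF at hand. The sketch below restates it as a hypothetical stand-alone function, `text_dimensions`; the sample metrics are made-up values in the `char => {wx:, boundingbox:}` shape described in the comments above, and the `:wy` (vertical font) branch is left out.

```ruby
# Hypothetical stand-alone restatement of the dimension calculation.
# `metrics` maps each character to { wx: width, boundingbox: [left_x, bottom_y, right_x, top_y] }.
def text_dimensions(text, metrics, size)
  return [0.0, 0.0] if text.empty?
  fallback = { wx: 0, boundingbox: [0, 0, 0, 0] }
  per_char = text.each_char.map { |c| metrics[c] || fallback }
  width   = per_char.sum { |m| m[:wx] || 0 }
  tops    = per_char.map { |m| m[:boundingbox] ? m[:boundingbox][3] : 0 }
  bottoms = per_char.map { |m| m[:boundingbox] ? m[:boundingbox][1] : 0 }
  # Height spans from the lowest bottom edge to the highest top edge.
  height = tops.max - bottoms.min
  # Metrics are expressed per 1000 units of font size.
  [width.to_f / 1000 * size, height.to_f / 1000 * size]
end

sample = {
  'a' => { wx: 250, boundingbox: [0, 0, 240, 750] },
  'g' => { wx: 250, boundingbox: [0, -250, 240, 500] }
}
p text_dimensions('ag', sample, 12)
```

With these invented metrics the height is taken from the descender of `g` (-250) up to the top of `a` (750), which is the behavior the issue #179 fix introduces by tracking tops and bottoms separately.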
|
data/lib/combine_pdf/page_methods.rb
CHANGED
@@ -94,7 +94,7 @@ module CombinePDF
|
|
94
94
|
# end
|
95
95
|
|
96
96
|
# set ProcSet to recommended value
|
97
|
-
resources[:ProcSet]
|
97
|
+
resources[:ProcSet] ||= [:PDF, :Text, :ImageB, :ImageC, :ImageI] # this was recommended by the ISO. 32000-1:2008
|
98
98
|
|
99
99
|
if top # if this is a stamp (overlay)
|
100
100
|
insert_content CONTENT_CONTAINER_START, 0
|
@@ -147,15 +147,15 @@ module CombinePDF
|
|
147
147
|
|
148
148
|
# This method adds a simple text box to the Page represented by the PDFWriter class.
|
149
149
|
# This function takes two values:
|
150
|
-
# text:: the text to
|
150
|
+
# text:: the text to write in the box.
|
151
151
|
# properties:: a Hash of box properties.
|
152
152
|
# the symbols and values in the properties Hash could be any or all of the following:
|
153
153
|
# x:: the left position of the box.
|
154
|
-
# y:: the
|
154
|
+
# y:: the BOTTOM position of the box.
|
155
155
|
# width:: the width/length of the box. negative values will be computed from edge of page. defaults to 0 (end of page).
|
156
156
|
# height:: the height of the box. negative values will be computed from edge of page. defaults to 0 (end of page).
|
157
157
|
# text_align:: symbol for horizontal text alignment, can be ":center" (default), ":right", ":left"
|
158
|
-
# text_valign:: symbol for vertical text alignment, can be ":center" (default), ":top", ":
|
158
|
+
# text_valign:: symbol for vertical text alignment, can be ":center" (default), ":top", ":bottom"
|
159
159
|
# text_padding:: a Float between 0 and 1, setting the padding for the text. defaults to 0.05 (5%).
|
160
160
|
# font:: a registered font name or an Array of names. defaults to ":Helvetica". The 14 standard fonts names are:
|
161
161
|
# - :"Times-Roman"
|
@@ -244,8 +244,8 @@ module CombinePDF
|
|
244
244
|
half_radius = (radius.to_f / 2).round 4
|
245
245
|
## set starting point
|
246
246
|
box_stream << "#{options[:x] + radius} #{options[:y]} m\n"
|
247
|
-
##
|
248
|
-
box_stream << "#{options[:x] + options[:width] - radius} #{options[:y]} l\n" #
|
247
|
+
## bottom and right corner - first line and first corner
|
248
|
+
box_stream << "#{options[:x] + options[:width] - radius} #{options[:y]} l\n" # bottom
|
249
249
|
if options[:box_radius] != 0 # make first corner, if not straight.
|
250
250
|
box_stream << "#{options[:x] + options[:width] - half_radius} #{options[:y]} "
|
251
251
|
box_stream << "#{options[:x] + options[:width]} #{options[:y] + half_radius} "
|
@@ -265,7 +265,7 @@ module CombinePDF
|
|
265
265
|
box_stream << "#{options[:x]} #{options[:y] + options[:height] - half_radius} "
|
266
266
|
box_stream << "#{options[:x]} #{options[:y] + options[:height] - radius} c\n"
|
267
267
|
end
|
268
|
-
## left and
|
268
|
+
## left and bottom-left corner
|
269
269
|
box_stream << "#{options[:x]} #{options[:y] + radius} l\n"
|
270
270
|
if options[:box_radius] != 0
|
271
271
|
box_stream << "#{options[:x]} #{options[:y] + half_radius} "
|
@@ -287,7 +287,7 @@ module CombinePDF
|
|
287
287
|
end
|
288
288
|
contents << box_stream
|
289
289
|
|
290
|
-
# reset x,y by text alignment - x,y are calculated from the
|
290
|
+
# reset x,y by text alignment - x,y are calculated from the bottom left
|
291
291
|
# each unit (1) is 1/72 Inch
|
292
292
|
# create text stream
|
293
293
|
text_stream = ''
|
@@ -403,7 +403,7 @@ module CombinePDF
|
|
403
403
|
def fix_rotation
|
404
404
|
return self if self[:Rotate].to_f == 0.0 || mediabox.nil?
|
405
405
|
# calculate the rotation
|
406
|
-
r = self[:Rotate].to_f * Math::PI / 180
|
406
|
+
r = (360.0 - self[:Rotate].to_f) * Math::PI / 180
|
407
407
|
s = Math.sin(r).round 6
|
408
408
|
c = Math.cos(r).round 6
|
409
409
|
ctm = [c, s, -s, c]
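To see what the changed line does, the angle-to-matrix step can be evaluated on its own. The sketch below copies the arithmetic from this hunk into a hypothetical helper, `rotation_ctm`, which is not a CombinePDF method.

```ruby
# Hypothetical helper reproducing the arithmetic of the hunk above.
def rotation_ctm(rotate_degrees)
  r = (360.0 - rotate_degrees.to_f) * Math::PI / 180
  s = Math.sin(r).round 6
  c = Math.cos(r).round 6
  [c, s, -s, c]
end

[90, 180, 270].each do |deg|
  puts "#{deg}: #{rotation_ctm(deg).inspect}"
end
```

Because the angle is now `360 - Rotate`, a `Rotate` value of 90 yields the matrix that the previous formula produced for 270, and vice versa; 180 is unaffected.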
|
@@ -649,7 +649,6 @@ module CombinePDF
|
|
649
649
|
page_res.each do |k, v|
|
650
650
|
v = page_res[k] = v.dup if v.is_a?(Array) || v.is_a?(Hash)
|
651
651
|
v = v[:referenced_object] = v[:referenced_object].dup if v.is_a?(Hash) && v[:referenced_object]
|
652
|
-
v = v[:referenced_object] = v[:referenced_object].dup if v.is_a?(Hash) && v[:referenced_object]
|
653
652
|
end
|
654
653
|
end
|
655
654
|
page_copy.instance_exec(secure || @secure_injection) { |s| secure_for_copy if s; init_contents; self }
|
data/lib/combine_pdf/parser.rb
CHANGED
@@ -6,6 +6,8 @@
|
|
6
6
|
########################################################
|
7
7
|
|
8
8
|
module CombinePDF
|
9
|
+
ParsingError = Class.new(StandardError)
|
10
|
+
|
9
11
|
# @!visibility private
|
10
12
|
# @private
|
11
13
|
#:nodoc: all
|
@@ -77,7 +79,10 @@ module CombinePDF
|
|
77
79
|
@parsed = _parse_
|
78
80
|
# puts @parsed
|
79
81
|
|
80
|
-
|
82
|
+
unless (@parsed.select { |i| !i.is_a?(Hash) }).empty?
|
83
|
+
# p @parsed.select
|
84
|
+
raise ParsingError, 'Unknown PDF parsing error - malformed PDF file?'
|
85
|
+
end
|
81
86
|
|
82
87
|
if @root_object == {}.freeze
|
83
88
|
xref_streams = @parsed.select { |obj| obj.is_a?(Hash) && obj[:Type] == :XRef }
|
@@ -86,7 +91,9 @@ module CombinePDF
|
|
86
91
|
end
|
87
92
|
end
|
88
93
|
|
89
|
-
|
94
|
+
if @root_object == {}.freeze
|
95
|
+
raise ParsingError, 'root is unknown - cannot determine if file is Encrypted'
|
96
|
+
end
|
90
97
|
|
91
98
|
if @root_object[:Encrypt]
|
92
99
|
# change_references_to_actual_values @root_object
|
@@ -105,12 +112,13 @@ module CombinePDF
|
|
105
112
|
next unless o.is_a?(Hash) && o[:Type] == :ObjStm
|
106
113
|
## un-encode (using the correct filter) the object streams
|
107
114
|
PDFFilter.inflate_object o
|
115
|
+
# puts "Object Stream Found:", o[:raw_stream_content]
|
108
116
|
## extract objects from stream
|
109
117
|
@scanner = StringScanner.new o[:raw_stream_content]
|
110
118
|
stream_data = _parse_
|
111
119
|
id_array = []
|
112
120
|
collection = [nil]
|
113
|
-
while stream_data[0].is_a? (Numeric)
|
121
|
+
while (stream_data[0].is_a?(Numeric) && stream_data[1].is_a?(Numeric))
|
114
122
|
id_array << stream_data.shift
|
115
123
|
stream_data.shift
|
116
124
|
end
|
@@ -225,16 +233,18 @@ module CombinePDF
|
|
225
233
|
# all characters that aren't white space or special: /[^\x00\x09\x0a\x0c\x0d\x20\x28\x29\x3c\x3e\x5b\x5d\x7b\x7d\x2f\x25]+
|
226
234
|
elsif str = @scanner.scan(/\/[^\x00\x09\x0a\x0c\x0d\x20\x28\x29\x3c\x3e\x5b\x5d\x7b\x7d\x2f\x25]*/)
|
227
235
|
out << (str[1..-1].gsub(/\#[0-9a-fA-F]{2}/) { |a| a[1..2].hex.chr }).to_sym
|
236
|
+
# warn "CombinePDF detected name: #{out.last.to_s}"
|
228
237
|
##########################################
|
229
238
|
## Parse a Number
|
230
239
|
##########################################
|
231
240
|
elsif str = @scanner.scan(/[\+\-\.\d]+/)
|
232
241
|
str =~ /\./ ? (out << str.to_f) : (out << str.to_i)
|
242
|
+
# warn "CombinePDF detected number: #{out.last.to_s}"
|
233
243
|
##########################################
|
234
244
|
## parse a Hex String
|
235
245
|
##########################################
|
236
246
|
elsif str = @scanner.scan(/\<[0-9a-fA-F]*\>/)
|
237
|
-
# warn "Found a hex string"
|
247
|
+
# warn "Found a hex string #{str}"
|
238
248
|
str = str.slice(1..-2).force_encoding(Encoding::ASCII_8BIT)
|
239
249
|
# str = "0#{str}" if str.length.odd?
|
240
250
|
out << unify_string([str].pack('H*').force_encoding(Encoding::ASCII_8BIT))
|
@@ -310,10 +320,10 @@ module CombinePDF
|
|
310
320
|
when 102 # f, form-feed
|
311
321
|
str << 12
|
312
322
|
when 48..57 # octal notation for byte?
|
313
|
-
rep
|
314
|
-
rep
|
315
|
-
rep
|
316
|
-
str << rep
|
323
|
+
rep -= 48
|
324
|
+
rep = (rep << 3) + (str_bytes.shift-48) if str_bytes[0].between?(48, 57)
|
325
|
+
rep = (rep << 3) + (str_bytes.shift-48) if str_bytes[0].between?(48, 57) && (((rep << 3) + (str_bytes[0] - 48)) <= 255)
|
326
|
+
str << rep
|
317
327
|
when 10 # new line, ignore
|
318
328
|
str_bytes.shift if str_bytes[0] == 13
|
319
329
|
true
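The octal branch in this hunk can be read in isolation as follows. This is a simplified sketch rather than the parser's code: the method name `read_octal_escape` is hypothetical, and unlike the hunk it accepts only the digits `0` to `7`.

```ruby
# Hypothetical stand-alone version of the octal-escape branch.
# `bytes` holds the byte values following the backslash; consumed digits are shifted off.
def read_octal_escape(bytes)
  value = bytes.shift - 48 # first digit; ASCII '0' is 48
  2.times do
    nxt = bytes[0]
    break unless nxt && nxt.between?(48, 55) # stop at the first non-octal byte
    candidate = (value << 3) + (nxt - 48)
    break if candidate > 255                 # a third digit must not overflow one byte
    value = candidate
    bytes.shift
  end
  value
end

bytes = '101rest'.bytes
p read_octal_escape(bytes).chr # "\101" is the letter A
p bytes.pack('C*')             # the unconsumed remainder
```

Each digit shifts the accumulated value left by three bits, so up to three digits are folded into a single byte, as the added lines in the hunk do.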
|
@@ -328,6 +338,7 @@ module CombinePDF
|
|
328
338
|
end
|
329
339
|
end
|
330
340
|
out << unify_string(str.pack('C*').force_encoding(Encoding::ASCII_8BIT))
|
341
|
+
# warn "Found Literal String: #{out.last}"
|
331
342
|
##########################################
|
332
343
|
## parse a Dictionary
|
333
344
|
##########################################
|
@@ -340,25 +351,42 @@ module CombinePDF
|
|
340
351
|
## return content of array or dictionary
|
341
352
|
##########################################
|
342
353
|
elsif @scanner.scan(/\]/) || @scanner.scan(/>>/)
|
354
|
+
# warn "Dictionary / Array ended with #{@scanner.peek(5)}"
|
343
355
|
return out
|
344
356
|
##########################################
|
345
357
|
## parse a Stream
|
346
358
|
##########################################
|
347
359
|
elsif @scanner.scan(/stream[ \t]*[\r\n]/)
|
348
360
|
@scanner.pos += 1 if @scanner.peek(1) == "\n".freeze && @scanner.matched[-1] != "\n".freeze
|
361
|
+
# advance by the published stream length (if any)
|
362
|
+
old_pos = @scanner.pos
|
363
|
+
if(out.last.is_a?(Hash) && out.last[:Length].is_a?(Integer) && out.last[:Length] > 2)
|
364
|
+
@scanner.pos += out.last[:Length] - 2
|
365
|
+
end
|
366
|
+
|
349
367
|
# the following was discarded because some PDF files didn't have an EOL marker as required
|
350
368
|
# str = @scanner.scan_until(/(\r\n|\r|\n)endstream/)
|
351
369
|
# instead, a non-strict RegExp is used:
|
352
|
-
|
370
|
+
|
371
|
+
|
353
372
|
# raise error if the stream doesn't end.
|
354
|
-
|
373
|
+
unless @scanner.skip_until(/endstream/)
|
374
|
+
raise ParsingError, "Parsing Error: PDF file error - a stream object wasn't properly closed using 'endstream'!"
|
375
|
+
end
|
376
|
+
length = @scanner.pos - (old_pos + 9)
|
377
|
+
length = 0 if(length < 0)
|
378
|
+
length -= 1 if(@scanner.string[old_pos + length - 1] == "\n")
|
379
|
+
length -= 1 if(@scanner.string[old_pos + length - 1] == "\r")
|
380
|
+
str = (length > 0) ? @scanner.string.slice(old_pos, length) : ''
|
381
|
+
|
382
|
+
# warn "CombinePDF parser: detected Stream #{str.length} bytes long #{str[0..3]}...#{str[-4..-1]}"
|
383
|
+
|
355
384
|
# need to remove end of stream
|
356
385
|
if out.last.is_a? Hash
|
357
|
-
|
358
|
-
out.last[:raw_stream_content] = unify_string str.sub(/(\r\n|\n|\r)?endstream\z/, '').force_encoding(Encoding::ASCII_8BIT)
|
386
|
+
out.last[:raw_stream_content] = unify_string str.force_encoding(Encoding::ASCII_8BIT)
|
359
387
|
else
|
360
388
|
warn 'Stream not attached to dictionary!'
|
361
|
-
out << str.
|
389
|
+
out << str.force_encoding(Encoding::ASCII_8BIT)
|
362
390
|
end
|
363
391
|
##########################################
|
364
392
|
## parse an Object after finished
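The stream-extraction logic added in this hunk can be exercised in isolation. What follows is a simplified sketch, not the parser itself: `read_stream_body` is a hypothetical name, the bounds check on the declared length is an addition of this sketch, and the sample data is invented. It shows why jumping ahead by the declared `Length` matters when the stream body itself contains the word `endstream`.

```ruby
require 'strscan'

# Hypothetical helper: the scanner is assumed to sit on the first byte of the stream body.
def read_stream_body(scanner, declared_length = nil)
  start = scanner.pos
  # Jump ahead by the declared length (minus 2, as in the hunk) when it fits in the data.
  if declared_length.is_a?(Integer) && declared_length > 2 &&
     start + declared_length <= scanner.string.bytesize
    scanner.pos += declared_length - 2
  end
  raise ArgumentError, "stream not closed with 'endstream'" unless scanner.skip_until(/endstream/)
  length = scanner.pos - (start + 9) # 9 == 'endstream'.length
  length = 0 if length < 0
  # Drop one trailing EOL (\n, \r or \r\n) that precedes the keyword.
  length -= 1 if length > 0 && scanner.string[start + length - 1] == "\n"
  length -= 1 if length > 0 && scanner.string[start + length - 1] == "\r"
  length > 0 ? scanner.string.slice(start, length) : ''
end

body = 'fake endstream here'
data = "#{body}\r\nendstream\nendobj"
p read_stream_body(StringScanner.new(data), body.bytesize) # the whole body
p read_stream_body(StringScanner.new(data))                # cut short at the embedded keyword
```

Without a usable declared length, the scan stops at the first `endstream` it meets, which is the kind of failure the v1.0.21 changelog entry describes for nested PDF data.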
|
@@ -375,17 +403,6 @@ module CombinePDF
|
|
375
403
|
out.last[:Dest] = unify_string(out.last[:Dest].to_s) if out.last[:Dest] && out.last[:Dest].is_a?(Symbol)
|
376
404
|
# puts "!!!!!!!!! Error with :indirect_reference_id\n\nObject #{out.last} :indirect_reference_id = #{out.last[:indirect_reference_id]}" unless out.last[:indirect_reference_id].is_a?(Numeric)
|
377
405
|
##########################################
|
378
|
-
## Parse a comment
|
379
|
-
##########################################
|
380
|
-
elsif str = @scanner.scan(/\%/)
|
381
|
-
# is a comment, skip until new line
|
382
|
-
loop do
|
383
|
-
# break unless @scanner.scan(/[^\d\r\n]+/)
|
384
|
-
break if @scanner.check(/([\d]+[\s]+[\d]+[\s]+obj[\s]+\<\<)|([\n\r]+)/) || @scanner.eos? # || @scanner.scan(/[^\d]+[\r\n]+/) ||
|
385
|
-
@scanner.scan(/[^\d\r\n]+/) || @scanner.pos += 1
|
386
|
-
end
|
387
|
-
# puts "AFTER COMMENT: #{@scanner.peek 8}"
|
388
|
-
##########################################
|
389
406
|
## Parse an Object Reference
|
390
407
|
##########################################
|
391
408
|
elsif @scanner.scan(/R/)
|
@@ -404,32 +421,55 @@ module CombinePDF
|
|
404
421
|
elsif @scanner.scan(/null/)
|
405
422
|
out << nil
|
406
423
|
##########################################
|
424
|
+
## Parse file trailer
|
425
|
+
##########################################
|
426
|
+
elsif @scanner.scan(/trailer/)
|
427
|
+
if @scanner.skip_until(/<</)
|
428
|
+
data = _parse_
|
429
|
+
(@root_object ||= {}).clear
|
430
|
+
@root_object[data.shift] = data.shift while data[0]
|
431
|
+
end
|
432
|
+
##########################################
|
407
433
|
## XREF - check for encryption... anything else?
|
408
434
|
##########################################
|
409
|
-
elsif @scanner.scan(/
|
410
|
-
|
411
|
-
|
412
|
-
|
413
|
-
fresh = true
|
414
|
-
if @scanner.matched[-1] == 'r'
|
415
|
-
if @scanner.skip_until(/<</)
|
416
|
-
data = _parse_
|
417
|
-
(@root_object ||= {}).clear
|
418
|
-
@root_object[data.shift] = data.shift while data[0]
|
419
|
-
end
|
420
|
-
##########
|
421
|
-
## skip untill end of segment, maked by %%EOF
|
422
|
-
@scanner.skip_until(/\%\%EOF/)
|
423
|
-
##########
|
424
|
-
## If this was the last valid segment, ignore any trailing garbage
|
425
|
-
## (issue #49 resolution)
|
426
|
-
break unless @scanner.exist?(/\%\%EOF/)
|
427
|
-
|
435
|
+
elsif @scanner.scan(/xref/)
|
436
|
+
# skip list identifier lines or list lines ([\d] [\d][\r\n]) or ([\d] [\d] [nf][\r\n])
|
437
|
+
while @scanner.scan(/[\s]*[\d]+[ \t]+[\d]+[ \t]*[\n\r]+/) || @scanner.scan(/[ \t]*[\d]+[ \t]+[\d]+[ \t]+[nf][\s]*/)
|
438
|
+
nil
|
428
439
|
end
|
429
|
-
|
440
|
+
##########################################
|
441
|
+
## XREF location can be ignored
|
442
|
+
##########################################
|
443
|
+
elsif @scanner.scan(/startxref/)
|
444
|
+
@scanner.scan(/[\s]+[\d]+[\s]+/)
|
445
|
+
##########################################
|
446
|
+
## Skip Whitespace
|
447
|
+
##########################################
|
430
448
|
elsif @scanner.scan(/[\s]+/)
|
431
449
|
# Generally, do nothing
|
432
450
|
nil
|
451
|
+
##########################################
|
452
|
+
## EOF?
|
453
|
+
##########################################
|
454
|
+
elsif @scanner.scan(/\%\%EOF/)
|
455
|
+
##########
|
456
|
+
## If this was the last valid segment, ignore any trailing garbage
|
457
|
+
## (issue #49 resolution)
|
458
|
+
break unless @scanner.exist?(/\%\%EOF/)
|
459
|
+
##########################################
|
460
|
+
## Parse a comment
|
461
|
+
##########################################
|
462
|
+
elsif str = @scanner.scan(/\%/)
|
463
|
+
# is a comment, skip until new line
|
464
|
+
loop do
|
465
|
+
# break unless @scanner.scan(/[^\d\r\n]+/)
|
466
|
+
break if @scanner.check(/([\d]+[\s]+[\d]+[\s]+obj[\s]+\<\<)|([\n\r]+)/) || @scanner.eos? # || @scanner.scan(/[^\d]+[\r\n]+/) ||
|
467
|
+
@scanner.scan(/[^\d\r\n]+/) || @scanner.pos += 1
|
468
|
+
end
|
469
|
+
# puts "AFTER COMMENT: #{@scanner.peek 8}"
|
470
|
+
##########################################
|
471
|
+
## Fix wkhtmltopdf - missing 'endobj' keywords
|
472
|
+
##########################################
|
433
473
|
elsif @scanner.scan(/obj[\s]*/)
|
434
474
|
# Fix wkhtmltopdf PDF authoring issue - missing 'endobj' keywords
|
435
475
|
unless fresh || (out[-4].nil? || out[-4].is_a?(Hash))
|
@@ -450,6 +490,9 @@ module CombinePDF
|
|
450
490
|
out << keep.pop
|
451
491
|
end
|
452
492
|
fresh = false
|
493
|
+
##########################################
|
494
|
+
## Unknown, warn and advance
|
495
|
+
##########################################
|
453
496
|
else
|
454
497
|
# always advance
|
455
498
|
# warn "Advancing for unknown reason... #{@scanner.string[@scanner.pos - 4, 8]} ... #{@scanner.peek(4)}" unless @scanner.peek(1) =~ /[\s\n]/
|
@@ -475,7 +518,9 @@ module CombinePDF
|
|
475
518
|
@parsed.delete_if { |obj| obj.nil? || obj[:Type] == :Catalog }
|
476
519
|
@parsed << catalogs
|
477
520
|
|
478
|
-
|
521
|
+
unless catalogs
|
522
|
+
raise ParsingError, "Unknown error - parsed data doesn't contain a cataloged object!"
|
523
|
+
end
|
479
524
|
end
|
480
525
|
if catalogs.is_a?(Array)
|
481
526
|
catalogs.each { |c| catalog_pages(c, inheritance_hash) unless c.nil? }
|
@@ -488,20 +533,31 @@ module CombinePDF
|
|
488
533
|
end
|
489
534
|
else
|
490
535
|
unless catalogs[:Type] == :Page
|
491
|
-
|
536
|
+
if (catalogs[:AS] || catalogs[:OCProperties]) && !@allow_optional_content
|
537
|
+
raise ParsingError, "Optional Content PDF files aren't supported and their pages cannot be safely extracted."
|
538
|
+
end
|
539
|
+
|
492
540
|
inheritance_hash[:MediaBox] = catalogs[:MediaBox] if catalogs[:MediaBox]
|
493
541
|
inheritance_hash[:CropBox] = catalogs[:CropBox] if catalogs[:CropBox]
|
494
542
|
inheritance_hash[:Rotate] = catalogs[:Rotate] if catalogs[:Rotate]
|
495
543
|
if catalogs[:Resources]
|
496
544
|
inheritance_hash[:Resources] ||= { referenced_object: {}, is_reference_only: true }.dup
|
497
|
-
(inheritance_hash[:Resources][:referenced_object] || inheritance_hash[:Resources]).update((catalogs[:Resources][:referenced_object] || catalogs[:Resources]), &
|
545
|
+
(inheritance_hash[:Resources][:referenced_object] || inheritance_hash[:Resources]).update((catalogs[:Resources][:referenced_object] || catalogs[:Resources]), &HASH_UPDATE_PROC_FOR_OLD)
|
546
|
+
end
|
547
|
+
if catalogs[:ProcSet].is_a?(Array)
|
548
|
+
if(inheritance_hash[:ProcSet])
|
549
|
+
inheritance_hash[:ProcSet][:referenced_object].concat(catalogs[:ProcSet])
|
550
|
+
inheritance_hash[:ProcSet][:referenced_object].uniq!
|
551
|
+
else
|
552
|
+
inheritance_hash[:ProcSet] ||= { referenced_object: catalogs[:ProcSet], is_reference_only: true }.dup
|
553
|
+
end
|
498
554
|
end
|
499
555
|
if catalogs[:ColorSpace]
|
500
556
|
inheritance_hash[:ColorSpace] ||= { referenced_object: {}, is_reference_only: true }.dup
|
501
|
-
(inheritance_hash[:ColorSpace][:referenced_object] || inheritance_hash[:ColorSpace]).update((catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]), &
|
557
|
+
(inheritance_hash[:ColorSpace][:referenced_object] || inheritance_hash[:ColorSpace]).update((catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]), &HASH_UPDATE_PROC_FOR_OLD)
|
502
558
|
end
|
503
|
-
# (inheritance_hash[:Resources] ||= {}).update((catalogs[:Resources][:referenced_object] || catalogs[:Resources]), &
|
504
|
-
# (inheritance_hash[:ColorSpace] ||= {}).update((catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]), &
|
559
|
+
# (inheritance_hash[:Resources] ||= {}).update((catalogs[:Resources][:referenced_object] || catalogs[:Resources]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:Resources]
|
560
|
+
# (inheritance_hash[:ColorSpace] ||= {}).update((catalogs[:ColorSpace][:referenced_object] || catalogs[:ColorSpace]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:ColorSpace]
|
505
561
|
|
506
562
|
# inheritance_hash[:Order] = catalogs[:Order] if catalogs[:Order]
|
507
563
|
# inheritance_hash[:OCProperties] = catalogs[:OCProperties] if catalogs[:OCProperties]
|
@@ -516,13 +572,27 @@ module CombinePDF
|
|
516
572
|
catalogs[:Rotate] ||= inheritance_hash[:Rotate] if inheritance_hash[:Rotate]
|
517
573
|
if inheritance_hash[:Resources]
|
518
574
|
catalogs[:Resources] ||= { referenced_object: {}, is_reference_only: true }.dup
|
519
|
-
|
575
|
+
catalogs[:Resources] = { referenced_object: catalogs[:Resources], is_reference_only: true } unless catalogs[:Resources][:referenced_object]
|
576
|
+
catalogs[:Resources][:referenced_object].update((inheritance_hash[:Resources][:referenced_object] || inheritance_hash[:Resources]), &HASH_UPDATE_PROC_FOR_OLD)
|
520
577
|
end
|
521
578
|
if inheritance_hash[:ColorSpace]
|
522
579
|
catalogs[:ColorSpace] ||= { referenced_object: {}, is_reference_only: true }.dup
|
523
|
-
|
580
|
+
catalogs[:ColorSpace] = { referenced_object: catalogs[:ColorSpace], is_reference_only: true } unless catalogs[:ColorSpace][:referenced_object]
|
581
|
+
catalogs[:ColorSpace][:referenced_object].update((inheritance_hash[:ColorSpace][:referenced_object] || inheritance_hash[:ColorSpace]), &HASH_UPDATE_PROC_FOR_OLD)
|
524
582
|
end
|
525
|
-
|
583
|
+
if inheritance_hash[:ProcSet]
|
584
|
+
if(catalogs[:ProcSet])
|
585
|
+
if catalogs[:ProcSet].is_a?(Array)
|
586
|
+
catalogs[:ProcSet] = { referenced_object: catalogs[:ProcSet], is_reference_only: true }
|
587
|
+
end
|
588
|
+
catalogs[:ProcSet][:referenced_object].concat(inheritance_hash[:ProcSet][:referenced_object])
|
589
|
+
catalogs[:ProcSet][:referenced_object].uniq!
|
590
|
+
else
|
591
|
+
catalogs[:ProcSet] = { is_reference_only: true }.dup
|
592
|
+
catalogs[:ProcSet][:referenced_object] = []
|
593
|
+
end
|
594
|
+
end
|
595
|
+
# (catalogs[:ColorSpace] ||= {}).update(inheritance_hash[:ColorSpace], &HASH_UPDATE_PROC_FOR_OLD) if inheritance_hash[:ColorSpace]
|
526
596
|
# catalogs[:Order] ||= inheritance_hash[:Order] if inheritance_hash[:Order]
|
527
597
|
# catalogs[:AS] ||= inheritance_hash[:AS] if inheritance_hash[:AS]
|
528
598
|
# catalogs[:OCProperties] ||= inheritance_hash[:OCProperties] if inheritance_hash[:OCProperties]
|
@@ -536,9 +606,9 @@ module CombinePDF
|
|
536
606
|
when :Pages
|
537
607
|
catalog_pages(catalogs[:Kids], inheritance_hash.dup) unless catalogs[:Kids].nil?
|
538
608
|
when :Catalog
|
539
|
-
@forms_object.update((catalogs[:AcroForm][:referenced_object] || catalogs[:AcroForm]), &
|
540
|
-
@names_object.update((catalogs[:Names][:referenced_object] || catalogs[:Names]), &
|
541
|
-
@outlines_object.update((catalogs[:Outlines][:referenced_object] || catalogs[:Outlines]), &
|
609
|
+
@forms_object.update((catalogs[:AcroForm][:referenced_object] || catalogs[:AcroForm]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:AcroForm]
|
610
|
+
@names_object.update((catalogs[:Names][:referenced_object] || catalogs[:Names]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:Names]
|
611
|
+
@outlines_object.update((catalogs[:Outlines][:referenced_object] || catalogs[:Outlines]), &HASH_UPDATE_PROC_FOR_NEW) if catalogs[:Outlines]
|
542
612
|
if catalogs[:Dests] # convert PDF 1.1 Dests to PDF 1.2+ Dests
|
543
613
|
dests_arry = (@names_object[:Dests] ||= {})
|
544
614
|
dests_arry = ((dests_arry[:referenced_object] || dests_arry)[:Names] ||= [])
|
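The `&HASH_UPDATE_PROC_FOR_OLD` and `&HASH_UPDATE_PROC_FOR_NEW` arguments in the hunks above are conflict-resolution blocks for `Hash#update` / `Hash#merge`. The following is a minimal sketch of that pattern, not the gem's code: the module name `MergeSketch`, the constants `KEEP_OLD` / `KEEP_NEW` and the sample data are hypothetical, and the extra `new_data.is_a?(Hash)` guard is an assumption added here so a Hash is never merged with a non-Hash.

```ruby
# Illustrative only: names below are hypothetical, not part of CombinePDF.
module MergeSketch
  # On a key conflict keep the existing value, recursing into nested Hashes.
  KEEP_OLD = proc do |_key, old_data, new_data|
    if old_data.is_a?(Hash) && new_data.is_a?(Hash)
      old_data.merge(new_data, &KEEP_OLD)
    else
      old_data
    end
  end

  # On a key conflict take the incoming value, recursing into nested Hashes.
  KEEP_NEW = proc do |_key, old_data, new_data|
    if old_data.is_a?(Hash) && new_data.is_a?(Hash)
      old_data.merge(new_data, &KEEP_NEW)
    else
      new_data
    end
  end
end

inherited = { Font: { F1: :helvetica }, ProcSet: [:PDF] }
page      = { Font: { F1: :times, F2: :courier }, XObject: {} }

kept_old = inherited.merge(page, &MergeSketch::KEEP_OLD)
kept_new = inherited.merge(page, &MergeSketch::KEEP_NEW)
```

Because each constant is only looked up when the proc runs, the proc can refer to itself by name for the recursive merge. Keys that exist on only one side are copied as-is; the block is consulted only for conflicts.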
@@ -652,30 +722,45 @@ module CombinePDF
|
|
652
722
|
|
653
723
|
# All Strings are one String
|
654
724
|
def unify_string(str)
|
725
|
+
str.force_encoding(Encoding::ASCII_8BIT)
|
655
726
|
@strings_dictionary[str] ||= str
|
656
727
|
end
|
657
728
|
|
658
729
|
# @private
|
659
730
|
# this method reviews a Hash and updates it by merging Hash data,
|
660
731
|
# preferring the old over the new.
|
661
|
-
|
732
|
+
HASH_UPDATE_PROC_FOR_OLD = Proc.new do |_key, old_data, new_data|
|
662
733
|
if old_data.is_a? Hash
|
663
|
-
old_data.merge(new_data, &
|
734
|
+
old_data.merge(new_data, &HASH_UPDATE_PROC_FOR_OLD)
|
664
735
|
else
|
665
736
|
old_data
|
666
737
|
end
|
667
738
|
end
|
739
|
+
# def self.hash_update_proc_for_old(_key, old_data, new_data)
|
740
|
+
# if old_data.is_a? Hash
|
741
|
+
# old_data.merge(new_data, &method(:hash_update_proc_for_old))
|
742
|
+
# else
|
743
|
+
# old_data
|
744
|
+
# end
|
745
|
+
# end
|
668
746
|
|
669
747
|
# @private
|
670
748
|
# this method reviews a Hash and updates it by merging Hash data,
|
671
749
|
# preferring the new over the old.
|
672
|
-
|
750
|
+
HASH_UPDATE_PROC_FOR_NEW = Proc.new do |_key, old_data, new_data|
|
673
751
|
if old_data.is_a? Hash
|
674
|
-
old_data.merge(new_data, &
|
752
|
+
old_data.merge(new_data, &HASH_UPDATE_PROC_FOR_NEW)
|
675
753
|
else
|
676
754
|
new_data
|
677
755
|
end
|
678
756
|
end
|
757
|
+
# def self.hash_update_proc_for_new(_key, old_data, new_data)
|
758
|
+
# if old_data.is_a? Hash
|
759
|
+
# old_data.merge(new_data, &method(:hash_update_proc_for_new))
|
760
|
+
# else
|
761
|
+
# new_data
|
762
|
+
# end
|
763
|
+
# end
|
679
764
|
|
680
765
|
# # run block of code on every PDF object (PDF objects are class Hash)
|
681
766
|
# def each_object(object, limit_references = true, already_visited = {}, &block)
|
data/lib/combine_pdf/pdf_protected.rb
CHANGED
@@ -137,11 +137,14 @@ module CombinePDF
|
|
137
137
|
catalog_object
|
138
138
|
end
|
139
139
|
|
140
|
+
# Deprecation Notice
|
140
141
|
def names_object
|
142
|
+
puts "CombinePDF Deprecation Notice: the protected method `names_object` will be deprecated in the upcoming version. Use `names` instead."
|
141
143
|
@names
|
142
144
|
end
|
143
145
|
|
144
146
|
def outlines_object
|
147
|
+
puts "CombinePDF Deprecation Notice: the protected method `outlines_object` will be deprecated in the upcoming version. Use `outlines` instead."
|
145
148
|
@outlines
|
146
149
|
end
|
147
150
|
# def forms_data
|
@@ -229,15 +232,42 @@ module CombinePDF
|
|
229
232
|
# @private
|
230
233
|
# this method reviews a Hash and updates it by merging Hash data,
|
231
234
|
# preferring the new over the old.
|
232
|
-
def self.hash_merge_new_no_page(_key, old_data, new_data)
|
233
|
-
|
234
|
-
|
235
|
-
|
236
|
-
|
237
|
-
|
235
|
+
# def self.hash_merge_new_no_page(_key = nil, old_data = nil, new_data = nil)
|
236
|
+
# return old_data unless new_data
|
237
|
+
# return new_data unless old_data
|
238
|
+
# if old_data.is_a?(Hash) && new_data.is_a?(Hash)
|
239
|
+
# return old_data if (old_data[:Type] == :Page)
|
240
|
+
# old_data.merge(new_data, &(@hash_merge_new_no_page_proc ||= method(:hash_merge_new_no_page)))
|
241
|
+
# elsif old_data.is_a? Array
|
242
|
+
# return old_data + new_data if new_data.is_a?(Array)
|
243
|
+
# return old_data.dup << new_data
|
244
|
+
# elsif new_data.is_a? Array
|
245
|
+
# new_data + [old_data]
|
246
|
+
# else
|
247
|
+
# new_data
|
248
|
+
# end
|
249
|
+
# end
|
250
|
+
|
251
|
+
# @private
|
252
|
+
# JRuby alternative: this method reviews a Hash and updates it by merging Hash data,
|
253
|
+
# preferring the new over the old.
|
254
|
+
HASH_MERGE_NEW_NO_PAGE = Proc.new do |_key = nil, old_data = nil, new_data = nil|
|
255
|
+
if !new_data
|
256
|
+
old_data
|
257
|
+
elsif !old_data
|
258
|
+
new_data
|
259
|
+
elsif old_data.is_a?(Hash) && new_data.is_a?(Hash)
|
260
|
+
if (old_data[:Type] == :Page)
|
261
|
+
old_data
|
262
|
+
else
|
263
|
+
old_data.merge(new_data, &HASH_MERGE_NEW_NO_PAGE)
|
264
|
+
end
|
238
265
|
elsif old_data.is_a? Array
|
239
|
-
|
240
|
-
|
266
|
+
if new_data.is_a?(Array)
|
267
|
+
old_data + new_data
|
268
|
+
else
|
269
|
+
old_data.dup << new_data
|
270
|
+
end
|
241
271
|
elsif new_data.is_a? Array
|
242
272
|
new_data + [old_data]
|
243
273
|
else
|
@@ -343,16 +373,19 @@ module CombinePDF
|
|
343
373
|
private
|
344
374
|
|
345
375
|
def equal_layers obj1, obj2, layer = CombinePDF.eq_depth_limit
|
346
|
-
return true if(layer == 0)
|
347
376
|
return true if obj1.object_id == obj2.object_id
|
348
377
|
if obj1.is_a? Hash
|
349
378
|
return false unless obj2.is_a? Hash
|
379
|
+
return false unless obj1.length == obj2.length
|
350
380
|
keys = obj1.keys;
|
351
|
-
|
381
|
+
keys2 = obj2.keys;
|
382
|
+
return false if (keys - keys2).any? || (keys2 - keys).any?
|
383
|
+
return (warn("CombinePDF nesting limit reached") || true) if(layer == 0)
|
352
384
|
keys.each {|k| return false unless equal_layers( obj1[k], obj2[k], layer-1) }
|
353
385
|
elsif obj1.is_a? Array
|
354
386
|
return false unless obj2.is_a? Array
|
355
|
-
|
387
|
+
return false unless obj1.length == obj2.length
|
388
|
+
(obj1-obj2).any? || (obj2-obj1).any?
|
356
389
|
else
|
357
390
|
obj1 == obj2
|
358
391
|
end
|
data/lib/combine_pdf/pdf_public.rb
CHANGED
@@ -82,6 +82,10 @@ module CombinePDF
|
|
82
82
|
# use, for example:
|
83
83
|
# pdf.viewer_preferences[:HideMenubar] = true
|
84
84
|
attr_reader :viewer_preferences
|
85
|
+
# Access the Outlines PDF object Hash (or reference). Use with care.
|
86
|
+
attr_reader :outlines
|
87
|
+
# Access the Names PDF object Hash (or reference). Use with care.
|
88
|
+
attr_reader :names
|
85
89
|
|
86
90
|
def initialize(parser = nil)
|
87
91
|
# default before setting
|
@@ -207,7 +211,7 @@ module CombinePDF
|
|
207
211
|
# when finished, remove the numbering system and keep only pointers
|
208
212
|
remove_old_ids
|
209
213
|
# output the pdf stream
|
210
|
-
out.join("\n").force_encoding(Encoding::ASCII_8BIT)
|
214
|
+
out.join("\n".force_encoding(Encoding::ASCII_8BIT)).force_encoding(Encoding::ASCII_8BIT)
|
211
215
|
end
|
212
216
|
|
213
217
|
# this method returns all the pages cataloged in the catalog.
|
@@ -253,12 +257,16 @@ module CombinePDF
|
|
253
257
|
def fonts(limit_to_type0 = false)
|
254
258
|
fonts_array = []
|
255
259
|
pages.each do |pg|
|
256
|
-
|
257
|
-
|
258
|
-
|
259
|
-
|
260
|
-
|
261
|
-
|
260
|
+
r = pg[:Resources]
|
261
|
+
next if !r
|
262
|
+
r = r[:referenced_object] if r[:referenced_object]
|
263
|
+
r = r[:Font]
|
264
|
+
next if !r
|
265
|
+
r = r[:referenced_object] if r[:referenced_object]
|
266
|
+
r.values.each do |f|
|
267
|
+
f = f[:referenced_object] if f[:referenced_object]
|
268
|
+
if (limit_to_type0 || f[:Subtype] == :Type0) && f[:Type] == :Font && !fonts_array.include?(f)
|
269
|
+
fonts_array << f
|
262
270
|
end
|
263
271
|
end
|
264
272
|
end
|
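The rewritten `fonts` loop in the hunk above resolves `:referenced_object` at each level (Resources, Font dictionary, individual font) before reading from it. Here is a rough sketch of that dereferencing idea on plain Hashes. The helper names `deref` and `collect_fonts` and the sample page data are hypothetical, and the `limit_to_type0` filter from the diff is deliberately left out.

```ruby
# Illustrative only: `deref` and `collect_fonts` are hypothetical helpers.

# Return the wrapped object when `obj` is an indirect-reference wrapper,
# otherwise return `obj` unchanged (including nil).
def deref(obj)
  obj.is_a?(Hash) && obj[:referenced_object] ? obj[:referenced_object] : obj
end

# Collect each distinct font dictionary found in the pages' resources.
def collect_fonts(pages)
  pages.each_with_object([]) do |page, fonts|
    resources = deref(page[:Resources])
    next unless resources.is_a?(Hash)

    font_dict = deref(resources[:Font])
    next unless font_dict.is_a?(Hash)

    font_dict.each_value do |font|
      font = deref(font)
      next unless font.is_a?(Hash) && font[:Type] == :Font

      fonts << font unless fonts.include?(font)
    end
  end
end

shared = { Type: :Font, Subtype: :Type1, BaseFont: :Helvetica }
pages = [
  # fonts stored directly
  { Resources: { Font: { F1: shared } } },
  # the same font reached through three levels of reference wrappers
  { Resources: { is_reference_only: true, referenced_object: {
    Font: { referenced_object: { F1: { is_reference_only: true, referenced_object: shared } } }
  } } },
  # a page without resources is skipped
  { Contents: [] }
]

fonts = collect_fonts(pages)
```

Skipping pages whose resources or font dictionary are missing mirrors the `next if !r` guards in the diff.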
@@ -302,10 +310,10 @@ module CombinePDF
|
|
302
310
|
if data.is_a? PDF
|
303
311
|
@version = [@version, data.version].max
|
304
312
|
pages_to_add = data.pages
|
305
|
-
actual_value(@names ||= {}.dup).update
|
306
|
-
merge_outlines((@outlines ||= {}.dup), data.
|
313
|
+
actual_value(@names ||= {}.dup).update data.names, &HASH_MERGE_NEW_NO_PAGE
|
314
|
+
merge_outlines((@outlines ||= {}.dup), actual_value(data.outlines), location) unless actual_value(data.outlines).empty?
|
307
315
|
if actual_value(@forms_data)
|
308
|
-
actual_value(@forms_data).update actual_value(data.forms_data), &
|
316
|
+
actual_value(@forms_data).update actual_value(data.forms_data), &HASH_MERGE_NEW_NO_PAGE if data.forms_data
|
309
317
|
else
|
310
318
|
@forms_data = data.forms_data
|
311
319
|
end
|
@@ -354,9 +362,9 @@ module CombinePDF
|
|
354
362
|
#
|
355
363
|
# options:: a Hash of options setting the behavior and format of the page numbers:
|
356
364
|
# - :number_format a string representing the format for page number. defaults to ' - %s - ' (allows for letter numbering as well, such as "a", "b"...).
|
357
|
-
# - :location an Array containing the location for the page numbers, can be :top, :
|
365
|
+
# - :location an Array containing the location for the page numbers, can be :top, :bottom, :top_left, :top_right, :bottom_left, :bottom_right or :center (:center == full page). defaults to [:top, :bottom].
|
358
366
|
# - :start_at an Integer that sets the number for first page number. also accepts a letter ("a") for letter numbering. defaults to 1.
|
359
|
-
# - :margin_from_height a number (PDF points) for the top and
|
367
|
+
# - :margin_from_height a number (PDF points) for the top and bottom margins. defaults to 45.
|
360
368
|
# - :margin_from_side a number (PDF points) for the left and right margins. defaults to 15.
|
361
369
|
# - :page_range a range of pages to be numbered (i.e. (2..-1) ) defaults to all the pages (nil). Remember to set the :start_at to the correct value.
|
362
370
|
# the options Hash can also take all the options for {Page_Methods#textbox}.
|
data/lib/combine_pdf/renderer.rb
CHANGED
@@ -20,8 +20,10 @@ module CombinePDF
|
|
20
20
|
return format_name_to_pdf object
|
21
21
|
elsif object.is_a?(Array)
|
22
22
|
return format_array_to_pdf object
|
23
|
-
elsif object.is_a?(
|
23
|
+
elsif object.is_a?(Integer) || object.is_a?(TrueClass) || object.is_a?(FalseClass)
|
24
24
|
return object.to_s
|
25
|
+
elsif object.is_a?(Numeric) # Float or other non-integer
|
26
|
+
return sprintf('%f', object)
|
25
27
|
elsif object.is_a?(Hash)
|
26
28
|
return format_hash_to_pdf object
|
27
29
|
else
|
@@ -29,25 +31,30 @@ module CombinePDF
|
|
29
31
|
end
|
30
32
|
end
|
31
33
|
|
32
|
-
|
33
|
-
|
34
|
-
|
35
|
-
|
36
|
-
|
37
|
-
|
38
|
-
|
39
|
-
|
40
|
-
|
41
|
-
|
34
|
+
STRING_REPLACEMENT_ARRAY = []
|
35
|
+
256.times {|i| STRING_REPLACEMENT_ARRAY[i] = [i]}
|
36
|
+
8.times { |i| STRING_REPLACEMENT_ARRAY[i] = "\\00#{i.to_s(8)}".bytes.to_a }
|
37
|
+
24.times { |i| STRING_REPLACEMENT_ARRAY[i + 7] = "\\0#{i.to_s(8)}".bytes.to_a }
|
38
|
+
(256 - 127).times { |i| STRING_REPLACEMENT_ARRAY[(i + 127)] ||= "\\#{(i + 127).to_s(8)}".bytes.to_a }
|
39
|
+
STRING_REPLACEMENT_ARRAY[0x0A] = '\\n'.bytes.to_a
|
40
|
+
STRING_REPLACEMENT_ARRAY[0x0D] = '\\r'.bytes.to_a
|
41
|
+
STRING_REPLACEMENT_ARRAY[0x09] = '\\t'.bytes.to_a
|
42
|
+
STRING_REPLACEMENT_ARRAY[0x08] = '\\b'.bytes.to_a
|
43
|
+
STRING_REPLACEMENT_ARRAY[0x0C] = '\\f'.bytes.to_a # form-feed (\f) == 0x0C
|
44
|
+
STRING_REPLACEMENT_ARRAY[0x28] = '\\('.bytes.to_a
|
45
|
+
STRING_REPLACEMENT_ARRAY[0x29] = '\\)'.bytes.to_a
|
46
|
+
STRING_REPLACEMENT_ARRAY[0x5C] = '\\\\'.bytes.to_a
|
42
47
|
|
43
48
|
def format_string_to_pdf(object)
|
49
|
+
obj_bytes = object.bytes.to_a
|
44
50
|
# object.force_encoding(Encoding::ASCII_8BIT)
|
45
|
-
if
|
46
|
-
|
47
|
-
else
|
48
|
-
# A hexadecimal string shall be written as a sequence of hexadecimal digits (0–9 and either A–F or a–f)
|
51
|
+
if object.length == 0 || obj_bytes.min <= 31 || obj_bytes.max >= 127 # || (obj_bytes[0] != 68 object.match(/[^D\:\d\+\-Z\']/))
|
52
|
+
# A hexadecimal string shall be written as a sequence of hexadecimal digits (0-9 and either A-F or a-f)
|
49
53
|
# encoded as ASCII characters and enclosed within angle brackets (using LESS-THAN SIGN (3Ch) and GREATER- THAN SIGN (3Eh)).
|
50
54
|
"<#{object.unpack('H*')[0]}>".force_encoding(Encoding::ASCII_8BIT)
|
55
|
+
else
|
56
|
+
# a good fit for a Literal String or the string is a date (MUST be literal)
|
57
|
+
('(' + ([].tap { |out| obj_bytes.each { |byte| out.concat(STRING_REPLACEMENT_ARRAY[byte]) } } ).pack('C*') + ')').force_encoding(Encoding::ASCII_8BIT)
|
51
58
|
end
|
52
59
|
end
|
53
60
|
|
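The new `format_string_to_pdf` in the hunk above writes a hexadecimal string when the input is empty or contains a byte at or below 31 or at or above 127, and otherwise writes an escaped literal string. The sketch below illustrates that decision under simplifying assumptions. The names `LITERAL_ESCAPES` and `pdf_string_sketch` are hypothetical, and because control bytes always take the hex path in this sketch, only the three printable characters that need escaping are listed.

```ruby
# Illustrative only: not the gem's implementation.

# Printable bytes that must be backslash-escaped inside a PDF literal string.
LITERAL_ESCAPES = {
  0x28 => '\\(',  # left parenthesis
  0x29 => '\\)',  # right parenthesis
  0x5C => '\\\\'  # backslash
}.freeze

def pdf_string_sketch(str)
  bytes = str.bytes
  if bytes.empty? || bytes.any? { |b| b <= 31 || b >= 127 }
    # Hexadecimal string: hex digits enclosed in angle brackets.
    "<#{str.unpack1('H*')}>".b
  else
    # Literal string: printable ASCII enclosed in parentheses.
    body = bytes.map { |b| LITERAL_ESCAPES.fetch(b) { b.chr } }.join
    "(#{body})".b
  end
end

date_string   = pdf_string_sketch('D:20211127')
escaped       = pdf_string_sketch('a(b)\\c')
binary_string = pdf_string_sketch("\xFF\x00".b)
empty_string  = pdf_string_sketch('')
```

Both branches return a binary (ASCII-8BIT) string, matching the `force_encoding(Encoding::ASCII_8BIT)` calls in the diff.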
data/lib/combine_pdf/version.rb
CHANGED
data/lib/combine_pdf.rb
CHANGED
data/test/automated
CHANGED
@@ -95,6 +95,8 @@ pdf.save('07_named destinations_numbered.pdf')
|
|
95
95
|
CombinePDF.load("./Ruby/test\ pdfs/Scribus-unknown_err.pdf").save '08_1-unknown-err-empty-str.pdf'
|
96
96
|
CombinePDF.load("./Ruby/test\ pdfs/Scribus-unknown_err2.pdf").save '08_2-unknown-err-empty-str.pdf'
|
97
97
|
CombinePDF.load("./Ruby/test\ pdfs/Scribus-unknown_err3.pdf").save '08_3-unknown-err-empty-str.pdf'
|
98
|
+
CombinePDF.load("./Ruby/test\ pdfs/xref_in_middle.pdf").save '08_4-xref-in-middle.pdf'
|
99
|
+
CombinePDF.load("./Ruby/test\ pdfs/xref_split.pdf").save '08_5-xref-fragmented.pdf'
|
98
100
|
|
99
101
|
CombinePDF.load("/Users/2Be/Ruby/test\ pdfs/nil_object.pdf").save('09_nil_in_parsed_array.pdf')
|
100
102
|
|
@@ -0,0 +1,22 @@
|
|
1
|
+
require 'bundler/setup'
|
2
|
+
require 'minitest/autorun'
|
3
|
+
require 'combine_pdf/renderer'
|
4
|
+
|
5
|
+
class CombinePDFRendererTest < Minitest::Test
|
6
|
+
|
7
|
+
class TestRenderer
|
8
|
+
include CombinePDF::Renderer
|
9
|
+
|
10
|
+
def test_object(object)
|
11
|
+
object_to_pdf(object)
|
12
|
+
end
|
13
|
+
end
|
14
|
+
|
15
|
+
def test_numeric_array_to_pdf
|
16
|
+
input = [1.234567, 0.000054, 5, -0.000099]
|
17
|
+
expected = "[1.234567 0.000054 5 -0.000099]".force_encoding('BINARY')
|
18
|
+
actual = TestRenderer.new.test_object(input)
|
19
|
+
|
20
|
+
assert_equal(expected, actual)
|
21
|
+
end
|
22
|
+
end
|
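The test above exercises the renderer change that formats non-integer numbers with `sprintf('%f', ...)`. A short sketch of why this matters: Ruby's `Float#to_s` switches to exponent notation for small magnitudes, which is not a valid PDF real number. The helper name `number_to_pdf_sketch` is hypothetical, and the sketch covers numbers only.

```ruby
# Illustrative only: `number_to_pdf_sketch` is a hypothetical helper.
def number_to_pdf_sketch(value)
  case value
  when Integer then value.to_s            # integers are written as-is
  when Numeric then format('%f', value)   # fixed-point, never exponent notation
  else raise ArgumentError, "not a number: #{value.inspect}"
  end
end

naive    = 0.000054.to_s
rendered = [1.234567, 0.000054, 5, -0.000099].map { |n| number_to_pdf_sketch(n) }
joined   = "[#{rendered.join(' ')}]"
```

Note that `%f` always emits six decimal places, so `2.5` becomes `2.500000` and digits beyond the sixth decimal are rounded away.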
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: combine_pdf
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.0.
|
4
|
+
version: 1.0.22
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Boaz Segev
|
8
|
-
autorequire:
|
8
|
+
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date:
|
11
|
+
date: 2021-11-27 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: ruby-rc4
|
@@ -25,33 +25,47 @@ dependencies:
|
|
25
25
|
- !ruby/object:Gem::Version
|
26
26
|
version: 0.1.5
|
27
27
|
- !ruby/object:Gem::Dependency
|
28
|
-
name:
|
28
|
+
name: matrix
|
29
29
|
requirement: !ruby/object:Gem::Requirement
|
30
30
|
requirements:
|
31
|
-
- - "
|
31
|
+
- - ">="
|
32
32
|
- !ruby/object:Gem::Version
|
33
|
-
version: '
|
34
|
-
type: :
|
33
|
+
version: '0'
|
34
|
+
type: :runtime
|
35
35
|
prerelease: false
|
36
36
|
version_requirements: !ruby/object:Gem::Requirement
|
37
37
|
requirements:
|
38
|
-
- - "
|
38
|
+
- - ">="
|
39
39
|
- !ruby/object:Gem::Version
|
40
|
-
version: '
|
40
|
+
version: '0'
|
41
41
|
- !ruby/object:Gem::Dependency
|
42
42
|
name: rake
|
43
43
|
requirement: !ruby/object:Gem::Requirement
|
44
44
|
requirements:
|
45
|
-
- - "
|
45
|
+
- - ">="
|
46
46
|
- !ruby/object:Gem::Version
|
47
|
-
version:
|
47
|
+
version: 12.3.3
|
48
48
|
type: :development
|
49
49
|
prerelease: false
|
50
50
|
version_requirements: !ruby/object:Gem::Requirement
|
51
51
|
requirements:
|
52
|
-
- - "
|
52
|
+
- - ">="
|
53
|
+
- !ruby/object:Gem::Version
|
54
|
+
version: 12.3.3
|
55
|
+
- !ruby/object:Gem::Dependency
|
56
|
+
name: minitest
|
57
|
+
requirement: !ruby/object:Gem::Requirement
|
58
|
+
requirements:
|
59
|
+
- - ">="
|
60
|
+
- !ruby/object:Gem::Version
|
61
|
+
version: '0'
|
62
|
+
type: :development
|
63
|
+
prerelease: false
|
64
|
+
version_requirements: !ruby/object:Gem::Requirement
|
65
|
+
requirements:
|
66
|
+
- - ">="
|
53
67
|
- !ruby/object:Gem::Version
|
54
|
-
version: '
|
68
|
+
version: '0'
|
55
69
|
description: A nifty gem, in pure Ruby, to parse PDF files and combine (merge) them
|
56
70
|
with other PDF files, number the pages, watermark them or stamp them, create tables,
|
57
71
|
add basic text objects etc` (all using the PDF file format).
|
@@ -82,13 +96,14 @@ files:
|
|
82
96
|
- lib/combine_pdf/renderer.rb
|
83
97
|
- lib/combine_pdf/version.rb
|
84
98
|
- test/automated
|
99
|
+
- test/combine_pdf/renderer_test.rb
|
85
100
|
- test/console
|
86
101
|
- test/named_dest
|
87
102
|
homepage: https://github.com/boazsegev/combine_pdf
|
88
103
|
licenses:
|
89
104
|
- MIT
|
90
105
|
metadata: {}
|
91
|
-
post_install_message:
|
106
|
+
post_install_message:
|
92
107
|
rdoc_options: []
|
93
108
|
require_paths:
|
94
109
|
- lib
|
@@ -103,12 +118,12 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
103
118
|
- !ruby/object:Gem::Version
|
104
119
|
version: '0'
|
105
120
|
requirements: []
|
106
|
-
|
107
|
-
|
108
|
-
signing_key:
|
121
|
+
rubygems_version: 3.2.3
|
122
|
+
signing_key:
|
109
123
|
specification_version: 4
|
110
124
|
summary: Combine, stamp and watermark PDF files in pure Ruby.
|
111
125
|
test_files:
|
112
126
|
- test/automated
|
127
|
+
- test/combine_pdf/renderer_test.rb
|
113
128
|
- test/console
|
114
129
|
- test/named_dest
|