combine_pdf 0.0.2
- checksums.yaml +7 -0
- data/lib/combine_pdf.rb +467 -0
- data/lib/combine_pdf/combine_pdf_basic_writer.rb +110 -0
- data/lib/combine_pdf/combine_pdf_decrypt.rb +198 -0
- data/lib/combine_pdf/combine_pdf_filter.rb +72 -0
- data/lib/combine_pdf/combine_pdf_parser.rb +315 -0
- data/lib/combine_pdf/combine_pdf_pdf.rb +396 -0
- metadata +66 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA1:
|
3
|
+
metadata.gz: 797001336a5b4f1598ae399bd69161f9f2b4b4ef
|
4
|
+
data.tar.gz: a684409037ef5205aff23512a8fe9a04ec53d828
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: f74a67926f556606587211b4885e736cd9a8690c85aa62b56eb9deda4497a928ebb55a37ae4156f4ff748903c23b77e1aedb3f686a9c0a002201c6c6abe64afc
|
7
|
+
data.tar.gz: 3e73fd966be0fa30d5626d2216a5bc92bb133c26abd7648374991fadc8438f7cf13d6e549a264af6a8533ad110f76d9f244aa8443282440cd950d214e098d40d
|
data/lib/combine_pdf.rb
ADDED
@@ -0,0 +1,467 @@
|
|
1
|
+
# -*- encoding : utf-8 -*-
|
2
|
+
########################################################
|
3
|
+
## Thoughts from reading the ISO 32000-1:2008
|
4
|
+
## this file is part of the CombinePDF library and the code
|
5
|
+
## is subject to the same license (GPLv3).
|
6
|
+
##
|
7
|
+
##
|
8
|
+
## === Merge PDFs!
|
9
|
+
## This is a pure ruby library to merge PDF files.
|
10
|
+
## In the future, this library will also allow stamping and watermarking PDFs (it allows this now, only with some issues).
|
11
|
+
##
|
12
|
+
## I started the project as a model within a RoR (Ruby on Rails) application, and as it grew I moved it to a local gem.
|
13
|
+
## I fell in love with the project, even if it is still young and in the raw.
|
14
|
+
## It is very simple to parse pdfs - from files:
|
15
|
+
## >> pdf = CombinePDF.new "file_name.pdf"
|
16
|
+
## or from data:
|
17
|
+
## >> pdf = CombinePDF.parse "%PDF-1.4 .... [data]"
|
18
|
+
## It's also easy to start an empty pdf:
|
19
|
+
## >> pdf = CombinePDF.new
|
20
|
+
## Merging is a breeze:
|
21
|
+
## >> pdf << CombinePDF.new "another_file_name.pdf"
|
22
|
+
## and saving the final PDF is a one-liner:
|
23
|
+
## >> pdf.save "output_file_name.pdf"
|
24
|
+
## Also, as a side effect, we can get all sorts of info about our pdf... such as the page count:
|
25
|
+
## >> pdf.version # will tell you the PDF version (if discovered). you can also reset this manually.
|
26
|
+
## >> pdf.pages.length # will tell you how many pages are actually displayed
|
27
|
+
## >> pdf.all_pages.length # will tell you how many page objects actually exist (can be more or fewer than the pages displayed)
|
28
|
+
## >> pdf.info # a hash with the Info dictionary from the PDF file (if discovered).
|
29
|
+
## === Stamp PDF files
|
30
|
+
## <b>has issues with specific PDF files - please see the issues</b>: https://github.com/boazsegev/combine_pdf/issues/2
|
31
|
+
## You can use PDF files as stamps.
|
32
|
+
## For instance, let's say you have this wonderful PDF (maybe one you created with prawn), and you want to stamp the company header and footer on every page.
|
33
|
+
## So you created your Prawn PDF file (an amazing library with a lot of hard work behind it; I highly recommend having a look at https://github.com/prawnpdf/prawn ):
|
34
|
+
## >> prawn_pdf = Prawn::Document.new
|
35
|
+
## >> ...(fill your new PDF with goodies)...
|
36
|
+
## Stamping every page is a breeze.
|
37
|
+
## We start by moving the PDF created by prawn into a CombinePDF object.
|
38
|
+
## >> pdf = CombinePDF.parse prawn_pdf.render
|
39
|
+
## Next we extract the stamp from our stamp pdf template:
|
40
|
+
## >> pdf_stamp = CombinePDF.new "stamp_file_name.pdf"
|
41
|
+
## >> stamp_page = pdf_stamp.pages[0]
|
42
|
+
## And off we stamp each page:
|
43
|
+
## >> pdf.pages.each {|page| page << stamp_page}
|
44
|
+
## Of course, we can save the stamped output:
|
45
|
+
## >> pdf.save "output_file_name.pdf"
|
46
|
+
## === Decryption & Filters
|
47
|
+
## Some PDF files are encrypted and some are compressed (the use of filters)...
|
48
|
+
## There is very little support for encrypted files and only very basic, limited support for compressed files.
|
49
|
+
## I need help with that.
|
50
|
+
## === Comments and file structure
|
51
|
+
## If you want to help with the code, please be aware:
|
52
|
+
## I'm a self-taught hobbyist at heart. The documentation is lacking and the comments in the code are poor guidelines.
|
53
|
+
## The code itself should be very straightforward, but feel free to ask whatever you want.
|
54
|
+
## === Credit
|
55
|
+
## Caige Nichols wrote an amazing RC4 gem which I used in my code.
|
56
|
+
## I wanted to install the gem, but I had issues with the internet and ended up copying the code itself into the combine_pdf_decrypt class file.
|
57
|
+
## Credit for his wonderful work is given here. Please respect his license and copyright... and mine.
|
58
|
+
## === License
|
59
|
+
## GPLv3
|
60
|
+
########################################################
|
61
|
+
require 'zlib'
|
62
|
+
require 'strscan'
require 'securerandom' # SecureRandom is used by copy_and_secure_for_injection
|
63
|
+
require 'combine_pdf/combine_pdf_pdf'
|
64
|
+
require 'combine_pdf/combine_pdf_decrypt'
|
65
|
+
require 'combine_pdf/combine_pdf_filter'
|
66
|
+
require 'combine_pdf/combine_pdf_parser'
|
67
|
+
module CombinePDF
|
68
|
+
module_function
|
69
|
+
################################################################
|
70
|
+
## These are the "gateway" functions for the model.
|
71
|
+
## These functions are open to the public.
|
72
|
+
################################################################
|
73
|
+
# PDF object types cross reference:
|
74
|
+
# Indirect objects, references, dictionaries and streams are Hash
|
75
|
+
# arrays are Array
|
76
|
+
# strings are String
|
77
|
+
# names are Symbols (String.to_sym)
|
78
|
+
# numbers are Fixnum or Float
|
79
|
+
# boolean are TrueClass or FalseClass
|
80
|
+
|
81
|
+
def new(file_name = "")
|
82
|
+
raise TypeError, "couldn't parse data, expecting type String" unless file_name.is_a? String
|
83
|
+
return PDF.new() if file_name == ''
|
84
|
+
PDF.new( PDFParser.new( IO.read(file_name).force_encoding(Encoding::ASCII_8BIT) ) )
|
85
|
+
end
|
86
|
+
def parse(data)
|
87
|
+
raise TypeError, "couldn't parse data, expecting type String" unless data.is_a? String
|
88
|
+
PDF.new( PDFParser.new(data) )
|
89
|
+
end
|
90
|
+
end
|
91
|
+
|
92
|
+
module CombinePDF
|
93
|
+
################################################################
|
94
|
+
## These are common functions, used within the different classes
|
95
|
+
## These functions aren't open to the public.
|
96
|
+
################################################################
|
97
|
+
PRIVATE_HASH_KEYS = [:indirect_reference_id, :indirect_generation_number, :raw_stream_content, :is_reference_only, :referenced_object, :indirect_without_dictionary]
|
98
|
+
LITERAL_STRING_REPLACEMENT_HASH = {
|
99
|
+
110 => 10, # "\\n".bytes = [92, 110] "\n".ord = 10
|
100
|
+
114 => 13, #r
|
101
|
+
116 => 9, #t
|
102
|
+
98 => 8, #b
|
103
|
+
102 => 12, #f (form feed is 0x0C)
|
104
|
+
40 => 40, #(
|
105
|
+
41 => 41, #)
|
106
|
+
92 => 92 #\
|
107
|
+
}
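The table above maps the byte that follows a backslash in a PDF literal string to the byte it stands for. A minimal sketch of how such a table could be applied follows. `ESCAPE_TABLE` and `unescape_literal` are hypothetical names used only for this illustration, form feed is mapped to 12 (0x0C) as ISO 32000-1 specifies, and octal escapes (`\ddd`) are deliberately left out.

```ruby
# Hypothetical stand-in for the escape table (escape character => resulting byte).
ESCAPE_TABLE = {
  110 => 10,  # n -> line feed
  114 => 13,  # r -> carriage return
  116 => 9,   # t -> horizontal tab
  98  => 8,   # b -> backspace
  102 => 12,  # f -> form feed
  40  => 40,  # (
  41  => 41,  # )
  92  => 92   # \
}.freeze

# Hypothetical helper: resolves single-character escapes in the body of a
# PDF literal string. Octal escapes and line continuations are not handled.
def unescape_literal(raw)
  bytes = raw.bytes
  out = []
  i = 0
  while i < bytes.length
    if bytes[i] == 92 && i + 1 < bytes.length && ESCAPE_TABLE.key?(bytes[i + 1])
      out << ESCAPE_TABLE[bytes[i + 1]] # replace the two-byte escape with one byte
      i += 2
    else
      out << bytes[i]                   # ordinary byte, copied as is
      i += 1
    end
  end
  out.pack('C*')
end
```

Working on bytes rather than characters keeps the sketch independent of the string's encoding, which matters because PDF strings are binary data.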
|
108
|
+
module PDFOperations
|
109
|
+
module_function
|
110
|
+
def inject_to_page page = {Type: :Page, MediaBox: [0,0,612.0,792.0], Resources: {}, Contents: []}, stream = nil, top = true
|
111
|
+
# make sure both the page receiving the new data and the injected page are of the correct data type.
|
112
|
+
return false unless page.is_a?(Hash) && stream.is_a?(Hash)
|
113
|
+
|
114
|
+
# following the reference chain and assigning a pointer to the correct Resources object.
|
115
|
+
# (assignments of Strings, Arrays and Hashes are pointers in Ruby, unless the .dup method is called)
|
116
|
+
original_resources = page[:Resources]
|
117
|
+
if original_resources[:is_reference_only]
|
118
|
+
original_resources = original_resources[:referenced_object]
|
119
|
+
raise "Couldn't tap into resources dictionary, as it is a reference and isn't linked." unless original_resources
|
120
|
+
end
|
121
|
+
original_contents = page[:Contents]
|
122
|
+
original_contents = [original_contents] unless original_contents.is_a? Array
|
123
|
+
|
124
|
+
stream_resources = stream[:Resources]
|
125
|
+
if stream_resources[:is_reference_only]
|
126
|
+
stream_resources = stream_resources[:referenced_object]
|
127
|
+
raise "Couldn't tap into resources dictionary, as it is a reference and isn't linked." unless stream_resources
|
128
|
+
end
|
129
|
+
stream_contents = stream[:Contents]
|
130
|
+
stream_contents = [stream_contents] unless stream_contents.is_a? Array
|
131
|
+
|
132
|
+
# collect keys as objects - this is to make sure that
|
133
|
+
# we are working on the actual resource data, rather than references
|
134
|
+
flatten_resources_dictionaries stream_resources
|
135
|
+
flatten_resources_dictionaries original_resources
|
136
|
+
|
137
|
+
# injecting each of the values in the injected Page
|
138
|
+
stream_resources.each do |key, new_val|
|
139
|
+
unless PRIVATE_HASH_KEYS.include? key # keep CombinePDF structural data intact.
|
140
|
+
if original_resources[key].nil?
|
141
|
+
original_resources[key] = new_val
|
142
|
+
elsif original_resources[key].is_a?(Hash) && new_val.is_a?(Hash)
|
143
|
+
new_val.update original_resources[key] # make sure the old values are respected
|
144
|
+
original_resources[key].update new_val # transfer old and new values to the injected page
|
145
|
+
end # Do nothing if the value is an Array: it is the ProcSet array, which is an issue
|
146
|
+
end
|
147
|
+
end
|
148
|
+
original_resources[:ProcSet] = [:PDF, :Text, :ImageB, :ImageC, :ImageI] # as recommended by ISO 32000-1:2008
|
149
|
+
|
150
|
+
if top # if this is a stamp (overlay)
|
151
|
+
page[:Contents] = original_contents
|
152
|
+
page[:Contents].push *stream_contents
|
153
|
+
else #if this was a watermark (underlay? would be lost if the page was scanned, as white might not be transparent)
|
154
|
+
page[:Contents] = stream_contents
|
155
|
+
page[:Contents].push *original_contents
|
156
|
+
end
|
157
|
+
|
158
|
+
page
|
159
|
+
end
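The two core steps of inject_to_page, merging resource dictionaries so that existing names win and ordering the content streams for an overlay or an underlay, can be sketched in isolation. `merge_resources` and `merge_contents` are hypothetical helpers written for this illustration; unlike the method above they work on plain hashes and arrays, and `merge_contents` returns a new array instead of mutating its inputs.

```ruby
# Hypothetical helper: merges the incoming resource dictionaries into the
# original ones. When both sides define the same name, the original value wins.
def merge_resources(original, incoming)
  incoming.each do |key, new_val|
    if original[key].nil?
      original[key] = new_val
    elsif original[key].is_a?(Hash) && new_val.is_a?(Hash)
      original[key] = new_val.merge(original[key])
    end
    # arrays (such as ProcSet) are left untouched
  end
  original
end

# Hypothetical helper: a stamp (top = true) is drawn after the page content,
# a watermark (top = false) is drawn before it.
def merge_contents(page_contents, stamp_contents, top = true)
  page_contents  = [page_contents]  unless page_contents.is_a?(Array)
  stamp_contents = [stamp_contents] unless stamp_contents.is_a?(Array)
  top ? page_contents + stamp_contents : stamp_contents + page_contents
end
```

Content streams are painted in array order, so whichever stream comes last ends up visually on top.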
|
160
|
+
# copy_and_secure_for_injection(page)
|
161
|
+
# - page is a page in the pages array, i.e. pdf.pages[0]
|
162
|
+
# takes a page object and:
|
163
|
+
# makes a deep copy of the page (Ruby defaults to pointers, so this will copy the memory).
|
164
|
+
# then it will rewrite the content stream with renamed resources, so as to avoid name conflicts.
|
165
|
+
def copy_and_secure_for_injection(page)
|
166
|
+
# copy page
|
167
|
+
new_page = create_deep_copy page
|
168
|
+
|
169
|
+
# initiate dictionary from old names to new names
|
170
|
+
names_dictionary = {}
|
171
|
+
|
172
|
+
# iterate through all keys that are name objects and give them new names (add to dic)
|
173
|
+
# this should be done for every dictionary in :Resources
|
174
|
+
# this stage takes a few steps:
|
175
|
+
|
176
|
+
# 1. get resources object
|
177
|
+
resources = new_page[:Resources]
|
178
|
+
if resources[:is_reference_only]
|
179
|
+
resources = resources[:referenced_object]
|
180
|
+
raise "Couldn't tap into resources dictionary, as it is a reference and isn't linked." unless resources
|
181
|
+
end
|
182
|
+
|
183
|
+
# 2. establish direct access to dictionaries and remove reference values
|
184
|
+
flatten_resources_dictionaries resources
|
185
|
+
|
186
|
+
# 3. travel every dictionary to pick up names (keys), change them and add them to the dictionary
|
187
|
+
resources.each do |k,v|
|
188
|
+
if v.is_a?(Hash)
|
189
|
+
new_dictionary = {}
|
190
|
+
v.each do |old_key, value|
|
191
|
+
new_key = ("CombinePDF" + SecureRandom.urlsafe_base64(9)).to_sym
|
192
|
+
names_dictionary[old_key] = new_key
|
193
|
+
new_dictionary[new_key] = value
|
194
|
+
end
|
195
|
+
resources[k] = new_dictionary
|
196
|
+
end
|
197
|
+
end
|
198
|
+
|
199
|
+
# now that we have replaced the names in the resources dictionaries,
|
200
|
+
# it is time to replace the names inside the stream
|
201
|
+
# we will need to make sure we have access to the stream injected
|
202
|
+
# we will use PDFFilter.inflate_object
|
203
|
+
(new_page[:Contents].is_a?(Array) ? new_page[:Contents] : [new_page[:Contents] ]).each do |c|
|
204
|
+
stream = c[:referenced_object]
|
205
|
+
PDFFilter.inflate_object stream
|
206
|
+
names_dictionary.each do |old_key, new_key|
|
207
|
+
stream[:raw_stream_content].gsub! _object_to_pdf(old_key), _object_to_pdf(new_key) ##### PRAY(!) that the parsed data will be correctly reproduced!
|
208
|
+
end
|
209
|
+
end
|
210
|
+
|
211
|
+
new_page
|
212
|
+
end
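The renaming step described above can be tried on plain data. The sketch below assumes resources are a Hash of category => {name => value} and that the content stream is an already inflated String. `rename_resources` is a hypothetical helper for this illustration, not part of the library.

```ruby
require 'securerandom'

# Hypothetical helper: gives every resource name a fresh random name and
# rewrites the content stream to use the new names.
# Returns [renamed_resources, rewritten_content]; the inputs are not modified.
def rename_resources(resources, content)
  names = {}   # old name => new name
  renamed = {}
  resources.each do |category, dict|
    unless dict.is_a?(Hash)
      renamed[category] = dict # e.g. the ProcSet array is kept as is
      next
    end
    renamed[category] = {}
    dict.each do |old_name, value|
      new_name = ("CombinePDF" + SecureRandom.urlsafe_base64(9)).to_sym
      names[old_name] = new_name
      renamed[category][new_name] = value
    end
  end
  new_content = content.dup
  names.each { |old_name, new_name| new_content.gsub!("/#{old_name}", "/#{new_name}") }
  [renamed, new_content]
end
```

This sketch uses plain substring substitution, so a name that is a prefix of another (for example /F1 and /F10) would be rewritten incorrectly. A tokenizing pass over the stream would be needed to avoid that.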
|
213
|
+
def flatten_resources_dictionaries(resources)
|
214
|
+
resources.each do |k,v|
|
215
|
+
if v.is_a?(Hash) && v[:is_reference_only]
|
216
|
+
if v[:referenced_object]
|
217
|
+
resources[k] = resources[k][:referenced_object].dup
|
218
|
+
resources[k].delete(:indirect_reference_id)
|
219
|
+
resources[k].delete(:indirect_generation_number)
|
220
|
+
elsif v[:indirect_without_dictionary]
|
221
|
+
resources[k] = resources[k][:indirect_without_dictionary]
|
222
|
+
end
|
223
|
+
end
|
224
|
+
end
|
225
|
+
end
|
226
|
+
|
227
|
+
|
228
|
+
# Ruby normally assigns pointers (references).
|
229
|
+
# normally:
|
230
|
+
# a = [1,2,3] # => [1,2,3]
|
231
|
+
# b = a # => [1,2,3]
|
232
|
+
# a << 4 # => [1,2,3,4]
|
233
|
+
# b # => [1,2,3,4]
|
234
|
+
# This method makes sure that the memory is copied instead of a pointer assigned.
|
235
|
+
# this works using recursion, so that arrays and hashes within arrays and hashes are also copied and not pointed to.
|
236
|
+
# One needs to be careful of infinite loops when using this function.
|
237
|
+
def create_deep_copy object
|
238
|
+
if object.is_a?(Array)
|
239
|
+
return object.map { |e| create_deep_copy e }
|
240
|
+
elsif object.is_a?(Hash)
|
241
|
+
return {}.tap {|out| object.each {|k,v| out[create_deep_copy(k)] = create_deep_copy(v) unless k == :Parent} }
|
242
|
+
elsif object.is_a?(String)
|
243
|
+
return object.dup
|
244
|
+
else
|
245
|
+
return object # objects that aren't Strings, Arrays or Hashes (such as Symbols and Fixnums) aren't pointers in Ruby and are always copied.
|
246
|
+
end
|
247
|
+
end
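The difference between assigning a reference and making a deep copy can be checked with a standalone version of the same recursion. `deep_copy` is a hypothetical name for this sketch. Like the method above it skips the :Parent key, which is what keeps the recursion from looping through parent links.

```ruby
# Hypothetical standalone version of the recursive copy.
def deep_copy(object)
  case object
  when Array
    object.map { |e| deep_copy(e) }
  when Hash
    object.each_with_object({}) do |(k, v), out|
      out[deep_copy(k)] = deep_copy(v) unless k == :Parent
    end
  when String
    object.dup
  else
    object # Symbols, numbers, true/false and nil are immutable, so sharing them is safe
  end
end

page = { Kids: [{ Name: "x", Parent: :back_link }], Count: 1 }
copy = deep_copy(page)
copy[:Kids][0][:Name] << "y" # changes the copy only
```

After the last line, `page[:Kids][0][:Name]` is still "x", and the copied entry has no :Parent key.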
|
248
|
+
def get_refernced_object(objects_array = [], reference_hash = {})
|
249
|
+
objects_array.each do |stored_object|
|
250
|
+
return stored_object if ( stored_object.is_a?(Hash) &&
|
251
|
+
reference_hash[:indirect_reference_id] == stored_object[:indirect_reference_id] &&
|
252
|
+
reference_hash[:indirect_generation_number] == stored_object[:indirect_generation_number] )
|
253
|
+
end
|
254
|
+
warn "didn't find reference #{reference_hash}"
|
255
|
+
nil
|
256
|
+
end
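The lookup above scans the whole objects array for every reference. A minimal sketch of that scan next to an index keyed by [id, generation] is shown here. Both helper names are hypothetical and the objects are plain hashes.

```ruby
# Hypothetical helper: linear scan, the same idea as the method above.
def find_referenced(objects, ref)
  objects.find do |o|
    o.is_a?(Hash) &&
      o[:indirect_reference_id] == ref[:indirect_reference_id] &&
      o[:indirect_generation_number] == ref[:indirect_generation_number]
  end
end

# Hypothetical helper: builds the index once so each later lookup is a hash access.
def index_objects(objects)
  objects.each_with_object({}) do |o, index|
    next unless o.is_a?(Hash)
    index[[o[:indirect_reference_id], o[:indirect_generation_number]]] = o
  end
end

objects = [
  { indirect_reference_id: 1, indirect_generation_number: 0, Type: :Catalog },
  { indirect_reference_id: 2, indirect_generation_number: 0, Type: :Page }
]
ref = { is_reference_only: true, indirect_reference_id: 2, indirect_generation_number: 0 }
```

Whether the index pays off depends on how many references are resolved per file; for a handful of lookups the scan is simpler.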
|
257
|
+
def change_references_to_actual_values(objects_array = [], hash_with_references = {})
|
258
|
+
hash_with_references.each do |k,v|
|
259
|
+
if v.is_a?(Hash) && v[:is_reference_only]
|
260
|
+
hash_with_references[k] = PDFOperations.get_refernced_object( objects_array, v)
|
261
|
+
hash_with_references[k] = hash_with_references[k][:indirect_without_dictionary] if hash_with_references[k].is_a?(Hash) && hash_with_references[k][:indirect_without_dictionary]
|
262
|
+
warn "Couldn't connect all values from references - didn't find reference #{hash_with_references}!!!" if hash_with_references[k] == nil
|
263
|
+
hash_with_references[k] = v unless hash_with_references[k]
|
264
|
+
end
|
265
|
+
end
|
266
|
+
hash_with_references
|
267
|
+
end
|
268
|
+
def change_connected_references_to_actual_values(hash_with_references = {})
|
269
|
+
if hash_with_references.is_a?(Hash)
|
270
|
+
hash_with_references.each do |k,v|
|
271
|
+
if v.is_a?(Hash) && v[:is_reference_only]
|
272
|
+
if v[:indirect_without_dictionary]
|
273
|
+
hash_with_references[k] = v[:indirect_without_dictionary]
|
274
|
+
elsif v[:referenced_object]
|
275
|
+
hash_with_references[k] = v[:referenced_object]
|
276
|
+
else
|
277
|
+
raise "Cannot change references to values, as they are disconnected!"
|
278
|
+
end
|
279
|
+
end
|
280
|
+
end
|
281
|
+
hash_with_references.each {|k, v| change_connected_references_to_actual_values(v) if v.is_a?(Hash) || v.is_a?(Array)}
|
282
|
+
elsif hash_with_references.is_a?(Array)
|
283
|
+
hash_with_references.each {|item| change_connected_references_to_actual_values(item) if item.is_a?(Hash) || item.is_a?(Array)}
|
284
|
+
end
|
285
|
+
hash_with_references
|
286
|
+
end
|
287
|
+
def connect_references_and_actual_values(objects_array = [], hash_with_references = {})
|
288
|
+
ret = true
|
289
|
+
hash_with_references.each do |k,v|
|
290
|
+
if v.is_a?(Hash) && v[:is_reference_only]
|
291
|
+
ref_obj = PDFOperations.get_refernced_object( objects_array, v)
|
292
|
+
hash_with_references[k] = ref_obj[:indirect_without_dictionary] if ref_obj.is_a?(Hash) && ref_obj[:indirect_without_dictionary]
|
293
|
+
ret = false
|
294
|
+
end
|
295
|
+
end
|
296
|
+
ret
|
297
|
+
end
|
298
|
+
|
299
|
+
|
300
|
+
def _each_object(object, limit_references = true, first_call = true, &block)
|
301
|
+
# #####################
|
302
|
+
# ## v.1.2 needs optimization
|
303
|
+
# case
|
304
|
+
# when object.is_a?(Array)
|
305
|
+
# object.each {|obj| _each_object(obj, limit_references, &block)}
|
306
|
+
# when object.is_a?(Hash)
|
307
|
+
# yield(object)
|
308
|
+
# object.each do |k,v|
|
309
|
+
# unless (limit_references && k == :referenced_object)
|
310
|
+
# unless k == :Parent
|
311
|
+
# _each_object(v, limit_references, &block)
|
312
|
+
# end
|
313
|
+
# end
|
314
|
+
# end
|
315
|
+
# end
|
316
|
+
#####################
|
317
|
+
## v.2.1 needs optimization
|
318
|
+
## version 2.1 is slightly faster than v.1.2
|
319
|
+
@already_visited = [] if first_call
|
320
|
+
unless limit_references
|
321
|
+
@already_visited << object.object_id
|
322
|
+
end
|
323
|
+
case
|
324
|
+
when object.is_a?(Array)
|
325
|
+
object.each {|obj| _each_object(obj, limit_references, false, &block)}
|
326
|
+
when object.is_a?(Hash)
|
327
|
+
yield(object)
|
328
|
+
unless limit_references && object[:is_reference_only]
|
329
|
+
object.each do |k,v|
|
330
|
+
_each_object(v, limit_references, false, &block) unless @already_visited.include? v.object_id
|
331
|
+
end
|
332
|
+
end
|
333
|
+
end
|
334
|
+
end
|
335
|
+
|
336
|
+
|
337
|
+
|
338
|
+
def _object_to_pdf object
|
339
|
+
case
|
340
|
+
when object.nil?
|
341
|
+
return "null"
|
342
|
+
when object.is_a?(String)
|
343
|
+
return _format_string_to_pdf object
|
344
|
+
when object.is_a?(Symbol)
|
345
|
+
return _format_name_to_pdf object
|
346
|
+
when object.is_a?(Array)
|
347
|
+
return _format_array_to_pdf object
|
348
|
+
when object.is_a?(Integer), object.is_a?(Float), object.is_a?(TrueClass), object.is_a?(FalseClass)
|
349
|
+
return object.to_s + " "
|
350
|
+
when object.is_a?(Hash)
|
351
|
+
return _format_hash_to_pdf object
|
352
|
+
else
|
353
|
+
return ''
|
354
|
+
end
|
355
|
+
end
|
356
|
+
|
357
|
+
def _format_string_to_pdf(object)
|
358
|
+
if @string_output == :literal #if format is set to Literal
|
359
|
+
#### can be better...
|
360
|
+
replacement_hash = {
|
361
|
+
"\x0A" => "\\n",
|
362
|
+
"\x0D" => "\\r",
|
363
|
+
"\x09" => "\\t",
|
364
|
+
"\x08" => "\\b",
|
365
|
+
"\x0C" => "\\f",
|
366
|
+
"\x28" => "\\(",
|
367
|
+
"\x29" => "\\)",
|
368
|
+
"\x5C" => "\\\\"
|
369
|
+
}
|
370
|
+
32.times {|i| replacement_hash[i.chr] ||= "\\%03o" % i} # PDF numeric escapes are 3-digit octal codes (\ddd)
|
371
|
+
129.times {|i| replacement_hash[(i + 127).chr] ||= "\\%03o" % (i + 127)} # bytes 127..255, written as octal escapes
|
372
|
+
("(" + ([].tap {|out| object.bytes.each {|byte| replacement_hash[ byte.chr ] ? (replacement_hash[ byte.chr ].bytes.each {|b| out << b}) : out << byte } }).pack('C*') + ")").force_encoding(Encoding::ASCII_8BIT)
|
373
|
+
else
|
374
|
+
# A hexadecimal string shall be written as a sequence of hexadecimal digits (0–9 and either A–F or a–f)
|
375
|
+
# encoded as ASCII characters and enclosed within angle brackets (using LESS-THAN SIGN (3Ch) and GREATER- THAN SIGN (3Eh)).
|
376
|
+
("<" + object.unpack('H*')[0] + ">").force_encoding(Encoding::ASCII_8BIT)
|
377
|
+
end
|
378
|
+
end
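The two string forms described above, literal and hexadecimal, can be sketched as standalone functions. `pdf_literal_string` and `pdf_hex_string` are hypothetical names. The literal form here writes non-printable bytes as three-digit octal escapes, which is the `\ddd` form ISO 32000-1 defines.

```ruby
# Hypothetical helper: hexadecimal form, two hex digits per byte inside <...>.
def pdf_hex_string(str)
  "<" + str.unpack('H*')[0] + ">"
end

# Hypothetical helper: literal form inside (...), with named and octal escapes.
def pdf_literal_string(str)
  named = { 10 => '\\n', 13 => '\\r', 9 => '\\t', 8 => '\\b', 12 => '\\f',
            40 => '\\(', 41 => '\\)', 92 => '\\\\' }
  body = str.bytes.map do |b|
    if named[b]
      named[b]                 # well-known escape such as \n or \(
    elsif b < 32 || b > 126
      format('\\%03o', b)      # any other non-printable byte as \ddd (octal)
    else
      b.chr                    # printable ASCII is written as is
    end
  end
  "(" + body.join + ")"
end
```

The hexadecimal form is always safe for binary data, while the literal form stays readable for mostly textual strings.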
|
379
|
+
def _format_name_to_pdf(object)
|
380
|
+
# a name object is an atomic symbol uniquely defined by a sequence of ANY characters (8-bit values) except null (character code 0).
|
381
|
+
# print name as a simple string. all characters between ! and ~ (except # and the delimiter characters) can be raw
|
382
|
+
# the rest will have a number sign and their HEX equivalent
|
383
|
+
# from the standard:
|
384
|
+
# When writing a name in a PDF file, a SOLIDUS (2Fh) (/) shall be used to introduce a name. The SOLIDUS is not part of the name but is a prefix indicating that what follows is a sequence of characters representing the name in the PDF file and shall follow these rules:
|
385
|
+
# a) A NUMBER SIGN (23h) (#) in a name shall be written by using its 2-digit hexadecimal code (23), preceded by the NUMBER SIGN.
|
386
|
+
# b) Any character in a name that is a regular character (other than NUMBER SIGN) shall be written as itself or by using its 2-digit hexadecimal code, preceded by the NUMBER SIGN.
|
387
|
+
# c) Any character that is not a regular character shall be written using its 2-digit hexadecimal code, preceded by the NUMBER SIGN only.
|
388
|
+
# [0x00, 0x09, 0x0a, 0x0c, 0x0d, 0x20, 0x28, 0x29, 0x3c, 0x3e, 0x5b, 0x5d, 0x7b, 0x7d, 0x2f, 0x25]
|
389
|
+
out = object.to_s.bytes.map do |b|
|
390
|
+
case b
|
391
|
+
when 0..15
|
392
|
+
'#0' + b.to_s(16)
|
393
|
+
when 16..32, 35, 37, 40, 41, 47, 60, 62, 91, 93, 123, 125, 127..255
|
394
|
+
'#' + b.to_s(16)
|
395
|
+
else
|
396
|
+
b.chr
|
397
|
+
end
|
398
|
+
end
|
399
|
+
"/" + out.join()
|
400
|
+
end
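The escaping rules quoted from the standard can be exercised with a small standalone function. `pdf_name` is a hypothetical name for this sketch. It escapes white space, control bytes, bytes above 126, the number sign and the delimiter characters, and writes everything else as is.

```ruby
# Hypothetical helper: writes a Symbol as a PDF name object.
def pdf_name(sym)
  # number sign, percent sign and the PDF delimiters: # % ( ) / < > [ ] { }
  special = [35, 37, 40, 41, 47, 60, 62, 91, 93, 123, 125]
  escaped = sym.to_s.bytes.map do |b|
    if b < 33 || b > 126 || special.include?(b)
      format('#%02x', b) # number sign followed by the 2-digit hexadecimal code
    else
      b.chr
    end
  end
  "/" + escaped.join
end
```

Using a fixed two-digit format covers bytes below 16 without a separate branch for the leading zero.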
|
401
|
+
def _format_array_to_pdf(object)
|
402
|
+
# An array shall be written as a sequence of objects enclosed in SQUARE BRACKETS (using LEFT SQUARE BRACKET (5Bh) and RIGHT SQUARE BRACKET (5Dh)).
|
403
|
+
# EXAMPLE [549 3.14 false (Ralph) /SomeName]
|
404
|
+
("[" + (object.collect {|item| _object_to_pdf(item)}).join(' ') + "]").force_encoding(Encoding::ASCII_8BIT)
|
405
|
+
|
406
|
+
end
|
407
|
+
|
408
|
+
def _format_hash_to_pdf(object)
|
409
|
+
# if the object is only a reference:
|
410
|
+
# special conditions apply, and there is only the setting of the reference (if needed) and output
|
411
|
+
if object[:is_reference_only]
|
412
|
+
#
|
413
|
+
if object[:referenced_object] && object[:referenced_object].is_a?(Hash)
|
414
|
+
object[:indirect_reference_id] = object[:referenced_object][:indirect_reference_id]
|
415
|
+
object[:indirect_generation_number] = object[:referenced_object][:indirect_generation_number]
|
416
|
+
end
|
417
|
+
object[:indirect_reference_id] ||= 0
|
418
|
+
object[:indirect_generation_number] ||= 0
|
419
|
+
return "#{object[:indirect_reference_id].to_s} #{object[:indirect_generation_number].to_s} R".force_encoding(Encoding::ASCII_8BIT)
|
420
|
+
end
|
421
|
+
|
422
|
+
# if the object is indirect...
|
423
|
+
out = []
|
424
|
+
if object[:indirect_reference_id]
|
425
|
+
object[:indirect_reference_id] ||= 0
|
426
|
+
object[:indirect_generation_number] ||= 0
|
427
|
+
out << "#{object[:indirect_reference_id].to_s} #{object[:indirect_generation_number].to_s} obj\n".force_encoding(Encoding::ASCII_8BIT)
|
428
|
+
if object[:indirect_without_dictionary]
|
429
|
+
out << _object_to_pdf(object[:indirect_without_dictionary])
|
430
|
+
out << "\nendobj\n"
|
431
|
+
return out.join().force_encoding(Encoding::ASCII_8BIT)
|
432
|
+
end
|
433
|
+
end
|
434
|
+
# correct stream length, if the object is a stream.
|
435
|
+
object[:Length] = object[:raw_stream_content].bytesize if object[:raw_stream_content]
|
436
|
+
|
437
|
+
# if the object is not a simple object, it is a dictionary
|
438
|
+
# A dictionary shall be written as a sequence of key-value pairs enclosed in double angle brackets (<<...>>)
|
439
|
+
# (using LESS-THAN SIGNs (3Ch) and GREATER-THAN SIGNs (3Eh)).
|
440
|
+
out << "<<\n".force_encoding(Encoding::ASCII_8BIT)
|
441
|
+
object.each do |key, value|
|
442
|
+
out << "#{_object_to_pdf key} #{_object_to_pdf value}\n".force_encoding(Encoding::ASCII_8BIT) unless PRIVATE_HASH_KEYS.include? key
|
443
|
+
end
|
444
|
+
out << ">>".force_encoding(Encoding::ASCII_8BIT)
|
445
|
+
out << "\nstream\n#{object[:raw_stream_content]}\nendstream".force_encoding(Encoding::ASCII_8BIT) if object[:raw_stream_content]
|
446
|
+
out << "\nendobj\n" if object[:indirect_reference_id]
|
447
|
+
out.join().force_encoding(Encoding::ASCII_8BIT)
|
448
|
+
end
|
449
|
+
|
450
|
+
|
451
|
+
|
452
|
+
end
|
453
|
+
end
|
454
|
+
|
455
|
+
|
456
|
+
## You can test performance with:
|
457
|
+
## puts Benchmark.measure { pdf = CombinePDF.new(file_name); pdf.save "test.pdf" } # PDFEditor.new_pdf
|
458
|
+
## demo: file_name = "/Users/2Be/Ruby/pdfs/encrypted.pdf"; pdf=0; puts Benchmark.measure { pdf = CombinePDF.new(file_name); pdf.save "test.pdf" }
|
459
|
+
## at the moment... my code is terribly slow for larger files... :(
|
460
|
+
## The file saving is solved (I hope)... but file loading is an issue.
|
461
|
+
## pdf.each_object {|obj| puts "Stream length: #{obj[:raw_stream_content].length} was registered as #{obj[:Length].is_a?(Hash)? obj[:Length][:referenced_object][:indirect_without_dictionary] : obj[:Length]}" if obj[:raw_stream_content] }
|
462
|
+
## pdf.objects.each {|obj| puts "#{obj.class.name}: #{obj[:indirect_reference_id]}, #{obj[:indirect_generation_number]} is: #{obj[:Type] || obj[:indirect_without_dictionary]}" }
|
463
|
+
## puts Benchmark.measure { 1000.times { (CombinePDF::PDFOperations.get_refernced_object pdf.objects, {indirect_reference_id: 100, indirect_generation_number:0}).object_id } }
|
464
|
+
## puts Benchmark.measure { 1000.times { (pdf.objects.select {|o| o[:indirect_reference_id]== 100 && o[:indirect_generation_number] == 0})[0].object_id } }
|
465
|
+
## puts Benchmark.measure { {}.tap {|out| pdf.objects.each {|o| out[ [o[:indirect_reference_id], o[:indirect_generation_number] ] ] = o }} }
|
466
|
+
|
467
|
+
|