combine_pdf 0.2.4 → 0.2.5
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -0
- data/README.md +23 -5
- data/lib/combine_pdf/decrypt.rb +21 -21
- data/lib/combine_pdf/parser.rb +24 -3
- data/lib/combine_pdf/version.rb +1 -1
- metadata +4 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 61c4d75ddd1975e567b4b96359f852a4eb13fdf7
+  data.tar.gz: 304f0b46cf41a96eddc46c728a20caf9ff77912c
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 256814f346635778e23265a652ce800e3c951e21e647943f752b2d231ddc68c90cc6ffeb7cfacaadd34b42faff97233637752cbd68329ff064728d673c989fdf
+  data.tar.gz: 29c90166699f5baf305398a1382e2f0db4ceb00e35391955cb5f6a30012a445fd48f3ce5e2d5154851ad1b9ae8d580f286de71ffe8b881e580b051faa1e1ec13

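`checksums.yaml` records SHA1 and SHA512 hex digests of the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz`. That is why all four values change with every release.

The sketch below shows how such a file could be regenerated from the archive bytes. It uses only Ruby's standard `digest` and `yaml` libraries. The `checksums_yaml` helper is hypothetical, not a RubyGems API, and extracting the entries from the `.gem` tarball is left out.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'yaml'

# Hypothetical helper: builds the checksums.yaml structure for gem entries
# given as { "metadata.gz" => bytes, "data.tar.gz" => bytes }.
def checksums_yaml(entries)
  checksums = {
    "SHA1"   => entries.transform_values { |bytes| Digest::SHA1.hexdigest(bytes) },
    "SHA512" => entries.transform_values { |bytes| Digest::SHA512.hexdigest(bytes) }
  }
  # Psych emits "---", then each algorithm with its entries indented below it.
  checksums.to_yaml
end
```

A `.gem` also ships its own `checksums.yaml.gz`. Comparing this output against that file, once gunzipped, is one way to check that a downloaded gem matches its recorded digests.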
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,14 @@
 
 ***
 
+Change log v.0.2.5
+
+**feature**: circumvents an issue with 'wkhtmltopdf', where sometimes the `endobj` keyword would be missing, causing malformed PDF data. The parser will now attempt to auto-fix any `endobj` missing keywords.
+
+**semi-fix**: make sure decryption is attempetd using actual values (vs. references). The code was updated for a similar result as should have been achived before.
+
+***
+
 Change log v.0.2.4
 
 **fixed**: Fixed the default page sizes which weren't as described in the documentation and now default to US Letter. The documentation was also fixed. No major version bump is declered since the defaults were faulty and weren't as described (fixed a bug, not changed the API).

data/README.md
CHANGED
@@ -1,4 +1,6 @@
 # CombinePDF - the ruby way for merging PDF files
+[![Gem Version](https://badge.fury.io/rb/combine_pdf.svg)](http://badge.fury.io/rb/combine_pdf)
+
 CombinePDF is a nifty model, written in pure Ruby, to parse PDF files and combine (merge) them with other PDF files, watermark them or stamp them (all using the PDF file format and pure Ruby code).
 
 # Install
@@ -72,7 +74,7 @@ pdf.save "file_with_numbering.pdf"
 
 Numbering can be done with many different options, with different formating, with or without a box object, and even with opacity values - see documentation.
 
-## Loading PDF data
+## Loading and Rendering PDF data
 
 Loading PDF data can be done from file system or directly from the memory.
 
@@ -82,19 +84,35 @@ Loading data from a file is easy:
 pdf = CombinePDF.load("file.pdf")
 ```
 
-
+You can also parse PDF files from memory. Loading from the memory is especially effective for importing PDF data recieved through the internet or from a different authoring library such as Prawn:
 
 ```ruby
-pdf_data =
+pdf_data = prawn_pdf_document.render # Import PDF data from Prawn
 pdf = CombinePDF.parse(pdf_data)
 ```
 
-
+Similarly, you can output a string of PDF data using `.to_pdf`. For example, to let a user download the PDF from a [Rails](http://rubyonrails.org) or [Plezi](https://github.com/boazsegev/plezi) app:
+
+```ruby
+# in a controller action
+send_data combined_file.to_pdf, filename: "combined.pdf", type: "application/pdf"
+```
+
+Or in [Sinatra](http://www.sinatrarb.com):
+
+```ruby
+# in your path's block
+status 200
+body combined_file.to_pdf
+headers 'content-type' => "application/pdf"
+```
+
+If you prefer to save the PDF data to a file, you can always use the `save` method as we did in our earlier examples.
 
 Demo
 ====
 
-You can see a Demo for a ["Bates stumping web-app"](http://combine-pdf-demo.herokuapp.com/bates) and read through it's [code](
+You can see a Demo for a ["Bates stumping web-app"](http://combine-pdf-demo.herokuapp.com/bates) and read through it's [code](https://github.com/boazsegev/combine_pdf_demo/blob/c9914588e4116dcfdaa37f85727f442b064e2b04/pdf_controller.rb) . Good luck :)
 
 Decryption & Filters
 ====================

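Rails' `send_data` and the Sinatra block both reduce to a response whose body is the string returned by `to_pdf`. The sketch below shows the same response with no framework, as a bare Rack-style callable that returns a status, headers, body triple. It assumes the PDF bytes are already in hand.

`pdf_download_app` is a hypothetical name; in practice you would pass it `combined_file.to_pdf`. Header names are lowercase, as in the Sinatra example.

```ruby
# Hypothetical helper: wraps an already-rendered PDF string (for example the
# result of combined_file.to_pdf) in a minimal Rack-style application.
def pdf_download_app(pdf_bytes, filename: "combined.pdf")
  lambda do |_env|
    headers = {
      "content-type"        => "application/pdf",
      "content-length"      => pdf_bytes.bytesize.to_s,
      # "attachment" asks the browser to download rather than display inline,
      # which is also Rails' default disposition for send_data.
      "content-disposition" => %(attachment; filename="#{filename}")
    }
    [200, headers, [pdf_bytes]]
  end
end
```

Mounted with `run pdf_download_app(combined_file.to_pdf)` in a `config.ru`, this serves one fixed document. A real app would render the PDF per request inside the lambda.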
data/lib/combine_pdf/decrypt.rb
CHANGED
@@ -17,6 +17,7 @@ module CombinePDF
 
   # This is an internal class. you don't need it.
   class PDFDecrypt
+    include CombinePDF::Renderer
 
     # @!visibility private
 
@@ -25,9 +26,9 @@ module CombinePDF
     # root_dictionary:: the root PDF dictionary, containing the Encrypt dictionary.
     def initialize objects=[], root_dictionary = {}
       @objects = objects
-      @encryption_dictionary = root_dictionary[:Encrypt]
+      @encryption_dictionary = actual_object(root_dictionary[:Encrypt])
       raise "Cannot decrypt an encrypted file without an encryption dictionary!" unless @encryption_dictionary
-      @root_dictionary = root_dictionary
+      @root_dictionary = actual_object(root_dictionary)
       @padding_key = [ 0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41,
                        0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
                        0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,

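Every `+` line in this file follows the same pattern: a dictionary value passes through `actual_object` before use. An entry written as an indirect reference (for example `/Length 12 0 R`) is therefore followed to its target.

`actual_object` itself is not part of this diff; presumably it comes with the newly included `CombinePDF::Renderer`. The sketch below is a hypothetical stand-in, not the gem's implementation. It rests on three assumptions:

* A reference is a Hash flagged `:is_reference_only` that carries `:indirect_reference_id` and `:indirect_generation_number`.
* A referenced non-dictionary value is wrapped under `:indirect_without_dictionary`. Both flags appear in the removed stream-length code in the last hunk of this file.
* Objects can be looked up in a table keyed by `[id, generation]`.

```ruby
# Hypothetical stand-in for actual_object: follows indirect references through
# an object table (keyed by [id, generation]) until a direct value is reached.
def resolve_reference(obj, table)
  seen = {}
  loop do
    if obj.is_a?(Hash) && obj[:is_reference_only]
      key = [obj[:indirect_reference_id], obj[:indirect_generation_number]]
      # Guard against malformed files whose references form a cycle.
      raise ArgumentError, "circular reference to #{key.inspect}" if seen[key]
      seen[key] = true
      obj = table[key] # nil when the target object is missing
    elsif obj.is_a?(Hash) && obj.key?(:indirect_without_dictionary)
      # A referenced number, string, array, etc. is stored wrapped in a Hash.
      obj = obj[:indirect_without_dictionary]
    else
      return obj
    end
  end
end
```

The `seen` table is a design choice of the sketch: a reference cycle raises instead of looping forever, and a missing target resolves to `nil`.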
@@ -41,7 +42,7 @@ module CombinePDF
     def decrypt
       raise_encrypted_error @encryption_dictionary unless @encryption_dictionary[:Filter] == :Standard
       @key = set_general_key
-      case @encryption_dictionary[:V]
+      case actual_object(@encryption_dictionary[:V])
       when 1,2
         # raise_encrypted_error
         _perform_decrypt_proc_ @objects, self.method(:decrypt_RC4)
@@ -49,10 +50,10 @@ module CombinePDF
         # raise unsupported error for now
         raise_encrypted_error
         # make sure CF is a Hash (as required by the PDF standard for this type of encryption).
-        raise_encrypted_error unless @encryption_dictionary[:CF].is_a?(Hash)
+        raise_encrypted_error unless actual_object(@encryption_dictionary[:CF]).is_a?(Hash)
 
         # do nothing if there is no data to decrypt except embeded files...?
-        return true unless (@encryption_dictionary[:CF].values.select { |v| !v[:AuthEvent] || v[:AuthEvent] == :DocOpen } ).empty?
+        return true unless (actual_object(@encryption_dictionary[:CF]).values.select { |v| !v[:AuthEvent] || v[:AuthEvent] == :DocOpen } ).empty?
 
         # attempt to decrypt all strings?
         # attempt to decrypy all streams
@@ -63,6 +64,8 @@ module CombinePDF
       end
       #rebuild stream lengths?
       @objects
+    rescue => e
+      raise_encrypted_error
     end
 
     protected

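The two added lines turn the body of `decrypt` into a method-level rescue. A `def` body acts as an implicit `begin`, so any `StandardError` raised while decrypting goes through `raise_encrypted_error` instead of surfacing as an unrelated exception. One such case is a `nil` `/R` reaching `>=` inside `set_general_key`.

The sketch below shows the same Ruby pattern. `EncryptionError` and `SketchDecryptor` are hypothetical stand-ins for the gem's classes; what `raise_encrypted_error` actually raises is not shown in this diff. One deliberate difference: the diff discards `e`, while the sketch keeps it reachable through `Exception#cause`.

```ruby
# Hypothetical error class standing in for whatever raise_encrypted_error raises.
class EncryptionError < StandardError; end

# Hypothetical, heavily simplified decryptor showing the method-level rescue.
class SketchDecryptor
  def initialize(encryption_dictionary)
    @encryption_dictionary = encryption_dictionary
  end

  def decrypt
    case @encryption_dictionary.fetch(:V) # KeyError if /V is missing
    when 1, 2 then :rc4
    else raise EncryptionError, "unsupported /V #{@encryption_dictionary[:V]}"
    end
  # The rescue clauses below cover the whole method body.
  # Only StandardError and its subclasses are caught.
  rescue EncryptionError
    raise # already the right type: re-raise unchanged
  rescue => e
    # Raising inside a rescue clause records e as the new error's #cause.
    raise EncryptionError, "cannot decrypt: #{e.message}"
  end
end
```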
@@ -71,17 +74,17 @@ module CombinePDF
       # 1) make sure the initial key is 32 byte long (if no password, uses padding).
       key = (password.bytes[0..32].to_a + @padding_key)[0..31].to_a.pack('C*').force_encoding(Encoding::ASCII_8BIT)
       # 2) add the value of the encryption dictionary’s O entry
-      key << @encryption_dictionary[:O].to_s
+      key << actual_object(@encryption_dictionary[:O]).to_s
       # 3) Convert the integer value of the P entry to a 32-bit unsigned binary number
       # and pass these bytes low-order byte first
-      key << [@encryption_dictionary[:P]].pack('i')
+      key << [actual_object(@encryption_dictionary[:P])].pack('i')
       # 4) Pass the first element of the file’s file identifier array
       # (the value of the ID entry in the document’s trailer dictionary
-      key << @root_dictionary[:ID][0]
+      key << actual_object(@root_dictionary[:ID])[0]
       # # 4(a) (Security handlers of revision 4 or greater)
       # # if document metadata is not being encrypted, add 4 bytes with the value 0xFFFFFFFF.
-      if @encryption_dictionary[:R] >= 4
-        unless @encryption_dictionary[:EncryptMetadata] == false #default is true and nil != false
+      if actual_object(@encryption_dictionary[:R]) >= 4
+        unless actual_object(@encryption_dictionary)[:EncryptMetadata] == false #default is true and nil != false
           key << "\x00\x00\x00\x00"
         else
           key << "\xFF\xFF\xFF\xFF"
@@ -94,17 +97,17 @@ module CombinePDF
       # pass the first n bytes of the output as input into a new MD5 hash,
       # where n is the number of bytes of the encryption key as defined by the value of
       # the encryption dictionary’s Length entry.
-      if @encryption_dictionary[:R] >= 3
+      if actual_object(@encryption_dictionary[:R]) >= 3
         50.times do|i|
-          key = Digest::MD5.digest(key[0
+          key = Digest::MD5.digest(key[0...actual_object(@encryption_dictionary[:Length])])
         end
       end
       # 6) Set the encryption key to the first n bytes of the output from the final MD5 hash,
       # where n shall always be 5 for security handlers of revision 2 but,
       # for security handlers of revision 3 or greater,
       # shall depend on the value of the encryption dictionary’s Length entry.
-      if @encryption_dictionary[:R] >= 3
-        @key = key[0..(@encryption_dictionary[:Length]/8)]
+      if actual_object(@encryption_dictionary[:R]) >= 3
+        @key = key[0..(actual_object(@encryption_dictionary[:Length])/8)]
       else
         @key = key[0..4]
       end

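The numbered comments in the last two hunks follow the standard security handler's file-key computation ("Algorithm 2" in ISO 32000-1, section 7.6.3.3). The sketch below restates those steps with plain, already-resolved values, using only Ruby's `digest` library. The `pdf_file_key` name and its keyword arguments are hypothetical.

The sketch follows the standard's wording as quoted in the comments, which makes it differ from the diff in four places:

* It slices exactly `n = Length / 8` bytes. Ruby's `0..x` range is inclusive, so `key[0..(Length/8)]` asks for one byte more.
* In step 5 it re-hashes the first `n` bytes rather than `key[0...Length]`, where `/Length` is in bits.
* It packs `/P` explicitly as little-endian, while `pack('i')` uses native byte order.
* For step 4(a) it appends `0xFFFFFFFF` only when metadata is not encrypted, and nothing otherwise.

```ruby
require 'digest/md5'

# The 32-byte password padding string from the PDF standard
# (the diff's @padding_key starts with the same bytes).
PDF_PASSWORD_PADDING = [
  0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41,
  0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
  0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,
  0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A
].pack('C*').freeze

# Sketch of the RC4 file-key computation (revisions 2-4).
# All arguments are plain values, i.e. references already resolved.
def pdf_file_key(password:, o_entry:, p_entry:, first_id:, revision:,
                 length_bits: 40, encrypt_metadata: true)
  # 1) pad or truncate the password to exactly 32 bytes
  input = (password.b + PDF_PASSWORD_PADDING)[0, 32]
  # 2) the /O entry, 3) /P as 32 bits low-order byte first, 4) the first /ID string
  input << o_entry.b << [p_entry].pack('l<') << first_id.b
  # 4a) revision 4+: add 0xFFFFFFFF only when metadata is NOT encrypted
  input << "\xFF\xFF\xFF\xFF".b if revision >= 4 && !encrypt_metadata
  # key length n in bytes: always 5 for revision 2
  n = revision >= 3 ? length_bits / 8 : 5
  digest = Digest::MD5.digest(input)
  # 5) revision 3+: re-hash the first n bytes 50 times
  50.times { digest = Digest::MD5.digest(digest[0, n]) } if revision >= 3
  # 6) the file key is the first n bytes of the final hash
  digest[0, n]
end
```

For 128-bit keys (`n = 16`, a full MD5 digest) the diff's slicing happens to produce the same bytes. The differences only show up for shorter key lengths.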
@@ -150,14 +153,11 @@ module CombinePDF
       when object.is_a?(Array)
         object.map! { |item| _perform_decrypt_proc_(item, decrypt_proc, encrypted_id, encrypted_generation, encrypted_filter) }
       when object.is_a?(Hash)
-        encrypted_id ||= object[:indirect_reference_id]
-        encrypted_generation ||= object[:indirect_generation_number]
-        encrypted_filter ||= object[:Filter]
+        encrypted_id ||= actual_object(object[:indirect_reference_id])
+        encrypted_generation ||= actual_object(object[:indirect_generation_number])
+        encrypted_filter ||= actual_object(object[:Filter])
         if object[:raw_stream_content]
-          stream_length = object[:Length]
-          if stream_length.is_a?(Hash) && stream_length[:is_reference_only]
-            stream_length = get_refernced_object(stream_length)[:indirect_without_dictionary]
-          end
+          stream_length = actual_object(object[:Length])
           actual_length = object[:raw_stream_content].length
           warn "Stream registeded length was #{object[:Length].to_s} and the actual length was #{actual_length}." if actual_length < stream_length
           length = [ stream_length, actual_length].min

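`encrypted_id` and `encrypted_generation` are threaded through `_perform_decrypt_proc_` because RC4-encrypted PDFs use a different key for every indirect object. The values passed down therefore need to be plain integers.

Per the PDF standard ("Algorithm 1", ISO 32000-1, section 7.6.2), that per-object key is derived from the file key plus the object and generation numbers. `decrypt_RC4` is not shown in this diff, so the sketch below illustrates the standard's derivation, not the gem's code. It covers RC4 only; AES additionally appends the bytes `sAlT` before hashing. `pdf_object_key` is a hypothetical name.

```ruby
require 'digest/md5'

# Sketch of the per-object RC4 key: the file key is extended with the object
# number (low-order 3 bytes) and generation number (low-order 2 bytes), both
# little-endian, hashed with MD5, and truncated to min(n + 5, 16) bytes.
def pdf_object_key(file_key, object_number, generation_number)
  input = file_key.b +
          [object_number].pack('V')[0, 3] +
          [generation_number].pack('v')
  Digest::MD5.digest(input)[0, [file_key.bytesize + 5, 16].min]
end
```

A 5-byte (40-bit) file key therefore yields a 10-byte object key. File keys of 11 bytes or more are capped at 16.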
data/lib/combine_pdf/parser.rb
CHANGED
@@ -331,10 +331,26 @@ module CombinePDF
           @scanner.skip_until(/\%\%EOF/)
         end
 
-      when @scanner.scan(/[\s]+/)
-        # do nothing
-        # warn "White Space, do nothing"
+      when @scanner.scan(/[\s]+/)
+        # Generally, do nothing
         nil
+      when @scanner.scan(/obj[\s]*/)
+        # Fix wkhtmltopdf PDF authoring issue - missing 'endobj' keywords
+        unless out[-4].nil? || out[-4].is_a?(Hash)
+          keep = []
+          keep << out.pop
+          keep << out.pop
+
+          if out.last.is_a? Hash
+            out << out.pop.merge({indirect_generation_number: out.pop, indirect_reference_id: out.pop})
+          else
+            out << {indirect_without_dictionary: out.pop, indirect_generation_number: out.pop, indirect_reference_id: out.pop}
+          end
+          warn "'endobj' keyword was missing for Object ID: #{out.last[:indirect_reference_id]}, trying to auto-fix issue, but might fail."
+
+          out << keep.pop
+          out << keep.pop
+        end
       else
         # always advance
         # warn "Advnacing for unknown reason..."

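The new `when` branch repairs the parser's token stack when the next object's `obj` keyword arrives before the previous object's `endobj`. The sketch below isolates that stack manipulation on a plain Array.

`close_unterminated_object!` is a hypothetical name; in the gem the logic is inline in the branch above. The sketch assumes, as the diff's check on `out[-4]` implies, that a properly closed object has already been collapsed into a single Hash. In the example stack `[5, 0, {...}, 6, 0]`, the first three entries are object 5's id, generation and dictionary, and the last two are object 6's id and generation.

```ruby
# Hypothetical helper mirroring the diff's inline fix. `out` is the token
# stack at the moment an "obj" keyword is scanned.
def close_unterminated_object!(out)
  # A closed object is a single Hash, so out[-4] is then a Hash or nil;
  # anything else means the previous object's 'endobj' never arrived.
  return out if out[-4].nil? || out[-4].is_a?(Hash)

  next_gen = out.pop # the new object's generation and id, set aside
  next_id  = out.pop
  value    = out.pop # the unterminated object's content, generation and id
  gen      = out.pop
  id       = out.pop
  closed = if value.is_a?(Hash)
             value.merge({ indirect_generation_number: gen, indirect_reference_id: id })
           else
             { indirect_without_dictionary: value, indirect_generation_number: gen, indirect_reference_id: id }
           end
  out.push(closed, next_id, next_gen)
end
```

Like the original, this is a heuristic. It trusts that the two values just before `obj` are the new object's id and generation, which is why the gem's warning says the auto-fix "might fail".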
@@ -454,6 +470,11 @@ module CombinePDF
         obj.delete(:indirect_reference_id); obj.delete(:indirect_generation_number)
       end
       self
+    # rescue => e
+      # puts (@parsed.select {|o| !o.is_a?(Hash)})
+      # puts (@parsed)
+      # puts (@references)
+      # raise e
     end
 
     # @private

data/lib/combine_pdf/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: combine_pdf
 version: !ruby/object:Gem::Version
-  version: 0.2.
+  version: 0.2.5
 platform: ruby
 authors:
 - Boaz Segev
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-08
+date: 2015-09-08 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ruby-rc4
@@ -102,9 +102,10 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.4.
+rubygems_version: 2.4.5
 signing_key:
 specification_version: 4
 summary: Combine, stamp and watermark PDF files in pure Ruby.
 test_files:
 - test/console
+has_rdoc: