combine_pdf 0.0.2 → 0.0.3
- checksums.yaml +4 -4
- data/lib/combine_pdf.rb +76 -73
- data/lib/combine_pdf/combine_pdf_basic_writer.rb +1 -1
- data/lib/combine_pdf/combine_pdf_decrypt.rb +35 -36
- data/lib/combine_pdf/combine_pdf_filter.rb +1 -1
- data/lib/combine_pdf/combine_pdf_parser.rb +1 -1
- data/lib/combine_pdf/combine_pdf_pdf.rb +49 -20
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 47a88cf13f6bb93a5a75cbe4c227c204e16e3ec3
+data.tar.gz: 3571716409c5586fef95c067211d4b890baa017a
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 89cce7a4ebc0dee9d51563568ea64918d8047b1d37fd7084f9dea424e6e4a06c5f5ad731fc868051d6f9c264d51ab6d7a93e8fbeb402a43542a3f4f109269273
+data.tar.gz: 80a54f7d2c58399115b4be46a5b039fe1d0ece65c714ec878d1c22a0bf55a4d273506f8893448fdf2c1284a2646b9c1a43590e6da889e6ea25401ecd323e9161
data/lib/combine_pdf.rb
CHANGED
@@ -1,95 +1,94 @@
 # -*- encoding : utf-8 -*-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-## >> pdf = CombinePDF.parse "%PDF-1.4 .... [data]"
-## It's also easy to start an empty pdf:
-## >> pdf = CombinePDF.new
-## Merging is a breeze:
-## >> pdf << CombinePDF.new "another_file_name.pdf"
-## and saving the final PDF is a one-liner:
-## >> pdf.save "output_file_name.pdf"
-## Also, as a side effect, we can get all sorts of info about our pdf... such as the page count:
-## >> pdf.version # will tell you the PDF version (if discovered). you can also reset this manually.
-## >> pdf.pages.length # will tell you how much pages are actually displayed
-## >> pdf.all_pages.length # will tell you how many page objects actually exist (can be more or less then the pages displayed)
-## >> pdf.info # a hash with the Info dictionary from the PDF file (if discovered).
-## === Stamp PDF files
-## <b>has issues with specific PDF files - please see the issues</b>: https://github.com/boazsegev/combine_pdf/issues/2
-## You can use PDF files as stamps.
-## For instance, lets say you have this wonderful PDF (maybe one you created with prawn), and you want to stump the company header and footer on every page.
-## So you created your Prawn PDF file (Amazing library and hard work there, I totally recommend to have a look @ https://github.com/prawnpdf/prawn ):
-## >> prawn_pdf = Prawn::Document.new
-## >> ...(fill your new PDF with goodies)...
-## Stamping every page is a breeze.
-## We start by moving the PDF created by prawn into a CombinePDF object.
-## >> pdf = CombinePDF.parse prawn_pdf.render
-## Next we extract the stamp from our stamp pdf template:
-## >> pdf_stamp = CombinePDF.new "stamp_file_name.pdf"
-## >> stamp_page = pdf_stamp.pages[0]
-## And off we stamp each page:
-## >> pdf.pages.each {|page| pages << stamp_page}
-## Of cource, we can save the stamped output:
-## >> pdf.save "output_file_name.pdf"
-## === Decryption & Filters
-## Some PDF files are encrypted and some are compressed (the use of filters)...
-## There is very little support for encrypted files and very very basic and limited support for compressed files.
-## I need help with that.
-## === Comments and file structure
-## If you want to help with the code, please be aware:
-## I'm a self learned hobbiest at heart. The documentation is lacking and the comments in the code are poor guidlines.
-## The code itself should be very straight forward, but feel free to ask whatever you want.
-## === Credit
-## Caige Nichols wrote an amazing RC4 gem which I used in my code.
-## I wanted to install the gem, but I had issues with the internet and ended up copying the code itself into the combine_pdf_decrypt class file.
-## Credit to his wonderful is given here. Please respect his license and copyright... and mine.
-## === License
-## GPLv3
-########################################################
+
+# this file is part of the CombinePDF library and the code
+# is subject to the same license (GPLv3).
+#########################################################
+
+
+
+# PDF object types cross reference:
+# Indirect objects, references, dictionaries and streams are Hash
+# arrays are Array
+# strings are String
+# names are Symbols (String.to_sym)
+# numbers are Fixnum or Float
+# boolean are TrueClass or FalseClass
+
 require 'zlib'
 require 'strscan'
 require 'combine_pdf/combine_pdf_pdf'
 require 'combine_pdf/combine_pdf_decrypt'
 require 'combine_pdf/combine_pdf_filter'
 require 'combine_pdf/combine_pdf_parser'
+
+# This is a pure ruby library to merge PDF files.
+# In the future, this library will also allow stamping and watermarking PDFs (it allows this now, only with some issues).
+#
+# PDF objects can be used to combine or to inject data.
+# == Combine / Merge
+# To combine PDF files (or data):
+# pdf = CombinePDF.new
+# pdf << CombinePDF.new "file1.pdf" # one way to combine, very fast.
+# CombinePDF.new("file2.pdf").pages.each {|page| pdf << page} # different way to combine, slower.
+# pdf.save "combined.pdf"
+# == Stamp / Watermark
+# <b>has issues with specific PDF files - please see the issues</b>: https://github.com/boazsegev/combine_pdf/issues/2
+# To combine PDF files (or data), first create the stamp from a PDF file:
+# stamp_pdf_file = CombinePDF.new "stamp_pdf_file.pdf"
+# stamp_page = stamp_pdf_file.pages[0]
+# After the stamp was created, inject to PDF pages:
+# pdf = CombinePDF.new "file1.pdf"
+# pdf.pages.each {|page| page << stamp_page}
+# Notice the << operator is on a page and not a PDF object. The << operator acts differently on PDF objects and on Pages.
+#
+# Notice that page objects are Hash class objects and the << operator was added to the Page instances without altering the class.
+#
+# == Decryption & Filters
+#
+# Some PDF files are encrypted and some are compressed (the use of filters)...
+#
+# There is very little support for encrypted files and very very basic and limited support for compressed files.
+#
+# I need help with that.
+#
+# == Comments and file structure
+#
+# If you want to help with the code, please be aware:
+#
+# I'm a self learned hobbiest at heart. The documentation is lacking and the comments in the code are poor guidlines.
+#
+# The code itself should be very straight forward, but feel free to ask whatever you want.
+#
+# == Credit
+#
+# Caige Nichols wrote an amazing RC4 gem which I used in my code.
+#
+# I wanted to install the gem, but I had issues with the internet and ended up copying the code itself into the combine_pdf_decrypt class file.
+#
+# Credit to his wonderful is given here. Please respect his license and copyright... and mine.
+#
+# == License
+#
+# GPLv3
 module CombinePDF
 module_function
-################################################################
-## These are the "gateway" functions for the model.
-## These functions are open to the public.
-################################################################
-# PDF object types cross reference:
-# Indirect objects, references, dictionaries and streams are Hash
-# arrays are Array
-# strings are String
-# names are Symbols (String.to_sym)
-# numbers are Fixnum or Float
-# boolean are TrueClass or FalseClass
 
+# Create an empty PDF object or create a PDF object from a file (parsing the file).
+# file_name:: is the name of a file to be parsed.
 def new(file_name = "")
 raise TypeError, "couldn't parse and data, expecting type String" unless file_name.is_a? String
 return PDF.new() if file_name == ''
 PDF.new( PDFParser.new( IO.read(file_name).force_encoding(Encoding::ASCII_8BIT) ) )
 end
+# Create a PDF object from a raw PDF data (parsing the data).
+# data:: is a string that represents the content of a PDF file.
 def parse(data)
 raise TypeError, "couldn't parse and data, expecting type String" unless data.is_a? String
 PDF.new( PDFParser.new(data) )
 end
 end
 
-module CombinePDF
+module CombinePDF #:nodoc: all
 ################################################################
 ## These are common functions, used within the different classes
 ## These functions aren't open to the public.
@@ -105,7 +104,7 @@ module CombinePDF
 41 => 41, #)
 92 => 92 #\
 }
-module PDFOperations
+module PDFOperations #:nodoc: all
 module_function
 def inject_to_page page = {Type: :Page, MediaBox: [0,0,612.0,792.0], Resources: {}, Contents: []}, stream = nil, top = true
 # make sure both the page reciving the new data and the injected page are of the correct data type.
@@ -158,9 +157,12 @@ module CombinePDF
 page
 end
 # copy_and_secure_for_injection(page)
-# - page is a page in the pages array, i.e.
+# - page is a page in the pages array, i.e.
+# pdf.pages[0]
 # takes a page object and:
+#
 # makes a deep copy of the page (Ruby defaults to pointers, so this will copy the memory).
+#
 # then it will rewrite the content stream with renamed resources, so as to avoid name conflicts.
 def copy_and_secure_for_injection(page)
 # copy page
@@ -335,6 +337,7 @@ module CombinePDF
 
 
 
+# Formats an object into PDF format. This is used my the PDF object to format the PDF file and it is used in the secure injection which is still being developed.
 def _object_to_pdf object
 case
 when object.nil?
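The type cross reference at the top of this file, together with the new comment on _object_to_pdf, suggests how Ruby values could be written out in PDF syntax. The following is only a minimal sketch under that reading, not the gem's actual implementation: object_to_pdf_sketch is a hypothetical name, indirect objects and streams are left out, and Integer is used in place of Fixnum (Ruby 2.4 unified Fixnum into Integer).

```ruby
# Hypothetical helper, not the gem's _object_to_pdf: maps plain Ruby
# values to PDF syntax following the "PDF object types cross reference".
def object_to_pdf_sketch(object)
  case object
  when nil then "null"
  when true, false then object.to_s
  when Integer, Float then object.to_s
  when Symbol then "/#{object}" # PDF names start with a slash
  when String
    # Literal strings are parenthesized; escape backslashes and parentheses.
    escaped = object.gsub(/[\\()]/) { |char| "\\#{char}" }
    "(#{escaped})"
  when Array
    "[#{object.map { |item| object_to_pdf_sketch(item) }.join(' ')}]"
  when Hash
    # Dictionaries are written as <<key value ...>> pairs.
    pairs = object.map do |key, value|
      "#{object_to_pdf_sketch(key)} #{object_to_pdf_sketch(value)}"
    end
    "<<#{pairs.join(' ')}>>"
  else
    raise TypeError, "no PDF representation for #{object.class}"
  end
end

puts object_to_pdf_sketch({ Type: :Page, MediaBox: [0, 0, 612.0, 792.0] })
```

The recursion mirrors the nesting of the data: arrays and dictionaries format their members with the same function, so a whole page dictionary can be written in one call.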
data/lib/combine_pdf/combine_pdf_decrypt.rb
CHANGED
@@ -5,7 +5,7 @@
 ## is subject to the same license.
 ########################################################
 
-module CombinePDF
+module CombinePDF #:nodoc: all
 class PDFDecrypt
 
 def initialize objects=[], root_doctionary = {}
@@ -151,48 +151,47 @@ module CombinePDF
 ## copying it from the web page I had in my cache.
 ## This wonderful work was done by Caige Nichols.
 #####################################################
+# class RC4
+# def initialize(str)
+# begin
+# raise SyntaxError, "RC4: Key supplied is blank" if str.eql?('')
 
-
-
-
-
+# @q1, @q2 = 0, 0
+# @key = []
+# str.each_byte { |elem| @key << elem } while @key.size < 256
+# @key.slice!(256..@key.size-1) if @key.size >= 256
+# @s = (0..255).to_a
+# j = 0
+# 0.upto(255) do |i|
+# j = (j + @s[i] + @key[i] ) % 256
+# @s[i], @s[j] = @s[j], @s[i]
+# end
+# end
+# end
 
-
-
-
-@key.slice!(256..@key.size-1) if @key.size >= 256
-@s = (0..255).to_a
-j = 0
-0.upto(255) do |i|
-j = (j + @s[i] + @key[i] ) % 256
-@s[i], @s[j] = @s[j], @s[i]
-end
-end
-end
+# def encrypt!(text)
+# process text
+# end
 
-
-
-
+# def encrypt(text)
+# process text.dup
+# end
 
-
-process text.dup
-end
+# alias_method :decrypt, :encrypt
 
-
+# private
 
-
+# def process(text)
+# text.unpack("C*").map { |c| c ^ round }.pack("C*")
+# end
 
-
-
-
-
-
-
-
-@s[@q1], @s[@q2] = @s[@q2], @s[@q1]
-@s[(@s[@q1]+@s[@q2]) % 256]
-end
-end
+# def round
+# @q1 = (@q1 + 1) % 256
+# @q2 = (@q2 + @s[@q1]) % 256
+# @s[@q1], @s[@q2] = @s[@q2], @s[@q1]
+# end
+# end
 
 end
 
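The class commented out in this hunk follows the usual two phases of RC4: key scheduling, then a keystream that is XORed with the data. As an illustration only, here is a self-contained sketch of the same technique. RC4Sketch and its method names are hypothetical and are not part of the gem or of Caige Nichols' code; the key bytes are indexed modulo the key length instead of being expanded to 256 entries first, which yields the same schedule.

```ruby
# A minimal RC4 sketch (hypothetical class, not the gem's code).
class RC4Sketch
  def initialize(key)
    raise ArgumentError, "RC4: key supplied is blank" if key.empty?
    key_bytes = key.bytes
    # Key scheduling: start from the identity permutation and shuffle it
    # using the key bytes, repeated as needed.
    @s = (0..255).to_a
    j = 0
    256.times do |i|
      j = (j + @s[i] + key_bytes[i % key_bytes.size]) % 256
      @s[i], @s[j] = @s[j], @s[i]
    end
    @i = 0
    @j = 0
  end

  # XORs every byte with the next keystream byte. RC4 is symmetric, so the
  # same call decrypts when made on a fresh instance built from the same key.
  def process(text)
    text.bytes.map { |byte| byte ^ next_keystream_byte }.pack("C*")
  end

  private

  # One round of the pseudo-random generation step.
  def next_keystream_byte
    @i = (@i + 1) % 256
    @j = (@j + @s[@i]) % 256
    @s[@i], @s[@j] = @s[@j], @s[@i]
    @s[(@s[@i] + @s[@j]) % 256]
  end
end

cipher = RC4Sketch.new("Key").process("Plaintext")
plain  = RC4Sketch.new("Key").process(cipher)
puts cipher.unpack1("H*")
puts plain
```

Each instance carries keystream state, so encrypting and decrypting need separate instances created from the same key.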
data/lib/combine_pdf/combine_pdf_parser.rb
CHANGED
@@ -4,7 +4,7 @@
 ## this file is part of the CombinePDF library and the code
 ## is subject to the same license.
 ########################################################
-module CombinePDF
+module CombinePDF #:nodoc: all
 
 ########################################################
 ## This is the Parser class.
data/lib/combine_pdf/combine_pdf_pdf.rb
CHANGED
@@ -5,11 +5,26 @@
 ## is subject to the same license.
 ########################################################
 module CombinePDF
-
-
-
-
-
+#######################################################
+# PDF class is the PDF object that can save itself to
+# a file and that can be used as a container for a full
+# PDF file data, including version etc'.
+#
+# PDF objects can be used to combine or to inject data.
+# == Combine
+# To combine PDF files (or data):
+# pdf = CombinePDF.new
+# pdf << CombinePDF.new "file1.pdf" # one way to combine, very fast.
+# CombinePDF.new("file2.pdf").pages.each {|page| pdf << page} # different way to combine, slower.
+# pdf.save "combined.pdf"
+# == Stamp / Watermark
+# To combine PDF files (or data), first create the stamp from a PDF file:
+# stamp_pdf_file = CombinePDF.new "stamp_pdf_file.pdf"
+# stamp_page = stamp_pdf_file.pages[0]
+# After the stamp was created, inject to PDF pages:
+# pdf = CombinePDF.new "file1.pdf"
+# pdf.pages.each {|page| page << stamp_page} # notice the << operator is on a page and not a PDF object.
+#######################################################
 class PDF
 attr_reader :objects, :info
 attr_accessor :string_output
@@ -43,7 +58,9 @@ module CombinePDF
 end
 
 # Formats the data to PDF formats and returns a binary string that represents the PDF file content.
+#
 # This method is used by the save(file_name) method to save the content to a file.
+#
 # use this to export the PDF file without saving to disk (such as sending through HTTP ect').
 def to_pdf
 #reset version if not specified
@@ -90,15 +107,22 @@ module CombinePDF
 end
 
 # Seve the PDF to file.
-#
-#
-#
+#
+# file_name:: is a string or path object for the output.
+#
+# <b>Notice!</b> if the file exists, it <b>WILL</b> be overwritten.
 def save(file_name)
 IO.binwrite file_name, to_pdf
 end
-# this
+# this method returns all the pages cataloged in the catalog.
+#
 # if no catalog is passed, it seeks the existing catalog(s) and searches
 # for any registered Page objects.
+#
+# This method also adds the << operator to each page instance, so that content can be
+# injected to the pages, as described above.
+#
+# (page objects are Hash class objects. the << operator is added to the specific instances without changing the class)
 def pages(catalogs = nil)
 page_list = []
 if catalogs == nil
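The new comments say that << is added to individual page Hash instances without changing the Hash class. One way this can be done in Ruby is with a singleton method. The sketch below is an assumption about the mechanism, not the gem's code: add_injection_operator is a hypothetical helper, and "injection" is reduced to appending content entries.

```ruby
# Hypothetical helper: gives one Hash instance its own << method,
# leaving the Hash class itself untouched.
def add_injection_operator(page)
  page.define_singleton_method(:<<) do |stamp|
    # Append the stamp's content entries to this page's content list.
    self[:Contents].concat(stamp[:Contents])
    self
  end
  page
end

page  = add_injection_operator({ Type: :Page, Contents: ["page stream"] })
stamp = { Type: :Page, Contents: ["stamp stream"] }

page << stamp
p page[:Contents]
p stamp.respond_to?(:<<) # the stamp is a plain Hash without the operator
```

Because the method lives on the instance, other hashes in the program, including the stamp itself, keep their normal behaviour.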
@@ -136,19 +160,13 @@ module CombinePDF
 page_list
 end
 
-# this function returns all the Page objects - regardless of order and even if not cataloged
-# could be used for finding "lost" pages... but actually rather useless.
-def all_pages
-#########
-## Only return the page item, but make sure all references are connected so that
-## referenced items and be reached through the connections.
-[].tap {|out| each_object {|obj| out << obj if obj.is_a?(Hash) && obj[:Type] == :Page } }
-end
-
 # this function adds pages or CombinePDF objects at the end of the file (merge)
 # for example:
+#
 # pdf = CombinePDF.new "first_file.pdf"
+#
 # pdf << CombinePDF.new "second_file.pdf"
+#
 # pdf.save "both_files_merged.pdf"
 def << (obj)
 #########
@@ -181,7 +199,16 @@ module CombinePDF
 warn "Shouldn't add objects to the file if they are not top-level indirect PDF objects."
 end
 end
-
+end
+class PDF #:nodoc: all
+# this function returns all the Page objects - regardless of order and even if not cataloged
+# could be used for finding "lost" pages... but actually rather useless.
+def all_pages
+#########
+## Only return the page item, but make sure all references are connected so that
+## referenced items and be reached through the connections.
+[].tap {|out| each_object {|obj| out << obj if obj.is_a?(Hash) && obj[:Type] == :Page } }
+end
 def serialize_objects_and_references(object = nil)
 warn "connecting objects with their references (serialize_objects_and_references)."
 
@@ -322,7 +349,8 @@ module CombinePDF
 catalog_object
 end
 # this is an alternative to the rebuild_catalog catalog method
-# this method
+# this method is used by the to_pdf method, for streamlining the PDF output.
+# there is no point is calling the method before preparing the output.
 def rebuild_catalog_and_objects
 catalog = rebuild_catalog
 @objects = []
@@ -332,6 +360,7 @@ module CombinePDF
 catalog
 end
 
+# disabled, don't use. simpley returns true.
 def rebuild_resources
 
 warn "Resources re-building disabled as it isn't worth the price in peformance as of yet."