podoff 1.1.1 → 1.2.0
- data/CHANGELOG.txt +7 -0
- data/README.md +244 -0
- data/lib/podoff.rb +57 -46
- data/out.txt +1 -0
- data/spec/alpha_spec.rb +40 -0
- data/spec/core_spec.rb +9 -10
- data/spec/document_spec.rb +75 -44
- data/spec/obj_spec.rb +10 -9
- data/spec/spec_helper.rb +66 -0
- metadata +4 -2
data/CHANGELOG.txt
CHANGED
@@ -2,6 +2,13 @@
|
|
2
2
|
= podoff CHANGELOG.txt
|
3
3
|
|
4
4
|
|
5
|
+
== podoff 1.2.0 released 2015-11-11
|
6
|
+
|
7
|
+
- require encoding upon loading and parsing, introduce Document#encoding
|
8
|
+
- drop Podoff::Obj#page_number
|
9
|
+
- use /Kids in /Pages to determine pages and page order
|
10
|
+
|
11
|
+
|
5
12
|
== podoff 1.1.1 released 2015-10-26
|
6
13
|
|
7
14
|
- reworked xref table output
|
data/README.md
CHANGED
@@ -6,17 +6,261 @@
|
|
6
6
|
|
7
7
|
A Ruby tool to deface PDF documents.
|
8
8
|
|
9
|
+
Uses "incremental updates" to do so.
|
10
|
+
|
11
|
+
Podoff is used to write over PDF documents. Those documents should first be uncompressed (and recompressed); see [below](#preparing-documents-for-use-with-podoff) for how.
|
12
|
+
|
13
|
+
```ruby
|
14
|
+
require 'podoff'
|
15
|
+
|
16
|
+
d = Podoff.load('d2.pdf')
|
17
|
+
# load my d2.pdf
|
18
|
+
|
19
|
+
fo = d.add_base_font('Helvetica')
|
20
|
+
# make sure the document knows about "Helvetica"
|
21
|
+
# (one of the base 13 or 14 fonts PDF readers know about)
|
22
|
+
|
23
|
+
|
24
|
+
pa = d.page(1)
|
25
|
+
# grab first page of the document
|
26
|
+
|
27
|
+
pa.insert_font('/MyHelvetica', fo)
|
28
|
+
# link "MyHelvetica" to the base font above for this page
|
29
|
+
|
30
|
+
st =
|
31
|
+
d.add_stream {
|
32
|
+
tf '/MyHelvetica', 12 # Helvetica size 12
|
33
|
+
bt 100, 100, "#{Time.now} stamped via podoff" # text at bottom left
|
34
|
+
}
|
35
|
+
|
36
|
+
pa.insert_content(st)
|
37
|
+
# add content to page
|
38
|
+
|
39
|
+
d.write('d3.pdf')
|
40
|
+
# write stamped document to d3.pdf
|
41
|
+
```
|
42
|
+
|
43
|
+
For more about the podoff "api", read ["how I use podoff"](#how-i-use-podoff).
|
44
|
+
|
9
45
|
If you're looking for serious libraries, look at
|
10
46
|
|
11
47
|
* https://github.com/payrollhero/pdf_tempura
|
12
48
|
* https://github.com/prawnpdf/prawn-templates
|
13
49
|
|
14
50
|
|
51
|
+
## preparing documents for use with podoff
|
52
|
+
|
53
|
+
Podoff is naive and can't read xref tables in object streams. You have to work against PDF documents that have vanilla xref tables. [Qpdf](http://qpdf.sourceforge.net/) to the rescue.
|
54
|
+
|
55
|
+
Given a doc0.pdf you can produce such a document by doing:
|
56
|
+
```
|
57
|
+
qpdf --object-streams=disable doc0.pdf doc1.pdf
|
58
|
+
```
|
59
|
+
doc1.pdf is now ready for overwriting with podoff.
|
60
|
+
|
61
|
+
qpdf has rewritten the PDF, extracting the xref table but keeping the streams compressed.
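A minimal sketch of scripting that preparation step from Ruby, assuming `qpdf` is available on the PATH. `qpdf_command` and `prepare_for_podoff` are hypothetical helper names, not part of podoff; only the `--object-streams=disable` flag comes from the text above.

```ruby
# Hypothetical helpers, not part of podoff.

# Builds the argument list for the qpdf call shown above.
def qpdf_command(input, output)
  ['qpdf', '--object-streams=disable', input, output]
end

# Runs qpdf. Kernel#system returns true when the command exits with 0,
# false for any other exit status, and nil if it could not be executed.
def prepare_for_podoff(input, output)
  system(*qpdf_command(input, output))
end

cmd = qpdf_command('doc0.pdf', 'doc1.pdf')
```

Passing the arguments as separate strings (rather than one shell string) avoids quoting problems with file names that contain spaces.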
|
62
|
+
|
63
|
+
|
64
|
+
## bin/podoff
|
65
|
+
|
66
|
+
`bin/podoff` is a command-line tool for preparing/checking PDFs before use.
|
67
|
+
|
68
|
+
```
|
69
|
+
$ ./bin/podoff -h
|
70
|
+
|
71
|
+
Usage: ./bin/podoff [option] {fname}
|
72
|
+
|
73
|
+
-o, --objs List objs
|
74
|
+
-w, --rewrite Rewrite
|
75
|
+
-s, --stamp Apply time stamp at bottom of each page
|
76
|
+
-r, --recompress Recompress
|
77
|
+
--version Show version
|
78
|
+
-h, --help Show this message
|
79
|
+
```
|
80
|
+
|
81
|
+
`--recompress` is mostly an alias for `qpdf --object-streams=disable in.pdf out.pdf`
|
82
|
+
|
83
|
+
`--stamp` is used to check whether podoff can add a time stamp on each page of an input PDF.
|
84
|
+
|
85
|
+
|
86
|
+
## how I use podoff
|
87
|
+
|
88
|
+
In the application which necessitated the creation of podoff, there are two PDFs to generate from time to time.
|
89
|
+
|
90
|
+
I keep those two PDFs in memory.
|
91
|
+
|
92
|
+
```ruby
|
93
|
+
# lib/myapp/pdf.rb
|
94
|
+
|
95
|
+
require 'podoff'
|
96
|
+
|
97
|
+
module MyApp::Pdf
|
98
|
+
|
99
|
+
DOC0 = Podoff.load('pdf_templates/d0.pdf')
|
100
|
+
DOC1 = Podoff.load('pdf_templates/d1.pdf')
|
101
|
+
|
102
|
+
def generate_doc0(data, path)
|
103
|
+
|
104
|
+
d = DOC0.dup # shallow copy of the document
|
105
|
+
d.add_fonts
|
106
|
+
|
107
|
+
pa2 = d.page(2)
|
108
|
+
st = d.add_stream # open stream...
|
109
|
+
|
110
|
+
st.font 'MyHelv', 12 # font is an alias to tf
|
111
|
+
st.text 100, 100, data['customer_name']
|
112
|
+
st.text 100, 80, data['customer_phone']
|
113
|
+
st.text 100, 60, data['date'] if data['date']
|
114
|
+
# fill in customer info on page 2
|
115
|
+
|
116
|
+
pa2.insert_content(st) # ... close stream (yes, you can use a block too)
|
117
|
+
|
118
|
+
pa3 = d.page(3)
|
119
|
+
pa3.insert_content(d.add_stream { check 52, 100 }) if data['discount']
|
120
|
+
# a single check on page 3 if the customer gets a discount
|
121
|
+
|
122
|
+
d.write(path)
|
123
|
+
end
|
124
|
+
|
125
|
+
# ...
|
126
|
+
end
|
127
|
+
|
128
|
+
module Podoff # adding a few helper methods to the podoff classes
|
129
|
+
|
130
|
+
class Document
|
131
|
+
|
132
|
+
# Makes sure Helvetica and ZapfDingbats are available
|
133
|
+
# on each page of the document
|
134
|
+
#
|
135
|
+
def add_fonts
|
136
|
+
|
137
|
+
fo0 = add_base_font('/Helvetica')
|
138
|
+
fo1 = add_base_font('/ZapfDingbats')
|
139
|
+
|
140
|
+
pages.each { |pa|
|
141
|
+
pa = re_add(pa)
|
142
|
+
pa.insert_font('/MyHelv', fo0)
|
143
|
+
pa.insert_font('/MyZapf', fo1)
|
144
|
+
}
|
145
|
+
end
|
146
|
+
end
|
147
|
+
|
148
|
+
class Stream
|
149
|
+
|
150
|
+
# Places a check mark ✓ at x, y
|
151
|
+
#
|
152
|
+
def check(x, y)
|
153
|
+
|
154
|
+
font = @font # save current font
|
155
|
+
self.tf '/MyZapf', 12 # switch to ZapfDingbats size 12
|
156
|
+
self.bt x, y, '3' # check mark
|
157
|
+
@font = font # get back to saved font
|
158
|
+
end
|
159
|
+
end
|
160
|
+
end
|
161
|
+
```
|
162
|
+
|
163
|
+
The documents are kept in memory. As generation requests come in, they get duplicated and incrementally updated, and the filled documents are written to disk. The duplication doesn't copy the whole document file; only the references to the "obj" elements in the document get copied.
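As a toy illustration of what a shallow copy buys here, the plain-Ruby sketch below duplicates a Hash of obj sources. It is not podoff's actual implementation; the refs and obj strings are made up for the example.

```ruby
# Toy illustration of a shallow copy, not podoff's implementation.
template = { '1 0' => '1 0 obj << /Type /Catalog >> endobj' }

# Hash#dup creates a new Hash that points at the same String values.
copy = template.dup

# An addition made on the copy is not visible in the template.
copy['7 0'] = '7 0 obj << /Type /Font >> endobj'

# The existing obj source is shared, not copied.
shared = copy['1 0'].equal?(template['1 0'])
```

The template stays untouched and reusable, while each generated document only pays for the objs it adds.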
|
164
|
+
|
165
|
+
### Podoff::Document
|
166
|
+
|
167
|
+
```ruby
|
168
|
+
class Podoff::Document
|
169
|
+
|
170
|
+
def self.load(path, encoding='iso-8859-1')
|
171
|
+
# Podoff.load(path, encoding) is a shortcut to this method
|
172
|
+
|
173
|
+
def dup
|
174
|
+
# Makes a shallow copy of the document
|
175
|
+
|
176
|
+
def add_base_font(name)
|
177
|
+
# Given a name in the base 13/14 fonts readers are supposed to know,
|
178
|
+
# ensures the document has access to the font.
|
179
|
+
# Usually "Helvetica" or "ZapfDingbats".
|
180
|
+
|
181
|
+
def pages
|
182
|
+
# Returns an array of all the objs that are pages
|
183
|
+
|
184
|
+
def page(index)
|
185
|
+
# Starts at 1, returns a page obj. Understands negative indexes, like
|
186
|
+
# -1 for the last page.
|
187
|
+
|
188
|
+
def add_stream(src=nil, &block)
|
189
|
+
# Prepares a new obj with a stream
|
190
|
+
# If src is given places the src string in the stream.
|
191
|
+
# If a block is given executes the block in the context of the
|
192
|
+
# Podoff::Stream instance.
|
193
|
+
# If no src and no block, simply returns the Podoff::Stream wrapped inside
|
194
|
+
# of the new obj (see example code above)
|
195
|
+
|
196
|
+
def re_add(obj_or_ref)
|
197
|
+
# Given an obj or a ref (like "1234 0") to an obj, copies that obj
|
198
|
+
# and re-adds it to the document.
|
199
|
+
# This is necessary for the incremental updates podoff uses, if you add
|
200
|
+
# an obj to the Contents list of a page, you have to add it to the
|
201
|
+
# re-added page, not directly to the original page.
|
202
|
+
|
203
|
+
def write(path=:string)
|
204
|
+
# Writes the document, with incremental updates to a file given by its path.
|
205
|
+
# If the path is :string, will simply return the string containing the
|
206
|
+
# whole document
|
207
|
+
|
208
|
+
def rewrite(path=:string)
|
209
|
+
# Like #write, but squashes the incremental updates in the document.
|
210
|
+
# Takes more time and memory and might fail (remember, podoff is very
|
211
|
+
# naive (as its author is)). Test with care...
|
212
|
+
|
213
|
+
#
|
214
|
+
# a bit lower-level...
|
215
|
+
|
216
|
+
def objs
|
217
|
+
# returns the hash { String/obj_ref => Podoff::Obj/obj_instance }
|
218
|
+
```
|
219
|
+
|
220
|
+
### Podoff::Obj
|
221
|
+
|
222
|
+
A PDF document is mostly a hierarchy of `obj` elements. `Podoff::Obj` points to such elements (see `Podoff::Document#objs`).
|
223
|
+
|
224
|
+
```ruby
|
225
|
+
class Podoff::Obj
|
226
|
+
|
227
|
+
def insert_font(font_nick, font_obj_or_ref)
|
228
|
+
def insert_contents(obj_or_ref)
|
229
|
+
```
|
230
|
+
|
231
|
+
### Podoff::Stream
|
232
|
+
|
233
|
+
TODO
|
234
|
+
|
235
|
+
```ruby
|
236
|
+
class Podoff::Stream
|
237
|
+
|
238
|
+
def tf(font_name, font_size)
|
239
|
+
alias :font :tf
|
240
|
+
|
241
|
+
def bt(x, y, text)
|
242
|
+
alias :text :bt
|
243
|
+
```
|
244
|
+
|
245
|
+
|
15
246
|
## disclaimer
|
16
247
|
|
17
248
|
The author of this tool/library has no link whatsoever with the authors of the sample PDF documents found under `pdfs/`. Those documents have been selected because they are representative of the PDF forms podoff is meant to ~~deface~~fill.
|
18
249
|
|
19
250
|
|
251
|
+
## known bugs
|
252
|
+
|
253
|
+
* podoff parsing is naive: documents that contain uncompressed streams with "endobj", "startxref" or "/Root" will disorient podoff
|
254
|
+
* completely candid about encoding (only used it for British English documents so far)
|
255
|
+
|
256
|
+
|
257
|
+
## links
|
258
|
+
|
259
|
+
* http://qpdf.sourceforge.net/ source: https://github.com/qpdf/qpdf
|
260
|
+
|
261
|
+
* http://www.slideshare.net/ange4771/advanced-pdf-tricks
|
262
|
+
|
263
|
+
|
20
264
|
## LICENSE
|
21
265
|
|
22
266
|
MIT, see [LICENSE.txt](LICENSE.txt)
|
data/lib/podoff.rb
CHANGED
@@ -30,23 +30,26 @@ require 'stringio'
|
|
30
30
|
|
31
31
|
module Podoff
|
32
32
|
|
33
|
-
VERSION = '1.
|
33
|
+
VERSION = '1.2.0'
|
34
34
|
|
35
|
-
def self.load(path, encoding
|
35
|
+
def self.load(path, encoding)
|
36
36
|
|
37
37
|
Podoff::Document.load(path, encoding)
|
38
38
|
end
|
39
39
|
|
40
|
-
def self.parse(s)
|
40
|
+
def self.parse(s, encoding)
|
41
41
|
|
42
|
-
Podoff::Document.new(s)
|
42
|
+
Podoff::Document.new(s, encoding)
|
43
43
|
end
|
44
44
|
|
45
45
|
class Document
|
46
46
|
|
47
|
-
def self.load(path, encoding
|
47
|
+
def self.load(path, encoding)
|
48
48
|
|
49
|
-
Podoff::Document.new(
|
49
|
+
Podoff::Document.new(
|
50
|
+
File.open(path, 'r:' + encoding) { |f| f.read },
|
51
|
+
encoding
|
52
|
+
)
|
50
53
|
end
|
51
54
|
|
52
55
|
def self.parse(s)
|
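The hunk above reads the file with an explicit external encoding. The sketch below shows, on a scratch file, what the `File.open(path, 'r:' + encoding)` call shape does to the resulting string. The temp file and its content are made up for illustration, and the sketch assumes `Encoding.default_internal` is left unset (the Ruby default).

```ruby
require 'tempfile'

# Write a few bytes, including 0xE9 ("é" in ISO-8859-1), to a scratch file.
tmp = Tempfile.new('podoff_sketch')
tmp.binmode
tmp.write("%PDF-1.4\n\xE9\n".b)
tmp.close

encoding = 'iso-8859-1'

# Same call shape as in the hunk: the bytes are read as they are and the
# string is tagged with the requested encoding.
s = File.open(tmp.path, 'r:' + encoding) { |f| f.read }

tag = s.encoding            # the encoding the string is tagged with
bytes = s.bytesize          # number of bytes read from the file
valid = s.valid_encoding?   # every byte is a valid ISO-8859-1 character

tmp.unlink
```

Tagging the string correctly matters because the regular expressions used for parsing run against it; a string read with the wrong encoding can contain invalid byte sequences.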
@@ -54,6 +57,8 @@ module Podoff
|
|
54
57
|
Podoff::Document.new(s)
|
55
58
|
end
|
56
59
|
|
60
|
+
attr_reader :encoding
|
61
|
+
|
57
62
|
attr_reader :scanner
|
58
63
|
attr_reader :version
|
59
64
|
attr_reader :xref
|
@@ -63,11 +68,13 @@ module Podoff
|
|
63
68
|
#
|
64
69
|
attr_reader :additions
|
65
70
|
|
66
|
-
def initialize(s)
|
71
|
+
def initialize(s, encoding)
|
67
72
|
|
68
73
|
fail ArgumentError.new('not a PDF file') \
|
69
74
|
unless s.match(/\A%PDF-\d+\.\d+\s/)
|
70
75
|
|
76
|
+
@encoding = encoding
|
77
|
+
|
71
78
|
@scanner = ::StringScanner.new(s)
|
72
79
|
@version = nil
|
73
80
|
@xref = nil
|
@@ -113,11 +120,6 @@ module Podoff
|
|
113
120
|
@scanner.string
|
114
121
|
end
|
115
122
|
|
116
|
-
def extract_ref(s)
|
117
|
-
|
118
|
-
s.gsub(/\s+/, ' ').gsub(/[^0-9 ]+/, '').strip
|
119
|
-
end
|
120
|
-
|
121
123
|
def updated?
|
122
124
|
|
123
125
|
@additions.any?
|
@@ -129,6 +131,8 @@ module Podoff
|
|
129
131
|
|
130
132
|
self.class.allocate.instance_eval do
|
131
133
|
|
134
|
+
@encoding = o.encoding
|
135
|
+
|
132
136
|
@scanner = ::StringScanner.new(o.source)
|
133
137
|
@xref = o.xref
|
134
138
|
|
@@ -146,26 +150,23 @@ module Podoff
|
|
146
150
|
|
147
151
|
def pages
|
148
152
|
|
149
|
-
|
150
|
-
end
|
151
|
-
|
152
|
-
def page(index)
|
153
|
+
#@objs.values.select { |o| o.type == '/Page' }
|
153
154
|
|
154
|
-
|
155
|
+
ps = @objs.values.find { |o| o.type == '/Pages' }
|
156
|
+
return nil unless ps
|
155
157
|
|
156
|
-
|
157
|
-
|
158
|
+
extract_refs(ps.attributes[:kids]).collect { |r| @objs[r] }
|
159
|
+
end
|
158
160
|
|
159
|
-
|
160
|
-
index > 0 ? pas.at(index - 1) : pas.at(index)
|
161
|
-
) unless pas.first.attributes[:pagenum]
|
161
|
+
def page(index)
|
162
162
|
|
163
163
|
if index < 0
|
164
|
-
|
165
|
-
|
164
|
+
pages[index]
|
165
|
+
elsif index == 0
|
166
|
+
nil
|
167
|
+
else
|
168
|
+
pages[index - 1]
|
166
169
|
end
|
167
|
-
|
168
|
-
pas.find { |pa| pa.page_number == index }
|
169
170
|
end
|
170
171
|
|
171
172
|
def new_ref
|
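The indexing rule of the new `page(index)` can be tried in isolation. In this sketch `page_at` is a hypothetical standalone function and `pages` is a plain Array of placeholder names, standing in for the array the new `pages` method returns.

```ruby
# Standalone sketch of the indexing rule; not part of podoff.
def page_at(pages, index)
  if index < 0
    pages[index]        # negative indexes count from the end
  elsif index == 0
    nil                 # pages are numbered from 1, there is no page 0
  else
    pages[index - 1]    # translate the 1-based index to a 0-based one
  end
end

pages = %w[ first second third ]

first   = page_at(pages, 1)
last    = page_at(pages, -1)
zero    = page_at(pages, 0)
missing = page_at(pages, 9)
```

Out-of-range indexes fall through to `Array#[]`, which returns nil, matching the "returns nil if the page doesn't exist" spec.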
@@ -224,7 +225,9 @@ module Podoff
|
|
224
225
|
add(obj)
|
225
226
|
end
|
226
227
|
|
227
|
-
def write(path)
|
228
|
+
def write(path=:string, encoding=nil)
|
229
|
+
|
230
|
+
encoding ||= @encoding
|
228
231
|
|
229
232
|
f =
|
230
233
|
case path
|
@@ -232,6 +235,8 @@ module Podoff
|
|
232
235
|
when String then File.open(path, 'wb')
|
233
236
|
else path
|
234
237
|
end
|
238
|
+
f.set_encoding(encoding) # internal encoding: nil
|
239
|
+
#f.set_encoding(encoding, encoding)
|
235
240
|
|
236
241
|
f.write(source)
|
237
242
|
|
@@ -241,19 +246,19 @@ module Podoff
|
|
241
246
|
|
242
247
|
@additions.values.each do |o|
|
243
248
|
f.write("\n")
|
244
|
-
pointers[o.ref.split(' ').first.to_i] = f.pos
|
245
|
-
f.write(o.to_s)
|
249
|
+
pointers[o.ref.split(' ').first.to_i] = f.pos
|
250
|
+
f.write(o.to_s.force_encoding(encoding))
|
246
251
|
end
|
247
252
|
f.write("\n\n")
|
248
253
|
|
249
|
-
xref = f.pos
|
254
|
+
xref = f.pos
|
250
255
|
|
251
256
|
write_xref(f, pointers)
|
252
257
|
|
253
258
|
f.write("trailer\n")
|
254
259
|
f.write("<<\n")
|
255
260
|
f.write("/Prev #{self.xref}\n")
|
256
|
-
f.write("/Size #{objs.size}\n")
|
261
|
+
f.write("/Size #{objs.size + 1}\n")
|
257
262
|
f.write("/Root #{root} R\n")
|
258
263
|
f.write(">>\n")
|
259
264
|
f.write("startxref #{xref}\n")
|
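The hunk above calls `force_encoding` on each obj before writing it. A small sketch of the difference between retagging and transcoding, using a made-up sample string:

```ruby
utf8 = "caf\u00e9"   # "café" as UTF-8: 4 characters, 5 bytes

# force_encoding only changes the encoding tag, the bytes stay the same.
retagged = utf8.dup.force_encoding('iso-8859-1')
same_bytes = retagged.bytes == utf8.bytes

# encode transcodes: the bytes change so that the characters stay the same.
transcoded = utf8.encode('iso-8859-1')
```

Since the bytes are left untouched, the byte offsets recorded for the xref table are not affected by the retagging.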
@@ -265,7 +270,9 @@ module Podoff
|
|
265
270
|
f.is_a?(StringIO) ? f.string : nil
|
266
271
|
end
|
267
272
|
|
268
|
-
def rewrite(path=:string)
|
273
|
+
def rewrite(path=:string, encoding=nil)
|
274
|
+
|
275
|
+
encoding ||= @encoding
|
269
276
|
|
270
277
|
f =
|
271
278
|
case path
|
@@ -273,6 +280,7 @@ module Podoff
|
|
273
280
|
when String then File.open(path, 'wb')
|
274
281
|
else path
|
275
282
|
end
|
283
|
+
f.set_encoding(encoding)
|
276
284
|
|
277
285
|
v = source.match(/%PDF-\d+\.\d+/)[0]
|
278
286
|
f.write(v)
|
@@ -281,18 +289,18 @@ module Podoff
|
|
281
289
|
pointers = {}
|
282
290
|
|
283
291
|
objs.keys.sort.each do |k|
|
284
|
-
pointers[k.split(' ').first.to_i] = f.pos
|
285
|
-
f.write(objs[k].source)
|
292
|
+
pointers[k.split(' ').first.to_i] = f.pos
|
293
|
+
f.write(objs[k].source.force_encoding(encoding))
|
286
294
|
f.write("\n")
|
287
295
|
end
|
288
296
|
|
289
|
-
xref = f.pos
|
297
|
+
xref = f.pos
|
290
298
|
|
291
299
|
write_xref(f, pointers)
|
292
300
|
|
293
301
|
f.write("trailer\n")
|
294
302
|
f.write("<<\n")
|
295
|
-
f.write("/Size #{objs.size}\n")
|
303
|
+
f.write("/Size #{objs.size + 1}\n")
|
296
304
|
f.write("/Root #{root} R\n")
|
297
305
|
f.write(">>\n")
|
298
306
|
f.write("startxref #{xref}\n")
|
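On the `/Size` change: the PDF specification defines the trailer's `/Size` as one greater than the highest object number in the file, since object 0 (the head of the free list) counts as an entry. `objs.size + 1` agrees with that as long as the objs are numbered 1..n without gaps. The sketch below compares both computations; `trailer_size` is a hypothetical helper, not part of podoff.

```ruby
# Hypothetical helper: derives /Size from refs such as "7 0".
def trailer_size(refs)
  refs.map { |r| r.split(' ').first.to_i }.max + 1
end

# Seven objs numbered 1 to 7: both computations agree.
contiguous = (1..7).map { |i| "#{i} 0" }
by_max     = trailer_size(contiguous)
by_count   = contiguous.size + 1

# With gaps in the numbering, counting the objs underestimates /Size.
sparse        = [ '1 0', '3 0', '9 0' ]
sparse_by_max = trailer_size(sparse)
```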
@@ -309,7 +317,7 @@ module Podoff
|
|
309
317
|
|
310
318
|
f.write("xref\n")
|
311
319
|
f.write("0 1\n")
|
312
|
-
f.write("0000000000 65535 f\n")
|
320
|
+
f.write("0000000000 65535 f \n")
|
313
321
|
|
314
322
|
pointers
|
315
323
|
.keys
|
@@ -321,7 +329,7 @@ module Podoff
|
|
321
329
|
}
|
322
330
|
.each { |part|
|
323
331
|
f.write("#{part.first} #{part.size}\n")
|
324
|
-
part.each { |k| f.write(sprintf("%010d 00000 n\n", pointers[k])) }
|
332
|
+
part.each { |k| f.write(sprintf("%010d 00000 n \n", pointers[k])) }
|
325
333
|
}
|
326
334
|
end
|
327
335
|
|
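The trailing space added in the two hunks above is what makes each xref entry exactly 20 bytes long, as the PDF specification requires (with a one-character end-of-line, the entry is padded with a space). A standalone sketch of the table layout; `xref_section` is a hypothetical function that mirrors, but is not, podoff's `write_xref`.

```ruby
# Hypothetical sketch of the xref layout; pointers maps obj number => offset.
def xref_section(pointers)
  lines = [ 'xref', '0 1', '0000000000 65535 f ' ]

  pointers.keys.sort
    .slice_when { |a, b| b != a + 1 }   # one subsection per contiguous run
    .each do |part|
      lines << "#{part.first} #{part.size}"
      part.each { |k| lines << format('%010d 00000 n ', pointers[k]) }
    end

  lines.join("\n") + "\n"
end

section = xref_section(3 => 617, 7 => 729)
entry   = format("%010d 00000 n \n", 617)
```

With offsets 617 and 729 for objs 3 and 7, this yields the same two one-entry subsections (`3 1` and `7 1`) that the 'writes a proper xref table' spec expects.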
@@ -332,12 +340,21 @@ module Podoff
|
|
332
340
|
|
333
341
|
s
|
334
342
|
end
|
343
|
+
|
344
|
+
def extract_ref(s)
|
345
|
+
|
346
|
+
s.gsub(/\s+/, ' ').gsub(/[^0-9 ]+/, '').strip
|
347
|
+
end
|
348
|
+
|
349
|
+
def extract_refs(s)
|
350
|
+
|
351
|
+
s.gsub(/\s+/, ' ').scan(/(\d+ \d+) R/).collect(&:first)
|
352
|
+
end
|
335
353
|
end
|
336
354
|
|
337
355
|
class Obj
|
338
356
|
|
339
|
-
ATTRIBUTES =
|
340
|
-
{ type: 'Type', contents: 'Contents', pagenum: 'pdftk_PageNum' }
|
357
|
+
ATTRIBUTES = { type: 'Type', contents: 'Contents', kids: 'Kids' }
|
341
358
|
|
342
359
|
def self.extract(doc)
|
343
360
|
|
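The new `extract_refs` is what turns a `/Kids` value into an ordered list of refs. The method body below is copied from the hunk above so that it can be tried on its own; the multi-line input is a made-up example.

```ruby
# Same body as Podoff::Document#extract_refs, lifted out for experimentation.
def extract_refs(s)
  s.gsub(/\s+/, ' ').scan(/(\d+ \d+) R/).collect(&:first)
end

single = extract_refs('17 0 R')
list   = extract_refs('[17 0 R 6 0 R]')

# Whitespace runs, including line breaks, are squeezed before scanning.
multiline = extract_refs("[ 17 0 R\n  6 0   R ]")
```

The refs come back in the order they appear in the array, which is why the page order now follows `/Kids`.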
@@ -413,12 +430,6 @@ module Podoff
|
|
413
430
|
@attributes && @attributes[:type]
|
414
431
|
end
|
415
432
|
|
416
|
-
def page_number
|
417
|
-
|
418
|
-
r = @attributes && @attributes[:pagenum]
|
419
|
-
r ? r.to_i : nil
|
420
|
-
end
|
421
|
-
|
422
433
|
def insert_font(nick, obj_or_ref)
|
423
434
|
|
424
435
|
fail ArgumentError.new("target '#{ref}' not a replica") \
|
data/out.txt
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
utf-8
|
data/spec/alpha_spec.rb
ADDED
@@ -0,0 +1,40 @@
|
|
1
|
+
|
2
|
+
#
|
3
|
+
# specifying podoff
|
4
|
+
#
|
5
|
+
# Tue Nov 10 21:01:51 JST 2015
|
6
|
+
#
|
7
|
+
|
8
|
+
require 'spec_helper'
|
9
|
+
|
10
|
+
|
11
|
+
describe 'fixtures:' do
|
12
|
+
|
13
|
+
Dir['pdfs/*.pdf'].each do |path|
|
14
|
+
|
15
|
+
describe path do
|
16
|
+
|
17
|
+
it 'is a valid pdf document' do
|
18
|
+
|
19
|
+
expect(path).to be_a_valid_pdf
|
20
|
+
end
|
21
|
+
end
|
22
|
+
end
|
23
|
+
|
24
|
+
describe 'pdfs/t0.pdf' do
|
25
|
+
|
26
|
+
it 'is encoded as UTF-8' do
|
27
|
+
|
28
|
+
expect('pdfs/t0.pdf').to be_encoded_as('utf-8')
|
29
|
+
end
|
30
|
+
end
|
31
|
+
|
32
|
+
describe 'pdfs/udocument0.pdf' do
|
33
|
+
|
34
|
+
it 'is encoded as ISO-8859-1' do
|
35
|
+
|
36
|
+
expect('pdfs/udocument0.pdf').to be_encoded_as('latin1')
|
37
|
+
end
|
38
|
+
end
|
39
|
+
end
|
40
|
+
|
data/spec/core_spec.rb
CHANGED
@@ -14,11 +14,11 @@ describe Podoff do
|
|
14
14
|
|
15
15
|
it 'loads a PDF document' do
|
16
16
|
|
17
|
-
d = Podoff.load('pdfs/t0.pdf')
|
17
|
+
d = Podoff.load('pdfs/t0.pdf', 'utf-8')
|
18
18
|
|
19
19
|
expect(d.class).to eq(Podoff::Document)
|
20
20
|
expect(d.objs.keys).to eq([ '1 0', '2 0', '3 0', '4 0', '5 0', '6 0' ])
|
21
|
-
expect(d.xref).to eq(
|
21
|
+
expect(d.xref).to eq(413)
|
22
22
|
|
23
23
|
#pp d.objs.values.collect(&:to_a)
|
24
24
|
|
@@ -41,25 +41,25 @@ describe Podoff do
|
|
41
41
|
|
42
42
|
it 'loads a PDF document' do
|
43
43
|
|
44
|
-
d = Podoff.load('pdfs/udocument0.pdf')
|
44
|
+
d = Podoff.load('pdfs/udocument0.pdf', 'iso-8859-1')
|
45
45
|
|
46
46
|
expect(d.class).to eq(Podoff::Document)
|
47
|
-
expect(d.xref).to eq(
|
47
|
+
expect(d.xref).to eq(1612815)
|
48
48
|
expect(d.objs.size).to eq(273)
|
49
49
|
expect(d.objs.keys).to include('1 0')
|
50
50
|
expect(d.objs.keys).to include('273 0')
|
51
51
|
|
52
|
-
expect(d.root).to eq('
|
52
|
+
expect(d.root).to eq('1 0')
|
53
53
|
|
54
54
|
expect(d.pages.size).to eq(3)
|
55
55
|
end
|
56
56
|
|
57
57
|
it 'loads a PDF document with incremental updates' do
|
58
58
|
|
59
|
-
d = Podoff.load('pdfs/t1.pdf')
|
59
|
+
d = Podoff.load('pdfs/t1.pdf', 'utf-8')
|
60
60
|
|
61
61
|
expect(d.class).to eq(Podoff::Document)
|
62
|
-
expect(d.xref).to eq(
|
62
|
+
expect(d.xref).to eq(704)
|
63
63
|
expect(d.objs.keys).to eq([ '1 0', '2 0', '3 0', '4 0', '5 0', '6 0' ])
|
64
64
|
|
65
65
|
expect(d.obj_counters.keys).to eq(
|
@@ -72,7 +72,7 @@ describe Podoff do
|
|
72
72
|
|
73
73
|
it 'loads a [re]compressed PDF documents' do
|
74
74
|
|
75
|
-
d = Podoff.load('pdfs/qdocument0.pdf')
|
75
|
+
d = Podoff.load('pdfs/qdocument0.pdf', 'iso-8859-1')
|
76
76
|
|
77
77
|
expect(d.class).to eq(Podoff::Document)
|
78
78
|
expect(d.xref).to eq(1612815)
|
@@ -85,14 +85,13 @@ describe Podoff do
|
|
85
85
|
#end
|
86
86
|
|
87
87
|
expect(d.pages.size).to eq(3)
|
88
|
-
expect(d.pages.first.attributes[:pagenum]).to eq('1')
|
89
88
|
expect(d.objs['46 0'].attributes[:type]).to eq('/Annot')
|
90
89
|
end
|
91
90
|
|
92
91
|
it 'rejects items that are not PDF documents' do
|
93
92
|
|
94
93
|
expect {
|
95
|
-
Podoff.load('spec/spec_helper.rb')
|
94
|
+
Podoff.load('spec/spec_helper.rb', 'utf-8')
|
96
95
|
}.to raise_error(ArgumentError, 'not a PDF file')
|
97
96
|
end
|
98
97
|
end
|
data/spec/document_spec.rb
CHANGED
@@ -12,7 +12,7 @@ describe Podoff::Document do
|
|
12
12
|
|
13
13
|
before :all do
|
14
14
|
|
15
|
-
@d = Podoff.load('pdfs/udocument0.pdf')
|
15
|
+
@d = Podoff.load('pdfs/udocument0.pdf', 'iso-8859-1')
|
16
16
|
end
|
17
17
|
|
18
18
|
describe '#objs' do
|
@@ -39,10 +39,9 @@ describe Podoff::Document do
|
|
39
39
|
it 'returns a page given an index (starts at 1)' do
|
40
40
|
|
41
41
|
p = @d.page(1)
|
42
|
+
expect(p.ref).to eq('56 0')
|
42
43
|
expect(p.class).to eq(Podoff::Obj)
|
43
44
|
expect(p.type).to eq('/Page')
|
44
|
-
expect(p.attributes[:pagenum]).to eq('1')
|
45
|
-
expect(p.page_number).to eq(1)
|
46
45
|
end
|
47
46
|
|
48
47
|
it 'returns nil if the page doesn\'t exist' do
|
@@ -51,12 +50,11 @@ describe Podoff::Document do
|
|
51
50
|
expect(@d.page(9)).to eq(nil)
|
52
51
|
end
|
53
52
|
|
54
|
-
it 'returns
|
53
|
+
it 'returns a page given an index (starts at 1) (2)' do
|
55
54
|
|
56
|
-
d = Podoff::Document.load('pdfs/t2.pdf')
|
55
|
+
d = Podoff::Document.load('pdfs/t2.pdf', 'utf-8')
|
57
56
|
|
58
57
|
expect(d.page(1).ref).to eq('3 0')
|
59
|
-
expect(d.page(1).page_number).to eq(nil)
|
60
58
|
|
61
59
|
expect(d.page(0)).to eq(nil)
|
62
60
|
expect(d.page(2)).to eq(nil)
|
@@ -64,16 +62,14 @@ describe Podoff::Document do
|
|
64
62
|
|
65
63
|
it 'returns pages from the last when the index is negative' do
|
66
64
|
|
67
|
-
expect(@d.page(-1).ref).to eq('
|
68
|
-
expect(@d.page(-1).page_number).to eq(3)
|
65
|
+
expect(@d.page(-1).ref).to eq('58 0')
|
69
66
|
end
|
70
67
|
|
71
|
-
it 'returns pages from the last when the index is negative (
|
68
|
+
it 'returns pages from the last when the index is negative (2)' do
|
72
69
|
|
73
|
-
d = Podoff::Document.load('pdfs/t2.pdf')
|
70
|
+
d = Podoff::Document.load('pdfs/t2.pdf', 'utf-8')
|
74
71
|
|
75
72
|
expect(d.page(-1).ref).to eq('3 0')
|
76
|
-
expect(d.page(-1).page_number).to eq(nil)
|
77
73
|
end
|
78
74
|
end
|
79
75
|
|
@@ -86,6 +82,8 @@ describe Podoff::Document do
|
|
86
82
|
expect(d.class).to eq(Podoff::Document)
|
87
83
|
expect(d.hash).not_to eq(@d.hash)
|
88
84
|
|
85
|
+
expect(d.encoding).to eq('iso-8859-1')
|
86
|
+
|
89
87
|
expect(d.objs.hash).not_to eq(@d.objs.hash)
|
90
88
|
|
91
89
|
expect(d.objs.values.first.hash).not_to eq(@d.objs.values.first.hash)
|
@@ -95,7 +93,7 @@ describe Podoff::Document do
|
|
95
93
|
expect(d.objs.values.first.document).to equal(d)
|
96
94
|
expect(@d.objs.values.first.document).to equal(@d)
|
97
95
|
|
98
|
-
expect(d.root).to eq('
|
96
|
+
expect(d.root).to eq('1 0')
|
99
97
|
end
|
100
98
|
|
101
99
|
it 'sports objs with properly recomputed attributes' do
|
@@ -112,7 +110,7 @@ describe Podoff::Document do
|
|
112
110
|
|
113
111
|
before :each do
|
114
112
|
|
115
|
-
@d = Podoff.load('pdfs/t0.pdf')
|
113
|
+
@d = Podoff.load('pdfs/t0.pdf', 'utf-8')
|
116
114
|
end
|
117
115
|
|
118
116
|
describe '#add_base_font' do
|
@@ -132,9 +130,11 @@ describe Podoff::Document do
|
|
132
130
|
'7 0 obj <</Type /Font /Subtype /Type1 /BaseFont /Helvetica>> endobj')
|
133
131
|
|
134
132
|
s = @d.write(:string)
|
135
|
-
d = Podoff.parse(s)
|
133
|
+
d = Podoff.parse(s, 'utf-8')
|
134
|
+
|
135
|
+
expect(d.xref).to eq(686)
|
136
136
|
|
137
|
-
expect(
|
137
|
+
expect(s).to be_a_valid_pdf
|
138
138
|
end
|
139
139
|
|
140
140
|
it 'doesn\'t mind a slash in front of the font name' do
|
@@ -175,9 +175,13 @@ endstream
|
|
175
175
|
endobj
|
176
176
|
}.strip)
|
177
177
|
|
178
|
-
|
178
|
+
s = @d.write(:string)
|
179
|
+
|
180
|
+
expect(s).to be_a_valid_pdf
|
181
|
+
|
182
|
+
d = Podoff.parse(s, 'utf-8')
|
179
183
|
|
180
|
-
expect(d.xref).to eq(
|
184
|
+
expect(d.xref).to eq(711)
|
181
185
|
end
|
182
186
|
|
183
187
|
it 'accepts a block' do
|
@@ -202,10 +206,14 @@ endstream
|
|
202
206
|
endobj
|
203
207
|
}.strip)
|
204
208
|
|
205
|
-
|
209
|
+
s = @d.write(:string)
|
206
210
|
|
207
|
-
expect(
|
208
|
-
|
211
|
+
expect(s).to be_a_valid_pdf
|
212
|
+
|
213
|
+
d = Podoff.parse(s, 'utf-8')
|
214
|
+
|
215
|
+
expect(d.source.index('<</Length 97>>')).to eq(625)
|
216
|
+
expect(d.xref).to eq(763)
|
209
217
|
end
|
210
218
|
|
211
219
|
it 'returns the open stream when no arg given' do
|
@@ -250,12 +258,11 @@ endobj
|
|
250
258
|
|
251
259
|
it 'recomputes the attributes correctly' do
|
252
260
|
|
253
|
-
d = Podoff.load('pdfs/qdocument0.pdf')
|
261
|
+
d = Podoff.load('pdfs/qdocument0.pdf', 'iso-8859-1')
|
254
262
|
|
255
263
|
pa = d.re_add(d.page(1))
|
256
264
|
|
257
|
-
expect(pa.attributes).to eq(
|
258
|
-
{ type: '/Page', contents: '151 0 R', pagenum: '1' })
|
265
|
+
expect(pa.attributes).to eq({ type: '/Page', contents: '151 0 R' })
|
259
266
|
end
|
260
267
|
end
|
261
268
|
end
|
@@ -275,7 +282,7 @@ endobj
|
|
275
282
|
|
276
283
|
it 'writes open streams as well' do
|
277
284
|
|
278
|
-
d = Podoff.load('pdfs/t0.pdf')
|
285
|
+
d = Podoff.load('pdfs/t0.pdf', 'utf-8')
|
279
286
|
|
280
287
|
pa = d.re_add(d.page(1))
|
281
288
|
st = d.add_stream
|
@@ -293,12 +300,12 @@ BT 10 20 Td (hello open stream) Tj ET
|
|
293
300
|
endstream
|
294
301
|
endobj
|
295
302
|
}.strip)
|
296
|
-
).to eq(
|
303
|
+
).to eq(729)
|
297
304
|
end
|
298
305
|
|
299
306
|
it 'writes a proper xref table' do
|
300
307
|
|
301
|
-
d = Podoff.load('pdfs/t0.pdf')
|
308
|
+
d = Podoff.load('pdfs/t0.pdf', 'utf-8')
|
302
309
|
|
303
310
|
pa = d.re_add(d.page(1))
|
304
311
|
st = d.add_stream
|
@@ -307,21 +314,23 @@ endobj
|
|
307
314
|
|
308
315
|
s = d.write(:string)
|
309
316
|
|
310
|
-
expect(s
|
317
|
+
expect(s).to be_a_valid_pdf
|
318
|
+
|
319
|
+
expect(s[814..-1].strip).to eq(%{
|
311
320
|
xref
|
312
321
|
0 1
|
313
|
-
0000000000 65535 f
|
322
|
+
0000000000 65535 f
|
314
323
|
3 1
|
315
|
-
|
324
|
+
0000000617 00000 n
|
316
325
|
7 1
|
317
|
-
|
326
|
+
0000000729 00000 n
|
318
327
|
trailer
|
319
328
|
<<
|
320
|
-
/Prev
|
321
|
-
/Size
|
329
|
+
/Prev 413
|
330
|
+
/Size 8
|
322
331
|
/Root 1 0 R
|
323
332
|
>>
|
324
|
-
startxref
|
333
|
+
startxref 815
|
325
334
|
%%EOF
|
326
335
|
}.strip)
|
327
336
|
end
|
@@ -331,7 +340,7 @@ startxref 809
|
|
331
340
|
|
332
341
|
it 'rewrites a document in one go' do
|
333
342
|
|
334
|
-
d = Podoff.load('pdfs/t2.pdf')
|
343
|
+
d = Podoff.load('pdfs/t2.pdf', 'utf-8')
|
335
344
|
|
336
345
|
s = d.rewrite(:string)
|
337
346
|
|
@@ -361,23 +370,45 @@ endstream
|
|
361
370
|
endobj
|
362
371
|
xref
|
363
372
|
0 1
|
364
|
-
0000000000 65535 f
|
373
|
+
0000000000 65535 f
|
365
374
|
1 7
|
366
|
-
|
367
|
-
|
368
|
-
|
369
|
-
|
370
|
-
|
371
|
-
|
372
|
-
|
375
|
+
0000000009 00000 n
|
376
|
+
0000000056 00000 n
|
377
|
+
0000000111 00000 n
|
378
|
+
0000000221 00000 n
|
379
|
+
0000000260 00000 n
|
380
|
+
0000000328 00000 n
|
381
|
+
0000000419 00000 n
|
373
382
|
trailer
|
374
383
|
<<
|
375
|
-
/Size
|
384
|
+
/Size 8
|
376
385
|
/Root 1 0 R
|
377
386
|
>>
|
378
|
-
startxref
|
387
|
+
startxref 510
|
379
388
|
%%EOF
|
380
389
|
}.strip)
|
390
|
+
|
391
|
+
expect(s).to be_a_valid_pdf
|
392
|
+
end
|
393
|
+
end
|
394
|
+
|
395
|
+
describe '#extract_refs' do
|
396
|
+
|
397
|
+
it 'extracts a ref' do
|
398
|
+
|
399
|
+
expect(
|
400
|
+
Podoff::Document.allocate.send(:extract_refs, '17 0 R')
|
401
|
+
).to eq([ '17 0' ])
|
402
|
+
expect(
|
403
|
+
Podoff::Document.allocate.send(:extract_refs, ' 17 0 R')
|
404
|
+
).to eq([ '17 0' ])
|
405
|
+
end
|
406
|
+
|
407
|
+
it 'extracts a list of ref' do
|
408
|
+
|
409
|
+
expect(
|
410
|
+
Podoff::Document.allocate.send(:extract_refs, '[17 0 R 6 0 R]')
|
411
|
+
).to eq([ '17 0', '6 0' ])
|
381
412
|
end
|
382
413
|
end
|
383
414
|
end
|
data/spec/obj_spec.rb
CHANGED
@@ -12,7 +12,7 @@ describe Podoff::Obj do
|
|
12
12
|
|
13
13
|
before :all do
|
14
14
|
|
15
|
-
@d = Podoff.load('pdfs/udocument0.pdf')
|
15
|
+
@d = Podoff.load('pdfs/udocument0.pdf', 'iso-8859-1')
|
16
16
|
end
|
17
17
|
|
18
18
|
describe '#document' do
|
@@ -30,7 +30,8 @@ describe Podoff::Obj do
|
|
30
30
|
o = @d.objs['20 0']
|
31
31
|
|
32
32
|
expect(o.source).to eq(%{
|
33
|
-
20 0 obj
|
33
|
+
20 0 obj
|
34
|
+
<< /DA (/Calibri,Bold 10 Tf 0 g) /F 4 /FT /Tx /MK << >> /P 58 0 R /Rect [ 448.723 652.574 490.603 667.749 ] /Subtype /Widget /T (State) /TU (State) /Type /Annot >>
|
34
35
|
endobj
|
35
36
|
}.strip)
|
36
37
|
end
|
@@ -86,12 +87,12 @@ endobj
|
|
86
87
|
|
87
88
|
it 'returns the type of the obj' do
|
88
89
|
|
89
|
-
expect(@d.objs['
|
90
|
+
expect(@d.objs['12 0'].type).to eq('/Font')
|
90
91
|
end
|
91
92
|
|
92
93
|
it 'returns nil if there is no type' do
|
93
94
|
|
94
|
-
expect(@d.objs['
|
95
|
+
expect(@d.objs['59 0'].type).to eq(nil)
|
95
96
|
end
|
96
97
|
|
97
98
|
it 'works on open streams' do
|
@@ -150,7 +151,7 @@ endobj
|
|
150
151
|
|
151
152
|
before :each do
|
152
153
|
|
153
|
-
@d = Podoff.load('pdfs/udocument0.pdf')
|
154
|
+
@d = Podoff.load('pdfs/udocument0.pdf', 'iso-8859-1')
|
154
155
|
end
|
155
156
|
|
156
157
|
describe '#insert_contents' do
|
@@ -178,7 +179,7 @@ endobj
|
|
178
179
|
|
179
180
|
pa.insert_contents(st)
|
180
181
|
|
181
|
-
expect(pa.source).to match(/\/Contents \[
|
182
|
+
expect(pa.source).to match(/\/Contents \[151 0 R #{st.ref} R\]/)
|
182
183
|
end
|
183
184
|
|
184
185
|
it 'accepts an obj ref' do
|
@@ -189,7 +190,7 @@ endobj
|
|
189
190
|
|
190
191
|
pa.insert_contents(st.ref)
|
191
192
|
|
192
|
-
expect(pa.source).to match(/\/Contents \[
|
193
|
+
expect(pa.source).to match(/\/Contents \[151 0 R #{st.ref} R\]/)
|
193
194
|
end
|
194
195
|
end
|
195
196
|
|
@@ -237,14 +238,14 @@ endobj
|
|
237
238
|
|
238
239
|
it 'adds to a list of references' do
|
239
240
|
|
240
|
-
d = Podoff.load('pdfs/qdocument0.pdf')
|
241
|
+
d = Podoff.load('pdfs/qdocument0.pdf', 'iso-8859-1')
|
241
242
|
|
242
243
|
o = d.re_add('56 0')
|
243
244
|
|
244
245
|
o.send(:add_to_attribute, :contents, '9999 0')
|
245
246
|
|
246
247
|
expect(o.attributes).to eq(
|
247
|
-
{ type: '/Page', contents: '[151 0 R 9999 0 R]'
|
248
|
+
{ type: '/Page', contents: '[151 0 R 9999 0 R]' })
|
248
249
|
end
|
249
250
|
end
|
250
251
|
end
|
data/spec/spec_helper.rb
CHANGED
@@ -10,3 +10,69 @@ require 'ostruct'
|
|
10
10
|
|
11
11
|
require 'podoff'
|
12
12
|
|
13
|
+
|
14
|
+
RSpec::Matchers.define :be_encoded_as do |encoding|
|
15
|
+
|
16
|
+
match do |path|
|
17
|
+
|
18
|
+
fail ArgumentError.new("expecting a path (String) not a #{path.class}") \
|
19
|
+
unless path.is_a?(String)
|
20
|
+
|
21
|
+
$vic_r =
|
22
|
+
`(vim -c 'execute \"silent !echo \" . &fileencoding . " > _enc.txt" | q' #{path} > /dev/null 2>&1); cat _enc.txt; rm _enc.txt`.strip.downcase
|
23
|
+
|
24
|
+
$vic_r == encoding.downcase
|
25
|
+
end
|
26
|
+
|
27
|
+
failure_message do |path|
|
28
|
+
|
29
|
+
"expected #{encoding.downcase.inspect}, got #{$vic_r.to_s.inspect}"
|
30
|
+
end
|
31
|
+
end
|
32
|
+
|
33
|
+
|
34
|
+
RSpec::Matchers.define :be_a_valid_pdf do
|
35
|
+
|
36
|
+
match do |o|
|
37
|
+
|
38
|
+
path =
|
39
|
+
if /\A%PDF-\d/.match(o)
|
40
|
+
File.open('tmp/_under_check.pdf', 'wb') { |f| f.write(o) }
|
41
|
+
'tmp/_under_check.pdf'
|
42
|
+
else
|
43
|
+
o
|
44
|
+
end
|
45
|
+
|
46
|
+
file_cmd =
|
47
|
+
/darwin/.match(RUBY_PLATFORM) ? 'file -I' : 'file -i'
|
48
|
+
vim_cmd =
|
49
|
+
"vim -c 'execute \"silent !echo \" . &fileencoding | q'"
|
50
|
+
|
51
|
+
cmd = [
|
52
|
+
"echo '* vim :'",
|
53
|
+
"#{vim_cmd} #{path}",
|
54
|
+
"echo '* #{file_cmd} :'",
|
55
|
+
"#{file_cmd} #{path}",
|
56
|
+
"echo",
|
57
|
+
"qpdf --check #{path}"
|
58
|
+
]
|
59
|
+
$qpdf_r = `(#{cmd.join('; ')}) 2>&1`
|
60
|
+
`#{file_cmd} #{path}; echo; qpdf --check #{path} 2>&1`
|
61
|
+
|
62
|
+
$qpdf_r = "#{$qpdf_r}\nexit: #{$?.exitstatus}"
|
63
|
+
#puts "." * 80
|
64
|
+
#puts $qpdf_r
|
65
|
+
|
66
|
+
$qpdf_r.match(/exit: 0$/)
|
67
|
+
end
|
68
|
+
|
69
|
+
failure_message do |o|
|
70
|
+
|
71
|
+
%{
|
72
|
+
--- qpdf ---------------------------------------------------------------------->
|
73
|
+
#{$qpdf_r}
|
74
|
+
<-- qpdf -----------------------------------------------------------------------
|
75
|
+
}.strip
|
76
|
+
end
|
77
|
+
end
|
78
|
+
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: podoff
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.
|
4
|
+
version: 1.2.0
|
5
5
|
prerelease:
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
@@ -9,7 +9,7 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2015-
|
12
|
+
date: 2015-11-11 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: rake
|
@@ -52,6 +52,7 @@ extra_rdoc_files: []
|
|
52
52
|
files:
|
53
53
|
- Rakefile
|
54
54
|
- lib/podoff.rb
|
55
|
+
- spec/alpha_spec.rb
|
55
56
|
- spec/core_spec.rb
|
56
57
|
- spec/document_spec.rb
|
57
58
|
- spec/obj_spec.rb
|
@@ -61,6 +62,7 @@ files:
|
|
61
62
|
- podoff.gemspec
|
62
63
|
- CHANGELOG.txt
|
63
64
|
- LICENSE.txt
|
65
|
+
- out.txt
|
64
66
|
- todo.txt
|
65
67
|
- README.md
|
66
68
|
homepage: http://github.com/jmettraux/podoff
|