podoff 1.1.1 → 1.2.0

data/CHANGELOG.txt CHANGED
@@ -2,6 +2,13 @@
2
2
  = podoff CHANGELOG.txt
3
3
 
4
4
 
5
+ == podoff 1.2.0 released 2015-11-11
6
+
7
+ - require encoding upon loading and parsing, introduce Document#encoding
8
+ - drop Podoff::Obj#page_number
9
+ - use /Kids in /Pages to determine pages and page order
10
+
11
+
5
12
  == podoff 1.1.1 released 2015-10-26
6
13
 
7
14
  - reworked xref table output
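The 1.2.0 entry above switches page lookup from the pdftk_PageNum attribute to the /Kids array of the /Pages object. Below is a minimal sketch of that idea in plain Ruby. The helper names `extract_refs` and `page_at` and the sample refs are illustrative assumptions, and a plain array stands in for podoff's objs table. In the PDF page tree, a /Kids entry may itself point to another /Pages node. This flat sketch does not recurse into nested nodes, and neither does the `pages` method in this diff.

```ruby
# Sketch only: resolve page order from a /Kids value such as "[56 0 R 57 0 R]".

# Collect the "N G" parts of every "N G R" reference in a PDF value.
def extract_refs(s)
  s.gsub(/\s+/, ' ').scan(/(\d+ \d+) R/).map(&:first)
end

# Page indexes start at 1; negative indexes count back from the last page;
# 0 and out-of-range indexes yield nil.
def page_at(refs, index)
  return nil if index == 0
  index > 0 ? refs[index - 1] : refs[index]
end

kids = extract_refs("[56 0 R\n57 0 R 58 0 R]")
p kids              # => ["56 0", "57 0", "58 0"]
p page_at(kids, 1)  # => "56 0"
p page_at(kids, -1) # => "58 0"
p page_at(kids, 9)  # => nil
```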
data/README.md CHANGED
@@ -6,17 +6,261 @@
6
6
 
7
7
  A Ruby tool to deface PDF documents.
8
8
 
9
+ Uses "incremental updates" to do so.
10
+
11
+ Podoff is used to write over PDF documents. Those documents should first be uncompressed (and recompressed); see [below](#preparing-documents-for-use-with-podoff) for how.
12
+
13
+ ```ruby
14
+ require 'podoff'
15
+
16
+ d = Podoff.load('d2.pdf', 'iso-8859-1')
17
+ # load my d2.pdf (since 1.2.0, the encoding argument is required)
18
+
19
+ fo = d.add_base_font('Helvetica')
20
+ # make sure the document knows about "Helvetica"
21
+ # (one of the base 13 or 14 fonts PDF readers know about)
22
+
23
+
24
+ pa = d.page(1)
25
+ # grab first page of the document
26
+
27
+ pa.insert_font('/MyHelvetica', fo)
28
+ # link "MyHelvetica" to the base font above for this page
29
+
30
+ st =
31
+ d.add_stream {
32
+ tf '/MyHelvetica', 12 # Helvetica size 12
33
+ bt 100, 100, "#{Time.now} stamped via podoff" # text at bottom left
34
+ }
35
+
36
+ pa.insert_content(st)
37
+ # add content to page
38
+
39
+ d.write('d3.pdf')
40
+ # write stamped document to d3.pdf
41
+ ```
42
+
43
+ For more about the podoff "api", read ["how I use podoff"](#how-i-use-podoff).
44
+
9
45
  If you're looking for serious libraries, look at
10
46
 
11
47
  * https://github.com/payrollhero/pdf_tempura
12
48
  * https://github.com/prawnpdf/prawn-templates
13
49
 
14
50
 
51
+ ## preparing documents for use with podoff
52
+
53
+ Podoff is naive and can't read xref tables in object streams. You have to work against PDF documents that have vanilla xref tables. [Qpdf](http://qpdf.sourceforge.net/) to the rescue.
54
+
55
+ Given a doc0.pdf you can produce such a document by doing:
56
+ ```
57
+ qpdf --object-streams=disable doc0.pdf doc1.pdf
58
+ ```
59
+ doc1.pdf is now ready for overwriting with podoff.
60
+
61
+ qpdf has rewritten the PDF, extracting the xref table but keeping the streams compressed.
62
+
63
+
64
+ ## bin/podoff
65
+
66
+ `bin/podoff` is a command-line tool for preparing/checking PDFs before use.
67
+
68
+ ```
69
+ $ ./bin/podoff -h
70
+
71
+ Usage: ./bin/podoff [option] {fname}
72
+
73
+ -o, --objs List objs
74
+ -w, --rewrite Rewrite
75
+ -s, --stamp Apply time stamp at bottom of each page
76
+ -r, --recompress Recompress
77
+ --version Show version
78
+ -h, --help Show this message
79
+ ```
80
+
81
+ `--recompress` is mostly an alias for `qpdf --object-streams=disable in.pdf out.pdf`
82
+
83
+ `--stamp` is used to check whether podoff can add a time stamp on each page of an input PDF.
84
+
85
+
86
+ ## how I use podoff
87
+
88
+ In the application which necessitated the creation of podoff, there are two PDFs to generate from time to time.
89
+
90
+ I keep those two PDFs in memory.
91
+
92
+ ```ruby
93
+ # lib/myapp/pdf.rb
94
+
95
+ require 'podoff'
96
+
97
+ module MyApp::Pdf
98
+
99
+ DOC0 = Podoff.load('pdf_templates/d0.pdf', 'iso-8859-1')
100
+ DOC1 = Podoff.load('pdf_templates/d1.pdf', 'iso-8859-1')
101
+
102
+ def generate_doc0(data, path)
103
+
104
+ d = DOC0.dup # shallow copy of the document
105
+ d.add_fonts
106
+
107
+ pa2 = d.page(2)
108
+ st = d.add_stream # open stream...
109
+
110
+ st.font '/MyHelv', 12 # font is an alias to tf
111
+ st.text 100, 100, data['customer_name']
112
+ st.text 100, 80, data['customer_phone']
113
+ st.text 100, 60, data['date'] if data['date']
114
+ # fill in customer info on page 2
115
+
116
+ pa2.insert_content(st) # ... close stream (yes, you can use a block too)
117
+
118
+ pa3 = d.page(3)
119
+ pa3.insert_content(d.add_stream { check 52, 100 }) if data['discount']
120
+ # a single check on page 3 if the customer gets a discount
121
+
122
+ d.write(path)
123
+ end
124
+
125
+ # ...
126
+ end
127
+
128
+ module Podoff # adding a few helper methods to the podoff classes
129
+
130
+ class Document
131
+
132
+ # Makes sure Helvetica and ZapfDingbats are available
133
+ # on each page of the document
134
+ #
135
+ def add_fonts
136
+
137
+ fo0 = add_base_font('/Helvetica')
138
+ fo1 = add_base_font('/ZapfDingbats')
139
+
140
+ pages.each { |pa|
141
+ pa = re_add(pa)
142
+ pa.insert_font('/MyHelv', fo0)
143
+ pa.insert_font('/MyZapf', fo1)
144
+ }
145
+ end
146
+ end
147
+
148
+ class Stream
149
+
150
+ # Places a check mark ✓ at x, y
151
+ #
152
+ def check(x, y)
153
+
154
+ font = @font # save current font
155
+ self.tf '/MyZapf', 12 # switch to ZapfDingbats size 12
156
+ self.bt x, y, '3' # check mark
157
+ @font = font # get back to saved font
158
+ end
159
+ end
160
+ end
161
+ ```
162
+
163
+ The documents are kept in memory. As generation requests come in, the documents get duplicated and incrementally updated, and the filled documents are written to disk. The duplication doesn't copy the whole document file; only the references to the "obj" elements in the document get copied.
164
+
165
+ ### Podoff::Document
166
+
167
+ ```ruby
168
+ class Podoff::Document
169
+
170
+ def self.load(path, encoding)
171
+ # Podoff.load(path, encoding) is a shortcut to this method
172
+
173
+ def dup
174
+ # Makes a shallow copy of the document
175
+
176
+ def add_base_font(name)
177
+ # Given a name in the base 13/14 fonts readers are supposed to know,
178
+ # ensures the document has access to the font.
179
+ # Usually "Helvetica" or "ZapfDingbats".
180
+
181
+ def pages
182
+ # Returns an array of all the objs that are pages
183
+
184
+ def page(index)
185
+ # Starts at 1, returns a page obj. Understands negative indexes, like
186
+ # -1 for the last page.
187
+
188
+ def add_stream(src=nil, &block)
189
+ # Prepares a new obj with a stream
190
+ # If src is given places the src string in the stream.
191
+ # If a block is given executes the block in the context of the
192
+ # Podoff::Stream instance.
193
+ # If no src and no block, simply returns the Podoff::Stream wrapped inside
194
+ # of the new obj (see example code above)
195
+
196
+ def re_add(obj_or_ref)
197
+ # Given an obj or a ref (like "1234 0") to an obj, copies that obj
198
+ # and re-adds it to the document.
199
+ # This is necessary for the incremental updates podoff uses: if you add
200
+ # an obj to the Contents list of a page, you have to add it to the
201
+ # re-added page, not directly to the original page.
202
+
203
+ def write(path=:string, encoding=nil)
204
+ # Writes the document, with incremental updates, to a file given by its path.
205
+ # If the path is :string, will simply return the string containing the
206
+ # whole document. The encoding defaults to the document's #encoding.
207
+
208
+ def rewrite(path=:string, encoding=nil)
209
+ # Like #write, but squashes the incremental updates in the document.
210
+ # Takes more time and memory and might fail (remember, podoff is very
211
+ # naive (as its author is)). Test with care...
212
+
213
+ #
214
+ # a bit lower-level...
215
+
216
+ def objs
217
+ # returns the hash { String/obj_ref => Podoff::Obj/obj_instance }
218
+ ```
219
+
220
+ ### Podoff::Obj
221
+
222
+ A PDF document is mostly a hierarchy of `obj` elements. `Podoff::Obj` points to such elements (see `Podoff::Document#objs`).
223
+
224
+ ```ruby
225
+ class Podoff::Obj
226
+
227
+ def insert_font(font_nick, font_obj_or_ref)
228
+ def insert_contents(obj_or_ref)
229
+ ```
230
+
231
+ ### Podoff::Stream
232
+
233
+ TODO
234
+
235
+ ```ruby
236
+ class Podoff::Stream
237
+
238
+ def tf(font_name, font_size)
239
+ alias :font :tf
240
+
241
+ def bt(x, y, text)
242
+ alias :text :bt
243
+ ```
244
+
245
+
15
246
  ## disclaimer
16
247
 
17
248
  The author of this tool/library has no link whatsoever with the authors of the sample PDF documents found under `pdfs/`. Those documents have been selected because they are representative of the PDF forms podoff is meant to ~~deface~~fill.
18
249
 
19
250
 
251
+ ## known bugs
252
+
253
+ * podoff parsing is naive; documents that contain uncompressed streams with "endobj", "startxref" or "/Root" will disorient podoff
253
+ * completely naive about encoding (only used it for British English documents so far)
255
+
256
+
257
+ ## links
258
+
259
+ * http://qpdf.sourceforge.net/ source: https://github.com/qpdf/qpdf
260
+
261
+ * http://www.slideshare.net/ange4771/advanced-pdf-tricks
262
+
263
+
20
264
  ## LICENSE
21
265
 
22
266
  MIT, see [LICENSE.txt](LICENSE.txt)
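The README notes that podoff can't read cross-reference data stored in object streams, and it recommends `qpdf --object-streams=disable`. As a rough pre-flight check, you could look at where the last `startxref` points. It points either at the `xref` keyword of a classic table or at an `N 0 obj` header, which is typically a cross-reference stream. The sketch below assumes that heuristic. The method name `classic_xref?` is hypothetical, and the check ignores hybrid-reference files and `/Prev` chains. `qpdf --check` remains the more reliable test.

```ruby
# Sketch: does the last startxref offset point at a classic "xref" table?
def classic_xref?(pdf)
  data = pdf.b                               # binary copy: indexes are byte offsets
  m = data.scan(/startxref\s+(\d+)/).last    # with incremental updates, the last one counts
  return false unless m
  data.byteslice(m.first.to_i, 4) == 'xref'
end

body = "%PDF-1.4\n1 0 obj << >> endobj\n"
at = body.bytesize

classic = body + "xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1 >>\n" \
  "startxref\n#{at}\n%%EOF\n"
streamed = body + "2 0 obj << /Type /XRef >> stream\nendstream\nendobj\n" \
  "startxref\n#{at}\n%%EOF\n"

p classic_xref?(classic)   # => true
p classic_xref?(streamed)  # => false
```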
data/lib/podoff.rb CHANGED
@@ -30,23 +30,26 @@ require 'stringio'
30
30
 
31
31
  module Podoff
32
32
 
33
- VERSION = '1.1.1'
33
+ VERSION = '1.2.0'
34
34
 
35
- def self.load(path, encoding='iso-8859-1')
35
+ def self.load(path, encoding)
36
36
 
37
37
  Podoff::Document.load(path, encoding)
38
38
  end
39
39
 
40
- def self.parse(s)
40
+ def self.parse(s, encoding)
41
41
 
42
- Podoff::Document.new(s)
42
+ Podoff::Document.new(s, encoding)
43
43
  end
44
44
 
45
45
  class Document
46
46
 
47
- def self.load(path, encoding='iso-8859-1')
47
+ def self.load(path, encoding)
48
48
 
49
- Podoff::Document.new(File.open(path, 'r:' + encoding) { |f| f.read })
49
+ Podoff::Document.new(
50
+ File.open(path, 'r:' + encoding) { |f| f.read },
51
+ encoding
52
+ )
50
53
  end
51
54
 
52
55
  def self.parse(s)
@@ -54,6 +57,8 @@ module Podoff
54
57
  Podoff::Document.new(s)
55
58
  end
56
59
 
60
+ attr_reader :encoding
61
+
57
62
  attr_reader :scanner
58
63
  attr_reader :version
59
64
  attr_reader :xref
@@ -63,11 +68,13 @@ module Podoff
63
68
  #
64
69
  attr_reader :additions
65
70
 
66
- def initialize(s)
71
+ def initialize(s, encoding)
67
72
 
68
73
  fail ArgumentError.new('not a PDF file') \
69
74
  unless s.match(/\A%PDF-\d+\.\d+\s/)
70
75
 
76
+ @encoding = encoding
77
+
71
78
  @scanner = ::StringScanner.new(s)
72
79
  @version = nil
73
80
  @xref = nil
@@ -113,11 +120,6 @@ module Podoff
113
120
  @scanner.string
114
121
  end
115
122
 
116
- def extract_ref(s)
117
-
118
- s.gsub(/\s+/, ' ').gsub(/[^0-9 ]+/, '').strip
119
- end
120
-
121
123
  def updated?
122
124
 
123
125
  @additions.any?
@@ -129,6 +131,8 @@ module Podoff
129
131
 
130
132
  self.class.allocate.instance_eval do
131
133
 
134
+ @encoding = o.encoding
135
+
132
136
  @scanner = ::StringScanner.new(o.source)
133
137
  @xref = o.xref
134
138
 
@@ -146,26 +150,23 @@ module Podoff
146
150
 
147
151
  def pages
148
152
 
149
- @objs.values.select { |o| o.type == '/Page' }
150
- end
151
-
152
- def page(index)
153
+ #@objs.values.select { |o| o.type == '/Page' }
153
154
 
154
- return nil if index == 0
155
+ ps = @objs.values.find { |o| o.type == '/Pages' }
156
+ return nil unless ps
155
157
 
156
- pas = pages
157
- return nil if pas.empty?
158
+ extract_refs(ps.attributes[:kids]).collect { |r| @objs[r] }
159
+ end
158
160
 
159
- return (
160
- index > 0 ? pas.at(index - 1) : pas.at(index)
161
- ) unless pas.first.attributes[:pagenum]
161
+ def page(index)
162
162
 
163
163
  if index < 0
164
- max = pas.inject(0) { |n, pa| [ n, pa.page_number ].max }
165
- index = max + 1 + index
164
+ pages[index]
165
+ elsif index == 0
166
+ nil
167
+ else
168
+ pages[index - 1]
166
169
  end
167
-
168
- pas.find { |pa| pa.page_number == index }
169
170
  end
170
171
 
171
172
  def new_ref
@@ -224,7 +225,9 @@ module Podoff
224
225
  add(obj)
225
226
  end
226
227
 
227
- def write(path)
228
+ def write(path=:string, encoding=nil)
229
+
230
+ encoding ||= @encoding
228
231
 
229
232
  f =
230
233
  case path
@@ -232,6 +235,8 @@ module Podoff
232
235
  when String then File.open(path, 'wb')
233
236
  else path
234
237
  end
238
+ f.set_encoding(encoding) # internal encoding: nil
239
+ #f.set_encoding(encoding, encoding)
235
240
 
236
241
  f.write(source)
237
242
 
@@ -241,19 +246,19 @@ module Podoff
241
246
 
242
247
  @additions.values.each do |o|
243
248
  f.write("\n")
244
- pointers[o.ref.split(' ').first.to_i] = f.pos + 1
245
- f.write(o.to_s)
249
+ pointers[o.ref.split(' ').first.to_i] = f.pos
250
+ f.write(o.to_s.force_encoding(encoding))
246
251
  end
247
252
  f.write("\n\n")
248
253
 
249
- xref = f.pos + 1
254
+ xref = f.pos
250
255
 
251
256
  write_xref(f, pointers)
252
257
 
253
258
  f.write("trailer\n")
254
259
  f.write("<<\n")
255
260
  f.write("/Prev #{self.xref}\n")
256
- f.write("/Size #{objs.size}\n")
261
+ f.write("/Size #{objs.size + 1}\n")
257
262
  f.write("/Root #{root} R\n")
258
263
  f.write(">>\n")
259
264
  f.write("startxref #{xref}\n")
@@ -265,7 +270,9 @@ module Podoff
265
270
  f.is_a?(StringIO) ? f.string : nil
266
271
  end
267
272
 
268
- def rewrite(path=:string)
273
+ def rewrite(path=:string, encoding=nil)
274
+
275
+ encoding ||= @encoding
269
276
 
270
277
  f =
271
278
  case path
@@ -273,6 +280,7 @@ module Podoff
273
280
  when String then File.open(path, 'wb')
274
281
  else path
275
282
  end
283
+ f.set_encoding(encoding)
276
284
 
277
285
  v = source.match(/%PDF-\d+\.\d+/)[0]
278
286
  f.write(v)
@@ -281,18 +289,18 @@ module Podoff
281
289
  pointers = {}
282
290
 
283
291
  objs.keys.sort.each do |k|
284
- pointers[k.split(' ').first.to_i] = f.pos + 1
285
- f.write(objs[k].source)
292
+ pointers[k.split(' ').first.to_i] = f.pos
293
+ f.write(objs[k].source.force_encoding(encoding))
286
294
  f.write("\n")
287
295
  end
288
296
 
289
- xref = f.pos + 1
297
+ xref = f.pos
290
298
 
291
299
  write_xref(f, pointers)
292
300
 
293
301
  f.write("trailer\n")
294
302
  f.write("<<\n")
295
- f.write("/Size #{objs.size}\n")
303
+ f.write("/Size #{objs.size + 1}\n")
296
304
  f.write("/Root #{root} R\n")
297
305
  f.write(">>\n")
298
306
  f.write("startxref #{xref}\n")
@@ -309,7 +317,7 @@ module Podoff
309
317
 
310
318
  f.write("xref\n")
311
319
  f.write("0 1\n")
312
- f.write("0000000000 65535 f\n")
320
+ f.write("0000000000 65535 f \n")
313
321
 
314
322
  pointers
315
323
  .keys
@@ -321,7 +329,7 @@ module Podoff
321
329
  }
322
330
  .each { |part|
323
331
  f.write("#{part.first} #{part.size}\n")
324
- part.each { |k| f.write(sprintf("%010d 00000 n\n", pointers[k])) }
332
+ part.each { |k| f.write(sprintf("%010d 00000 n \n", pointers[k])) }
325
333
  }
326
334
  end
327
335
 
@@ -332,12 +340,21 @@ module Podoff
332
340
 
333
341
  s
334
342
  end
343
+
344
+ def extract_ref(s)
345
+
346
+ s.gsub(/\s+/, ' ').gsub(/[^0-9 ]+/, '').strip
347
+ end
348
+
349
+ def extract_refs(s)
350
+
351
+ s.gsub(/\s+/, ' ').scan(/(\d+ \d+) R/).collect(&:first)
352
+ end
335
353
  end
336
354
 
337
355
  class Obj
338
356
 
339
- ATTRIBUTES =
340
- { type: 'Type', contents: 'Contents', pagenum: 'pdftk_PageNum' }
357
+ ATTRIBUTES = { type: 'Type', contents: 'Contents', kids: 'Kids' }
341
358
 
342
359
  def self.extract(doc)
343
360
 
@@ -413,12 +430,6 @@ module Podoff
413
430
  @attributes && @attributes[:type]
414
431
  end
415
432
 
416
- def page_number
417
-
418
- r = @attributes && @attributes[:pagenum]
419
- r ? r.to_i : nil
420
- end
421
-
422
433
  def insert_font(nick, obj_or_ref)
423
434
 
424
435
  fail ArgumentError.new("target '#{ref}' not a replica") \
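The `write`/`rewrite` changes above make three adjustments: offsets are taken from `f.pos` without `+ 1`, each entry gets a trailing space, and `/Size` is bumped by one. These line up with the classic cross-reference table layout. Each entry is exactly 20 bytes: a 10-digit byte offset, a space, a 5-digit generation number, a space, `n` or `f`, and a two-character end of line. Each offset is the position of the first byte of `N 0 obj`. The sketch below writes such a table for a toy set of objects, with three caveats:

- `write_with_xref` is a hypothetical helper, not podoff's API.
- It computes `/Size` as one more than the highest object number rather than podoff's `objs.size + 1`.
- Its trailer omits the `/Root` entry that a real document needs.

```ruby
require 'stringio'

# Sketch: write objs (number => body) followed by a classic xref table.
def write_with_xref(objs)
  f = StringIO.new(''.b)
  f.write("%PDF-1.4\n")

  pointers = {}
  objs.keys.sort.each do |num|
    pointers[num] = f.pos                 # first byte of "N 0 obj", no + 1
    f.write("#{num} 0 obj #{objs[num]} endobj\n")
  end

  xref = f.pos                            # startxref points at the "xref" keyword
  f.write("xref\n0 1\n0000000000 65535 f \n")

  # one subsection per run of consecutive object numbers
  pointers.keys.sort.slice_when { |a, b| b != a + 1 }.each do |part|
    f.write("#{part.first} #{part.size}\n")
    part.each { |k| f.write(format("%010d 00000 n \n", pointers[k])) }
  end

  # /Size: highest object number + 1 (a real trailer also needs /Root)
  f.write("trailer\n<< /Size #{pointers.keys.max + 1} >>\n")
  f.write("startxref\n#{xref}\n%%EOF\n")
  f.string
end

out = write_with_xref(1 => '<< >>', 2 => '<< >>', 5 => '(x)')
puts out
```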
data/out.txt ADDED
@@ -0,0 +1 @@
1
+ utf-8
data/spec/alpha_spec.rb ADDED
@@ -0,0 +1,40 @@
1
+
2
+ #
3
+ # specifying podoff
4
+ #
5
+ # Tue Nov 10 21:01:51 JST 2015
6
+ #
7
+
8
+ require 'spec_helper'
9
+
10
+
11
+ describe 'fixtures:' do
12
+
13
+ Dir['pdfs/*.pdf'].each do |path|
14
+
15
+ describe path do
16
+
17
+ it 'is a valid pdf document' do
18
+
19
+ expect(path).to be_a_valid_pdf
20
+ end
21
+ end
22
+ end
23
+
24
+ describe 'pdfs/t0.pdf' do
25
+
26
+ it 'is encoded as UTF-8' do
27
+
28
+ expect('pdfs/t0.pdf').to be_encoded_as('utf-8')
29
+ end
30
+ end
31
+
32
+ describe 'pdfs/udocument0.pdf' do
33
+
34
+ it 'is encoded as ISO-8859-1' do
35
+
36
+ expect('pdfs/udocument0.pdf').to be_encoded_as('latin1')
37
+ end
38
+ end
39
+ end
40
+
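The `be_encoded_as` matcher used above, defined in `spec/spec_helper.rb`, shells out to vim to read `&fileencoding`. If vim is unavailable, a rough pure-Ruby stand-in is to treat a file as UTF-8 when its bytes form valid UTF-8 and to fall back to ISO-8859-1 otherwise, since every byte sequence is valid Latin-1. This heuristic rests on assumptions and is not what the spec does. Pure ASCII reports as UTF-8, and PDFs with compressed streams will almost always come out as ISO-8859-1. The name `guess_pdf_encoding` is hypothetical.

```ruby
# Sketch: pick between the two encodings the specs use, without vim.
def guess_pdf_encoding(bytes)
  utf8 = bytes.b.force_encoding(Encoding::UTF_8)  # reinterpret the raw bytes
  utf8.valid_encoding? ? 'utf-8' : 'iso-8859-1'    # Latin-1 accepts any byte
end

# usage: guess_pdf_encoding(File.binread('pdfs/t0.pdf'))
p guess_pdf_encoding("%PDF-1.4\n".b)                  # => "utf-8"
p guess_pdf_encoding("%PDF-1.4\n%\xE2\xE3\xCF\xD3\n") # => "iso-8859-1"
```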
data/spec/core_spec.rb CHANGED
@@ -14,11 +14,11 @@ describe Podoff do
14
14
 
15
15
  it 'loads a PDF document' do
16
16
 
17
- d = Podoff.load('pdfs/t0.pdf')
17
+ d = Podoff.load('pdfs/t0.pdf', 'utf-8')
18
18
 
19
19
  expect(d.class).to eq(Podoff::Document)
20
20
  expect(d.objs.keys).to eq([ '1 0', '2 0', '3 0', '4 0', '5 0', '6 0' ])
21
- expect(d.xref).to eq(414)
21
+ expect(d.xref).to eq(413)
22
22
 
23
23
  #pp d.objs.values.collect(&:to_a)
24
24
 
@@ -41,25 +41,25 @@ describe Podoff do
41
41
 
42
42
  it 'loads a PDF document' do
43
43
 
44
- d = Podoff.load('pdfs/udocument0.pdf')
44
+ d = Podoff.load('pdfs/udocument0.pdf', 'iso-8859-1')
45
45
 
46
46
  expect(d.class).to eq(Podoff::Document)
47
- expect(d.xref).to eq(3138351)
47
+ expect(d.xref).to eq(1612815)
48
48
  expect(d.objs.size).to eq(273)
49
49
  expect(d.objs.keys).to include('1 0')
50
50
  expect(d.objs.keys).to include('273 0')
51
51
 
52
- expect(d.root).to eq('65 0')
52
+ expect(d.root).to eq('1 0')
53
53
 
54
54
  expect(d.pages.size).to eq(3)
55
55
  end
56
56
 
57
57
  it 'loads a PDF document with incremental updates' do
58
58
 
59
- d = Podoff.load('pdfs/t1.pdf')
59
+ d = Podoff.load('pdfs/t1.pdf', 'utf-8')
60
60
 
61
61
  expect(d.class).to eq(Podoff::Document)
62
- expect(d.xref).to eq(698)
62
+ expect(d.xref).to eq(704)
63
63
  expect(d.objs.keys).to eq([ '1 0', '2 0', '3 0', '4 0', '5 0', '6 0' ])
64
64
 
65
65
  expect(d.obj_counters.keys).to eq(
@@ -72,7 +72,7 @@ describe Podoff do
72
72
 
73
73
  it 'loads a [re]compressed PDF documents' do
74
74
 
75
- d = Podoff.load('pdfs/qdocument0.pdf')
75
+ d = Podoff.load('pdfs/qdocument0.pdf', 'iso-8859-1')
76
76
 
77
77
  expect(d.class).to eq(Podoff::Document)
78
78
  expect(d.xref).to eq(1612815)
@@ -85,14 +85,13 @@ describe Podoff do
85
85
  #end
86
86
 
87
87
  expect(d.pages.size).to eq(3)
88
- expect(d.pages.first.attributes[:pagenum]).to eq('1')
89
88
  expect(d.objs['46 0'].attributes[:type]).to eq('/Annot')
90
89
  end
91
90
 
92
91
  it 'rejects items that are not PDF documents' do
93
92
 
94
93
  expect {
95
- Podoff.load('spec/spec_helper.rb')
94
+ Podoff.load('spec/spec_helper.rb', 'utf-8')
96
95
  }.to raise_error(ArgumentError, 'not a PDF file')
97
96
  end
98
97
  end
data/spec/document_spec.rb CHANGED
@@ -12,7 +12,7 @@ describe Podoff::Document do
12
12
 
13
13
  before :all do
14
14
 
15
- @d = Podoff.load('pdfs/udocument0.pdf')
15
+ @d = Podoff.load('pdfs/udocument0.pdf', 'iso-8859-1')
16
16
  end
17
17
 
18
18
  describe '#objs' do
@@ -39,10 +39,9 @@ describe Podoff::Document do
39
39
  it 'returns a page given an index (starts at 1)' do
40
40
 
41
41
  p = @d.page(1)
42
+ expect(p.ref).to eq('56 0')
42
43
  expect(p.class).to eq(Podoff::Obj)
43
44
  expect(p.type).to eq('/Page')
44
- expect(p.attributes[:pagenum]).to eq('1')
45
- expect(p.page_number).to eq(1)
46
45
  end
47
46
 
48
47
  it 'returns nil if the page doesn\'t exist' do
@@ -51,12 +50,11 @@ describe Podoff::Document do
51
50
  expect(@d.page(9)).to eq(nil)
52
51
  end
53
52
 
54
- it 'returns the page, even for a doc without pdftk_PageNum' do
53
+ it 'returns a page given an index (starts at 1) (2)' do
55
54
 
56
- d = Podoff::Document.load('pdfs/t2.pdf')
55
+ d = Podoff::Document.load('pdfs/t2.pdf', 'utf-8')
57
56
 
58
57
  expect(d.page(1).ref).to eq('3 0')
59
- expect(d.page(1).page_number).to eq(nil)
60
58
 
61
59
  expect(d.page(0)).to eq(nil)
62
60
  expect(d.page(2)).to eq(nil)
@@ -64,16 +62,14 @@ describe Podoff::Document do
64
62
 
65
63
  it 'returns pages from the last when the index is negative' do
66
64
 
67
- expect(@d.page(-1).ref).to eq('33 0')
68
- expect(@d.page(-1).page_number).to eq(3)
65
+ expect(@d.page(-1).ref).to eq('58 0')
69
66
  end
70
67
 
71
- it 'returns pages from the last when the index is negative (no PageNum)' do
68
+ it 'returns pages from the last when the index is negative (2)' do
72
69
 
73
- d = Podoff::Document.load('pdfs/t2.pdf')
70
+ d = Podoff::Document.load('pdfs/t2.pdf', 'utf-8')
74
71
 
75
72
  expect(d.page(-1).ref).to eq('3 0')
76
- expect(d.page(-1).page_number).to eq(nil)
77
73
  end
78
74
  end
79
75
 
@@ -86,6 +82,8 @@ describe Podoff::Document do
86
82
  expect(d.class).to eq(Podoff::Document)
87
83
  expect(d.hash).not_to eq(@d.hash)
88
84
 
85
+ expect(d.encoding).to eq('iso-8859-1')
86
+
89
87
  expect(d.objs.hash).not_to eq(@d.objs.hash)
90
88
 
91
89
  expect(d.objs.values.first.hash).not_to eq(@d.objs.values.first.hash)
@@ -95,7 +93,7 @@ describe Podoff::Document do
95
93
  expect(d.objs.values.first.document).to equal(d)
96
94
  expect(@d.objs.values.first.document).to equal(@d)
97
95
 
98
- expect(d.root).to eq('65 0')
96
+ expect(d.root).to eq('1 0')
99
97
  end
100
98
 
101
99
  it 'sports objs with properly recomputed attributes' do
@@ -112,7 +110,7 @@ describe Podoff::Document do
112
110
 
113
111
  before :each do
114
112
 
115
- @d = Podoff.load('pdfs/t0.pdf')
113
+ @d = Podoff.load('pdfs/t0.pdf', 'utf-8')
116
114
  end
117
115
 
118
116
  describe '#add_base_font' do
@@ -132,9 +130,11 @@ describe Podoff::Document do
132
130
  '7 0 obj <</Type /Font /Subtype /Type1 /BaseFont /Helvetica>> endobj')
133
131
 
134
132
  s = @d.write(:string)
135
- d = Podoff.parse(s)
133
+ d = Podoff.parse(s, 'utf-8')
134
+
135
+ expect(d.xref).to eq(686)
136
136
 
137
- expect(d.xref).to eq(680)
137
+ expect(s).to be_a_valid_pdf
138
138
  end
139
139
 
140
140
  it 'doesn\'t mind a slash in front of the font name' do
@@ -175,9 +175,13 @@ endstream
175
175
  endobj
176
176
  }.strip)
177
177
 
178
- d = Podoff.parse(@d.write(:string))
178
+ s = @d.write(:string)
179
+
180
+ expect(s).to be_a_valid_pdf
181
+
182
+ d = Podoff.parse(s, 'utf-8')
179
183
 
180
- expect(d.xref).to eq(705)
184
+ expect(d.xref).to eq(711)
181
185
  end
182
186
 
183
187
  it 'accepts a block' do
@@ -202,10 +206,14 @@ endstream
202
206
  endobj
203
207
  }.strip)
204
208
 
205
- d = Podoff.parse(@d.write(:string))
209
+ s = @d.write(:string)
206
210
 
207
- expect(d.source.index('<</Length 97>>')).to eq(618)
208
- expect(d.xref).to eq(757)
211
+ expect(s).to be_a_valid_pdf
212
+
213
+ d = Podoff.parse(s, 'utf-8')
214
+
215
+ expect(d.source.index('<</Length 97>>')).to eq(625)
216
+ expect(d.xref).to eq(763)
209
217
  end
210
218
 
211
219
  it 'returns the open stream when no arg given' do
@@ -250,12 +258,11 @@ endobj
250
258
 
251
259
  it 'recomputes the attributes correctly' do
252
260
 
253
- d = Podoff.load('pdfs/qdocument0.pdf')
261
+ d = Podoff.load('pdfs/qdocument0.pdf', 'iso-8859-1')
254
262
 
255
263
  pa = d.re_add(d.page(1))
256
264
 
257
- expect(pa.attributes).to eq(
258
- { type: '/Page', contents: '151 0 R', pagenum: '1' })
265
+ expect(pa.attributes).to eq({ type: '/Page', contents: '151 0 R' })
259
266
  end
260
267
  end
261
268
  end
@@ -275,7 +282,7 @@ endobj
275
282
 
276
283
  it 'writes open streams as well' do
277
284
 
278
- d = Podoff.load('pdfs/t0.pdf')
285
+ d = Podoff.load('pdfs/t0.pdf', 'utf-8')
279
286
 
280
287
  pa = d.re_add(d.page(1))
281
288
  st = d.add_stream
@@ -293,12 +300,12 @@ BT 10 20 Td (hello open stream) Tj ET
293
300
  endstream
294
301
  endobj
295
302
  }.strip)
296
- ).to eq(722)
303
+ ).to eq(729)
297
304
  end
298
305
 
299
306
  it 'writes a proper xref table' do
300
307
 
301
- d = Podoff.load('pdfs/t0.pdf')
308
+ d = Podoff.load('pdfs/t0.pdf', 'utf-8')
302
309
 
303
310
  pa = d.re_add(d.page(1))
304
311
  st = d.add_stream
@@ -307,21 +314,23 @@ endobj
307
314
 
308
315
  s = d.write(:string)
309
316
 
310
- expect(s[808..-1].strip).to eq(%{
317
+ expect(s).to be_a_valid_pdf
318
+
319
+ expect(s[814..-1].strip).to eq(%{
311
320
  xref
312
321
  0 1
313
- 0000000000 65535 f
322
+ 0000000000 65535 f
314
323
  3 1
315
- 0000000611 00000 n
324
+ 0000000617 00000 n
316
325
  7 1
317
- 0000000723 00000 n
326
+ 0000000729 00000 n
318
327
  trailer
319
328
  <<
320
- /Prev 414
321
- /Size 7
329
+ /Prev 413
330
+ /Size 8
322
331
  /Root 1 0 R
323
332
  >>
324
- startxref 809
333
+ startxref 815
325
334
  %%EOF
326
335
  }.strip)
327
336
  end
@@ -331,7 +340,7 @@ startxref 809
331
340
 
332
341
  it 'rewrites a document in one go' do
333
342
 
334
- d = Podoff.load('pdfs/t2.pdf')
343
+ d = Podoff.load('pdfs/t2.pdf', 'utf-8')
335
344
 
336
345
  s = d.rewrite(:string)
337
346
 
@@ -361,23 +370,45 @@ endstream
361
370
  endobj
362
371
  xref
363
372
  0 1
364
- 0000000000 65535 f
373
+ 0000000000 65535 f
365
374
  1 7
366
- 0000000010 00000 n
367
- 0000000057 00000 n
368
- 0000000112 00000 n
369
- 0000000222 00000 n
370
- 0000000261 00000 n
371
- 0000000329 00000 n
372
- 0000000420 00000 n
375
+ 0000000009 00000 n
376
+ 0000000056 00000 n
377
+ 0000000111 00000 n
378
+ 0000000221 00000 n
379
+ 0000000260 00000 n
380
+ 0000000328 00000 n
381
+ 0000000419 00000 n
373
382
  trailer
374
383
  <<
375
- /Size 7
384
+ /Size 8
376
385
  /Root 1 0 R
377
386
  >>
378
- startxref 511
387
+ startxref 510
379
388
  %%EOF
380
389
  }.strip)
390
+
391
+ expect(s).to be_a_valid_pdf
392
+ end
393
+ end
394
+
395
+ describe '#extract_refs' do
396
+
397
+ it 'extracts a ref' do
398
+
399
+ expect(
400
+ Podoff::Document.allocate.send(:extract_refs, '17 0 R')
401
+ ).to eq([ '17 0' ])
402
+ expect(
403
+ Podoff::Document.allocate.send(:extract_refs, ' 17 0 R')
404
+ ).to eq([ '17 0' ])
405
+ end
406
+
407
+ it 'extracts a list of ref' do
408
+
409
+ expect(
410
+ Podoff::Document.allocate.send(:extract_refs, '[17 0 R 6 0 R]')
411
+ ).to eq([ '17 0', '6 0' ])
381
412
  end
382
413
  end
383
414
  end
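Several expectations above pin positions in the written output, e.g. `d.source.index('<</Length 97>>')` and `d.xref`. `String#index` counts characters, whereas xref and `startxref` values are byte offsets. The two only coincide when every character is one byte. That holds for ISO-8859-1, but not for UTF-8 text containing non-ASCII characters. The sketch below shows the difference; the helper name and the sample content are made up for illustration.

```ruby
# Sketch: character index vs byte offset of the same needle.
def char_and_byte_offset(source, needle)
  i = source.index(needle) or return nil  # character index
  [i, source[0, i].bytesize]              # bytes before the needle
end

utf8   = "%PDF-1.4\n(caf\u00e9)\n<</Length 97>>"
latin1 = utf8.encode(Encoding::ISO_8859_1)

p char_and_byte_offset(utf8, '<</Length 97>>')    # => [16, 17]
p char_and_byte_offset(latin1, '<</Length 97>>')  # => [16, 16]
```

`Document.load` opens the file with `'r:' + encoding`. Reading it as `'r:iso-8859-1'` keeps the two numbers equal.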
data/spec/obj_spec.rb CHANGED
@@ -12,7 +12,7 @@ describe Podoff::Obj do
12
12
 
13
13
  before :all do
14
14
 
15
- @d = Podoff.load('pdfs/udocument0.pdf')
15
+ @d = Podoff.load('pdfs/udocument0.pdf', 'iso-8859-1')
16
16
  end
17
17
 
18
18
  describe '#document' do
@@ -30,7 +30,8 @@ describe Podoff::Obj do
30
30
  o = @d.objs['20 0']
31
31
 
32
32
  expect(o.source).to eq(%{
33
- 20 0 obj [21 0 R]
33
+ 20 0 obj
34
+ << /DA (/Calibri,Bold 10 Tf 0 g) /F 4 /FT /Tx /MK << >> /P 58 0 R /Rect [ 448.723 652.574 490.603 667.749 ] /Subtype /Widget /T (State) /TU (State) /Type /Annot >>
34
35
  endobj
35
36
  }.strip)
36
37
  end
@@ -86,12 +87,12 @@ endobj
86
87
 
87
88
  it 'returns the type of the obj' do
88
89
 
89
- expect(@d.objs['23 0'].type).to eq('/Font')
90
+ expect(@d.objs['12 0'].type).to eq('/Font')
90
91
  end
91
92
 
92
93
  it 'returns nil if there is no type' do
93
94
 
94
- expect(@d.objs['17 0'].type).to eq(nil)
95
+ expect(@d.objs['59 0'].type).to eq(nil)
95
96
  end
96
97
 
97
98
  it 'works on open streams' do
@@ -150,7 +151,7 @@ endobj
150
151
 
151
152
  before :each do
152
153
 
153
- @d = Podoff.load('pdfs/udocument0.pdf')
154
+ @d = Podoff.load('pdfs/udocument0.pdf', 'iso-8859-1')
154
155
  end
155
156
 
156
157
  describe '#insert_contents' do
@@ -178,7 +179,7 @@ endobj
178
179
 
179
180
  pa.insert_contents(st)
180
181
 
181
- expect(pa.source).to match(/\/Contents \[3 0 R #{st.ref} R\]\n/)
182
+ expect(pa.source).to match(/\/Contents \[151 0 R #{st.ref} R\]/)
182
183
  end
183
184
 
184
185
  it 'accepts an obj ref' do
@@ -189,7 +190,7 @@ endobj
189
190
 
190
191
  pa.insert_contents(st.ref)
191
192
 
192
- expect(pa.source).to match(/\/Contents \[3 0 R #{st.ref} R\]\n/)
193
+ expect(pa.source).to match(/\/Contents \[151 0 R #{st.ref} R\]/)
193
194
  end
194
195
  end
195
196
 
@@ -237,14 +238,14 @@ endobj
237
238
 
238
239
  it 'adds to a list of references' do
239
240
 
240
- d = Podoff.load('pdfs/qdocument0.pdf')
241
+ d = Podoff.load('pdfs/qdocument0.pdf', 'iso-8859-1')
241
242
 
242
243
  o = d.re_add('56 0')
243
244
 
244
245
  o.send(:add_to_attribute, :contents, '9999 0')
245
246
 
246
247
  expect(o.attributes).to eq(
247
- { type: '/Page', contents: '[151 0 R 9999 0 R]', pagenum: '1' })
248
+ { type: '/Page', contents: '[151 0 R 9999 0 R]' })
248
249
  end
249
250
  end
250
251
  end
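The `add_to_attribute` expectation above shows a page's `/Contents` going from a single reference (`151 0 R`) to an array (`[151 0 R 9999 0 R]`). The string transformation can be sketched as follows; `append_ref` is a hypothetical helper, not podoff's implementation. One caveat follows from the PDF rule that `/Contents` may be either a stream or an array of streams. If the existing value is an indirect reference to an array object, wrapping it like this would be wrong, and the referenced array would have to be extended instead.

```ruby
# Sketch: append "N G R" to a value holding one ref or an array of refs.
def append_ref(value, ref)
  inner = value.to_s.strip
  inner = inner[1..-2].strip if inner.start_with?('[') && inner.end_with?(']')
  inner.empty? ? "[#{ref} R]" : "[#{inner} #{ref} R]"
end

p append_ref('151 0 R', '9999 0')          # => "[151 0 R 9999 0 R]"
p append_ref('[151 0 R 9999 0 R]', '7 0')  # => "[151 0 R 9999 0 R 7 0 R]"
p append_ref(nil, '7 0')                   # => "[7 0 R]"
```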
data/spec/spec_helper.rb CHANGED
@@ -10,3 +10,69 @@ require 'ostruct'
10
10
 
11
11
  require 'podoff'
12
12
 
13
+
14
+ RSpec::Matchers.define :be_encoded_as do |encoding|
15
+
16
+ match do |path|
17
+
18
+ fail ArgumentError.new("expecting a path (String) not a #{path.class}") \
19
+ unless path.is_a?(String)
20
+
21
+ $vic_r =
22
+ `(vim -c 'execute \"silent !echo \" . &fileencoding . " > _enc.txt" | q' #{path} > /dev/null 2>&1); cat _enc.txt; rm _enc.txt`.strip.downcase
23
+
24
+ $vic_r == encoding.downcase
25
+ end
26
+
27
+ failure_message do |path|
28
+
29
+ "expected #{encoding.downcase.inspect}, got #{$vic_r.to_s.inspect}"
30
+ end
31
+ end
32
+
33
+
34
+ RSpec::Matchers.define :be_a_valid_pdf do
35
+
36
+ match do |o|
37
+
38
+ path =
39
+ if /\A%PDF-\d/.match(o)
40
+ File.open('tmp/_under_check.pdf', 'wb') { |f| f.write(o) }
41
+ 'tmp/_under_check.pdf'
42
+ else
43
+ o
44
+ end
45
+
46
+ file_cmd =
47
+ /darwin/.match(RUBY_PLATFORM) ? 'file -I' : 'file -i'
48
+ vim_cmd =
49
+ "vim -c 'execute \"silent !echo \" . &fileencoding | q'"
50
+
51
+ cmd = [
52
+ "echo '* vim :'",
53
+ "#{vim_cmd} #{path}",
54
+ "echo '* #{file_cmd} :'",
55
+ "#{file_cmd} #{path}",
56
+ "echo",
57
+ "qpdf --check #{path}"
58
+ ]
59
+ $qpdf_r = `(#{cmd.join('; ')}) 2>&1`
60
+ `#{file_cmd} #{path}; echo; qpdf --check #{path} 2>&1`
61
+
62
+ $qpdf_r = "#{$qpdf_r}\nexit: #{$?.exitstatus}"
63
+ #puts "." * 80
64
+ #puts $qpdf_r
65
+
66
+ $qpdf_r.match(/exit: 0$/)
67
+ end
68
+
69
+ failure_message do |o|
70
+
71
+ %{
72
+ --- qpdf ---------------------------------------------------------------------->
73
+ #{$qpdf_r}
74
+ <-- qpdf -----------------------------------------------------------------------
75
+ }.strip
76
+ end
77
+ end
78
+
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: podoff
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.1.1
4
+ version: 1.2.0
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2015-10-25 00:00:00.000000000 Z
12
+ date: 2015-11-11 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rake
@@ -52,6 +52,7 @@ extra_rdoc_files: []
52
52
  files:
53
53
  - Rakefile
54
54
  - lib/podoff.rb
55
+ - spec/alpha_spec.rb
55
56
  - spec/core_spec.rb
56
57
  - spec/document_spec.rb
57
58
  - spec/obj_spec.rb
@@ -61,6 +62,7 @@ files:
61
62
  - podoff.gemspec
62
63
  - CHANGELOG.txt
63
64
  - LICENSE.txt
65
+ - out.txt
64
66
  - todo.txt
65
67
  - README.md
66
68
  homepage: http://github.com/jmettraux/podoff