RubyGems - podoff - Versions diffs - 1.1.1 → 1.2.0 - Mend

podoff 1.1.1 → 1.2.0

Files changed (10)

data/CHANGELOG.txt CHANGED

@@ -2,6 +2,13 @@
 = podoff CHANGELOG.txt
+== podoff 1.2.0  released 2015-11-11
+- require encoding upon loading and parsing, introduce Document#encoding
+- drop Podoff::Obj#page_number
+- use /Kids in /Pages to determine pages and page order
 == podoff 1.1.1  released 2015-10-26
 - reworked xref table output
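The last 1.2.0 entry replaces the old `pdftk_PageNum` lookup with the order of the `/Kids` array in the `/Pages` obj. A minimal standalone sketch of that lookup follows. `kid_refs` and `page_ref` are hypothetical names, not podoff methods, and the sketch assumes a flat page tree (one `/Pages` node whose kids are all pages), just as the 1.2.0 `pages` method reads the `/Kids` of the first `/Pages` obj it finds.

```ruby
# Hypothetical helpers (not podoff API) sketching the 1.2.0 page lookup.

# Returns the "N G" refs of a /Kids value in document order,
# e.g. "[3 0 R 12 0 R]" -> ["3 0", "12 0"].
def kid_refs(kids)
  kids.gsub(/\s+/, ' ').scan(/(\d+ \d+) R/).map(&:first)
end

# 1-based page lookup: 0 gives nil, negative indexes count from the end,
# out-of-range indexes give nil.
def page_ref(kids, index)
  refs = kid_refs(kids)
  return nil if index == 0
  index > 0 ? refs[index - 1] : refs[index]
end

kids = "[3 0 R\n 12 0 R 7 0 R]"
p page_ref(kids, 2)   # => "12 0"
p page_ref(kids, -1)  # => "7 0"
p page_ref(kids, 0)   # => nil
```

Page 3 is `7 0` here even though `12 0` has a higher object number: page order comes from `/Kids`, not from object numbering.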

data/README.md CHANGED

@@ -6,17 +6,261 @@
 A Ruby tool to deface PDF documents.
+Uses "incremental updates" to do so.
+Podoff is used to write over PDF documents. Those documents should first be uncompressed (and recompressed); see [below](#preparing-documents-for-use-with-podoff) for how.
+```ruby
+require 'podoff'
+d = Podoff.load('d2.pdf', 'iso-8859-1')
+  # load my d2.pdf (1.2.0 requires an encoding; iso-8859-1 was the former default)
+fo = d.add_base_font('Helvetica')
+  # make sure the document knows about "Helvetica"
+  # (one of the base 13 or 14 fonts PDF readers know about)
+pa = d.page(1)
+  # grab first page of the document
+pa.insert_font('/MyHelvetica', fo)
+  # link "MyHelvetica" to the base font above for this page
+st =
+  d.add_stream {
+    tf '/MyHelvetica', 12 # Helvetica size 12
+    bt 100, 100, "#{Time.now} stamped via podoff" # text at bottom left
+  }
+pa.insert_content(st)
+  # add content to page
+d.write('d3.pdf')
+  # write stamped document to d3.pdf
+```
+For more about the podoff "api", read ["how I use podoff"](#how-i-use-podoff).
 If you're looking for serious libraries, look at
 * https://github.com/payrollhero/pdf_tempura
 * https://github.com/prawnpdf/prawn-templates
+## preparing documents for use with podoff
+Podoff is naive and can't read xref tables in object streams. You have to work against PDF documents that have vanilla xref tables. [Qpdf](http://qpdf.sourceforge.net/) to the rescue.
+Given a doc0.pdf you can produce such a document by doing:
+```
+qpdf --object-streams=disable doc0.pdf doc1.pdf
+```
+doc1.pdf is now ready for overwriting with podoff.
+qpdf has rewritten the PDF, extracting the xref table but keeping the streams compressed.
+## bin/podoff
+`bin/podoff` is a command-line tool for preparing and checking PDFs before use.
+```
+$ ./bin/podoff -h
+Usage: ./bin/podoff [option] {fname}
+    -o, --objs                       List objs
+    -w, --rewrite                    Rewrite
+    -s, --stamp                      Apply time stamp at bottom of each page
+    -r, --recompress                 Recompress
+    --version                        Show version
+    -h, --help                       Show this message
+```
+`--recompress` is mostly an alias for `qpdf --object-streams=disable in.pdf out.pdf`
+`--stamp` is used to check whether podoff can add a time stamp on each page of an input PDF.
+## how I use podoff
+In the application which necessitated the creation of podoff, there are two PDFs to generate from time to time.
+I keep those two PDFs in memory.
+```ruby
+# lib/myapp/pdf.rb
+require 'podoff'
+module MyApp::Pdf
+  DOC0 = Podoff.load('pdf_templates/d0.pdf', 'iso-8859-1')
+  DOC1 = Podoff.load('pdf_templates/d1.pdf', 'iso-8859-1')
+  def generate_doc0(data, path)
+    d = DOC0.dup # shallow copy of the document
+    d.add_fonts
+    pa2 = d.page(2)
+    st = d.add_stream # open stream...
+    st.font 'MyHelv', 12 # font is an alias to tf
+    st.text 100, 100, data['customer_name']
+    st.text 100, 80, data['customer_phone']
+    st.text 100, 60, data['date'] if data['date']
+      # fill in customer info on page 2
+    pa2.insert_content(st) # ... close stream (yes, you can use a block too)
+    pa3 = d.page(3)
+    pa3.insert_content(d.add_stream { check 52, 100 }) if data['discount']
+      # a single check on page 3 if the customer gets a discount
+    d.write(path)
+  end
+  # ...
+end
+module Podoff # adding a few helper methods to the podoff classes
+  class Document
+    # Makes sure Helvetica and ZapfDingbats are available
+    # on each page of the document
+    #
+    def add_fonts
+      fo0 = add_base_font('/Helvetica')
+      fo1 = add_base_font('/ZapfDingbats')
+      pages.each { |pa|
+        pa = re_add(pa)
+        pa.insert_font('/MyHelv', fo0)
+        pa.insert_font('/MyZapf', fo1)
+      }
+    end
+  end
+  class Stream
+    # Places a check mark ✓ at x, y
+    #
+    def check(x, y)
+      font = @font            # save current font
+      self.tf '/MyZapf', 12   # switch to ZapfDingbats size 12
+      self.bt x, y, '3'       # check mark
+      @font = font            # get back to saved font
+    end
+  end
+end
+```
+The documents are kept in memory. As generation requests come in, they get duplicated and incrementally updated, and the filled documents are written to disk. The duplication doesn't copy the whole document file; only the references to the "obj" elements in the document get copied.
+### Podoff::Document
+```ruby
+class Podoff::Document
+  def self.load(path, encoding)
+    # Podoff.load(path, encoding) is a shortcut to this method
+  def dup
+    # Makes a shallow copy of the document
+  def add_base_font(name)
+    # Given a name in the base 13/14 fonts readers are supposed to know,
+    # ensures the document has access to the font.
+    # Usually "Helvetica" or "ZapfDingbats".
+  def pages
+    # Returns an array of all the objs that are pages
+  def page(index)
+    # Starts at 1, returns a page obj. Understands negative indexes, like
+    # -1 for the last page.
+  def add_stream(src=nil, &block)
+    # Prepares a new obj with a stream
+    # If src is given places the src string in the stream.
+    # If a block is given executes the block in the context of the
+    # Podoff::Stream instance.
+    # If no src and no block, simply returns the Podoff::Stream wrapped inside
+    # of the new obj (see example code above)
+  def re_add(obj_or_ref)
+    # Given an obj or a ref (like "1234 0") to an obj, copies that obj
+    # and re-adds it to the document.
+    # This is necessary for the incremental updates podoff uses: if you add
+    # an obj to the Contents list of a page, you have to add it to the
+    # re-added page, not directly to the original page.
+  def write(path=:string)
+    # Writes the document, with incremental updates to a file given by its path.
+    # If the path is :string, will simply return the string containing the
+    # whole document
+  def rewrite(path=:string)
+    # Like #write, but squashes the incremental updates in the document.
+    # Takes more time and memory and might fail (remember, podoff is very
+    # naive (as its author is)). Test with care...
+  #
+  # a bit lower-level...
+  def objs
+    # returns the hash { String/obj_ref => Podoff::Obj/obj_instance }
+```
+### Podoff::Obj
+A PDF document is mostly a hierarchy of `obj` elements. `Podoff::Obj` points to such elements (see `Podoff::Document#objs`).
+```ruby
+class Podoff::Obj
+  def insert_font(font_nick, font_obj_or_ref)
+  def insert_contents(obj_or_ref)
+```
+### Podoff::Stream
+TODO
+```ruby
+class Podoff::Stream
+  def tf(font_name, font_size)
+  alias :font :tf
+  def bt(x, y, text)
+  alias :text :bt
+```
 ## disclaimer
 The author of this tool/library has no link whatsoever with the authors of the sample PDF documents found under `pdfs/`. Those documents have been selected because they are representative of the PDF forms podoff is meant to ~~deface~~fill.
+## known bugs
+* podoff parsing is naive: documents that contain uncompressed streams with "endobj", "startxref", or "/Root" will disorient podoff
+* completely naive about encoding (only used it for British English documents so far)
+## links
+* http://qpdf.sourceforge.net/ source: https://github.com/qpdf/qpdf
+* http://www.slideshare.net/ange4771/advanced-pdf-tricks
 ## LICENSE
 MIT, see [LICENSE.txt](LICENSE.txt)
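The README explains that a template document stays in memory and each generation works on a `dup` that copies only the obj references, with `re_add` copying an individual obj before it gets modified. The sketch below models that copy-on-write idea with plain hashes and strings. `TinyDoc` is a hypothetical class, not `Podoff::Document`, and its objs are bare source strings rather than `Podoff::Obj` instances.

```ruby
# Hypothetical model (not podoff's Document) of "shallow dup + re_add".
class TinyDoc
  attr_reader :objs, :additions

  def initialize(objs)
    @objs = objs      # ref ("3 0") => obj source, shared with other copies
    @additions = {}   # objs re-added to this copy only
  end

  # Copies the ref => obj hash; the obj strings themselves stay shared.
  def dup
    TinyDoc.new(@objs.dup)
  end

  # Copies a single obj into this document so it can be modified
  # without touching the template (or any other copy).
  def re_add(ref)
    copy = @objs.fetch(ref).dup
    @objs[ref] = copy
    @additions[ref] = copy
  end
end

template = TinyDoc.new({ '3 0' => '3 0 obj <</Type /Page>> endobj' })
d = template.dup
p d.objs['3 0'].equal?(template.objs['3 0'])  # => true (shared until re_add)
pa = d.re_add('3 0')
pa.sub!('/Page', '/Page /Contents [4 0 R]')
p template.objs['3 0']  # => "3 0 obj <</Type /Page>> endobj"
```

Only the re-added obj is duplicated, so the cost of each generation grows with the number of modified objs rather than with the size of the template.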

data/lib/podoff.rb CHANGED

@@ -30,23 +30,26 @@ require 'stringio'
 module Podoff
-  VERSION = '1.1.1'
+  VERSION = '1.2.0'
-  def self.load(path, encoding='iso-8859-1')
+  def self.load(path, encoding)
     Podoff::Document.load(path, encoding)
   end
-  def self.parse(s)
+  def self.parse(s, encoding)
-    Podoff::Document.new(s)
+    Podoff::Document.new(s, encoding)
   end
   class Document
-    def self.load(path, encoding='iso-8859-1')
+    def self.load(path, encoding)
-      Podoff::Document.new(File.open(path, 'r:' + encoding) { |f| f.read })
+      Podoff::Document.new(
+        File.open(path, 'r:' + encoding) { |f| f.read },
+        encoding
+      )
     end
     def self.parse(s)
@@ -54,6 +57,8 @@ module Podoff
       Podoff::Document.new(s)
     end
+    attr_reader :encoding
     attr_reader :scanner
     attr_reader :version
     attr_reader :xref
@@ -63,11 +68,13 @@ module Podoff
     #
     attr_reader :additions
-    def initialize(s)
+    def initialize(s, encoding)
       fail ArgumentError.new('not a PDF file') \
         unless s.match(/\A%PDF-\d+\.\d+\s/)
+      @encoding = encoding
       @scanner = ::StringScanner.new(s)
       @version = nil
       @xref = nil
@@ -113,11 +120,6 @@ module Podoff
       @scanner.string
     end
-    def extract_ref(s)
-      s.gsub(/\s+/, ' ').gsub(/[^0-9 ]+/, '').strip
-    end
     def updated?
       @additions.any?
@@ -129,6 +131,8 @@ module Podoff
       self.class.allocate.instance_eval do
+        @encoding = o.encoding
         @scanner = ::StringScanner.new(o.source)
         @xref = o.xref
@@ -146,26 +150,23 @@ module Podoff
     def pages
-      @objs.values.select { |o| o.type == '/Page' }
-    end
-    def page(index)
+      #@objs.values.select { |o| o.type == '/Page' }
-      return nil if index == 0
+      ps = @objs.values.find { |o| o.type == '/Pages' }
+      return nil unless ps
-      pas = pages
-      return nil if pas.empty?
+      extract_refs(ps.attributes[:kids]).collect { |r| @objs[r] }
+    end
-      return (
-        index > 0 ? pas.at(index - 1) : pas.at(index)
-      ) unless pas.first.attributes[:pagenum]
+    def page(index)
       if index < 0
-        max = pas.inject(0) { |n, pa| [ n, pa.page_number ].max }
-        index = max + 1 + index
+        pages[index]
+      elsif index == 0
+        nil
+      else
+        pages[index - 1]
       end
-      pas.find { |pa| pa.page_number == index }
     end
     def new_ref
@@ -224,7 +225,9 @@ module Podoff
       add(obj)
     end
-    def write(path)
+    def write(path=:string, encoding=nil)
+      encoding ||= @encoding
       f =
         case path
@@ -232,6 +235,8 @@ module Podoff
           when String then File.open(path, 'wb')
           else path
         end
+      f.set_encoding(encoding) # internal encoding: nil
+      #f.set_encoding(encoding, encoding)
       f.write(source)
@@ -241,19 +246,19 @@ module Podoff
         @additions.values.each do |o|
           f.write("\n")
-          pointers[o.ref.split(' ').first.to_i] = f.pos + 1
-          f.write(o.to_s)
+          pointers[o.ref.split(' ').first.to_i] = f.pos
+          f.write(o.to_s.force_encoding(encoding))
         end
         f.write("\n\n")
-        xref = f.pos + 1
+        xref = f.pos
         write_xref(f, pointers)
         f.write("trailer\n")
         f.write("<<\n")
         f.write("/Prev #{self.xref}\n")
-        f.write("/Size #{objs.size}\n")
+        f.write("/Size #{objs.size + 1}\n")
         f.write("/Root #{root} R\n")
         f.write(">>\n")
         f.write("startxref #{xref}\n")
@@ -265,7 +270,9 @@ module Podoff
       f.is_a?(StringIO) ? f.string : nil
     end
-    def rewrite(path=:string)
+    def rewrite(path=:string, encoding=nil)
+      encoding ||= @encoding
       f =
         case path
@@ -273,6 +280,7 @@ module Podoff
           when String then File.open(path, 'wb')
           else path
         end
+      f.set_encoding(encoding)
       v = source.match(/%PDF-\d+\.\d+/)[0]
       f.write(v)
@@ -281,18 +289,18 @@ module Podoff
       pointers = {}
       objs.keys.sort.each do |k|
-        pointers[k.split(' ').first.to_i] = f.pos + 1
-        f.write(objs[k].source)
+        pointers[k.split(' ').first.to_i] = f.pos
+        f.write(objs[k].source.force_encoding(encoding))
         f.write("\n")
       end
-      xref = f.pos + 1
+      xref = f.pos
       write_xref(f, pointers)
       f.write("trailer\n")
       f.write("<<\n")
-      f.write("/Size #{objs.size}\n")
+      f.write("/Size #{objs.size + 1}\n")
       f.write("/Root #{root} R\n")
       f.write(">>\n")
       f.write("startxref #{xref}\n")
@@ -309,7 +317,7 @@ module Podoff
       f.write("xref\n")
       f.write("0 1\n")
-      f.write("0000000000 65535 f\n")
+      f.write("0000000000 65535 f \n")
       pointers
         .keys
@@ -321,7 +329,7 @@ module Podoff
         }
         .each { |part|
           f.write("#{part.first} #{part.size}\n")
-          part.each { |k| f.write(sprintf("%010d 00000 n\n", pointers[k])) }
+          part.each { |k| f.write(sprintf("%010d 00000 n \n", pointers[k])) }
         }
     end
@@ -332,12 +340,21 @@ module Podoff
       s
     end
+    def extract_ref(s)
+      s.gsub(/\s+/, ' ').gsub(/[^0-9 ]+/, '').strip
+    end
+    def extract_refs(s)
+      s.gsub(/\s+/, ' ').scan(/(\d+ \d+) R/).collect(&:first)
+    end
   end
   class Obj
-    ATTRIBUTES =
-      { type: 'Type', contents: 'Contents', pagenum: 'pdftk_PageNum' }
+    ATTRIBUTES = { type: 'Type', contents: 'Contents', kids: 'Kids' }
     def self.extract(doc)
@@ -413,12 +430,6 @@ module Podoff
       @attributes && @attributes[:type]
     end
-    def page_number
-      r = @attributes && @attributes[:pagenum]
-      r ? r.to_i : nil
-    end
     def insert_font(nick, obj_or_ref)
       fail ArgumentError.new("target '#{ref}' not a replica") \
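Three of the `write`/`rewrite` changes above line up with the PDF cross-reference rules. Offsets are the 0-based byte position where each obj starts (hence `f.pos` instead of `f.pos + 1`). Every xref entry is exactly 20 bytes including a two-byte end-of-line (hence the trailing space in `f \n` and `n \n`). The trailer `/Size` is one greater than the highest object number (hence the `+ 1` for object 0). The standalone sketch below builds such a table from a `{ obj_number => byte_offset }` hash. `xref_table` and `trailer_size` are hypothetical names, not podoff's `write_xref`; the grouping of consecutive object numbers into subsections follows the spec output shown further down. Note that `objs.size + 1` equals "highest object number + 1" only when objects are numbered 1..n without gaps, so the sketch computes it from the maximum instead.

```ruby
# Hypothetical standalone sketch, not podoff's write_xref.
# pointers: { obj_number => byte offset where "N G obj" starts }
def xref_table(pointers)
  out = +"xref\n0 1\n"
  out << "0000000000 65535 f \n"   # head of the free list, 20 bytes
  # Runs of consecutive object numbers share one "first count" subsection.
  pointers.keys.sort.slice_when { |a, b| b != a + 1 }.each do |run|
    out << "#{run.first} #{run.size}\n"
    run.each { |n| out << format("%010d 00000 n \n", pointers[n]) }
  end
  out
end

# /Size is one more than the highest object number in the whole file.
def trailer_size(obj_numbers)
  obj_numbers.max.to_i + 1
end

print xref_table({ 3 => 617, 7 => 729 })
p trailer_size((1..7).to_a)   # => 8

# Offsets are byte counts (what IO#pos reports), not character counts:
src = "%PDF-1.4\n% \u00e9\n"
p [src.length, src.bytesize]  # => [13, 14]
```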

data/out.txt ADDED

@@ -0,0 +1 @@
+ utf-8

data/spec/alpha_spec.rb ADDED

@@ -0,0 +1,40 @@
+#
+# specifying podoff
+#
+# Tue Nov 10 21:01:51 JST 2015
+#
+require 'spec_helper'
+describe 'fixtures:' do
+  Dir['pdfs/*.pdf'].each do |path|
+    describe path do
+      it 'is a valid pdf document' do
+        expect(path).to be_a_valid_pdf
+      end
+    end
+  end
+  describe 'pdfs/t0.pdf' do
+    it 'is encoded as UTF-8' do
+      expect('pdfs/t0.pdf').to be_encoded_as('utf-8')
+    end
+  end
+  describe 'pdfs/udocument0.pdf' do
+    it 'is encoded as ISO-8859-1' do
+      expect('pdfs/udocument0.pdf').to be_encoded_as('latin1')
+    end
+  end
+end

data/spec/core_spec.rb CHANGED

@@ -14,11 +14,11 @@ describe Podoff do
     it 'loads a PDF document' do
-      d = Podoff.load('pdfs/t0.pdf')
+      d = Podoff.load('pdfs/t0.pdf', 'utf-8')
       expect(d.class).to eq(Podoff::Document)
       expect(d.objs.keys).to eq([ '1 0', '2 0', '3 0', '4 0', '5 0', '6 0' ])
-      expect(d.xref).to eq(414)
+      expect(d.xref).to eq(413)
       #pp d.objs.values.collect(&:to_a)
@@ -41,25 +41,25 @@ describe Podoff do
     it 'loads a PDF document' do
-      d = Podoff.load('pdfs/udocument0.pdf')
+      d = Podoff.load('pdfs/udocument0.pdf', 'iso-8859-1')
       expect(d.class).to eq(Podoff::Document)
-      expect(d.xref).to eq(3138351)
+      expect(d.xref).to eq(1612815)
       expect(d.objs.size).to eq(273)
       expect(d.objs.keys).to include('1 0')
       expect(d.objs.keys).to include('273 0')
-      expect(d.root).to eq('65 0')
+      expect(d.root).to eq('1 0')
       expect(d.pages.size).to eq(3)
     end
     it 'loads a PDF document with incremental updates' do
-      d = Podoff.load('pdfs/t1.pdf')
+      d = Podoff.load('pdfs/t1.pdf', 'utf-8')
       expect(d.class).to eq(Podoff::Document)
-      expect(d.xref).to eq(698)
+      expect(d.xref).to eq(704)
       expect(d.objs.keys).to eq([ '1 0', '2 0', '3 0', '4 0', '5 0', '6 0' ])
       expect(d.obj_counters.keys).to eq(
@@ -72,7 +72,7 @@ describe Podoff do
     it 'loads a [re]compressed PDF documents' do
-      d = Podoff.load('pdfs/qdocument0.pdf')
+      d = Podoff.load('pdfs/qdocument0.pdf', 'iso-8859-1')
       expect(d.class).to eq(Podoff::Document)
       expect(d.xref).to eq(1612815)
@@ -85,14 +85,13 @@ describe Podoff do
       #end
       expect(d.pages.size).to eq(3)
-      expect(d.pages.first.attributes[:pagenum]).to eq('1')
       expect(d.objs['46 0'].attributes[:type]).to eq('/Annot')
     end
     it 'rejects items that are not PDF documents' do
       expect {
-        Podoff.load('spec/spec_helper.rb')
+        Podoff.load('spec/spec_helper.rb', 'utf-8')
       }.to raise_error(ArgumentError, 'not a PDF file')
     end
   end

data/spec/document_spec.rb CHANGED

@@ -12,7 +12,7 @@ describe Podoff::Document do
   before :all do
-    @d = Podoff.load('pdfs/udocument0.pdf')
+    @d = Podoff.load('pdfs/udocument0.pdf', 'iso-8859-1')
   end
   describe '#objs' do
@@ -39,10 +39,9 @@ describe Podoff::Document do
     it 'returns a page given an index (starts at 1)' do
       p = @d.page(1)
+      expect(p.ref).to eq('56 0')
       expect(p.class).to eq(Podoff::Obj)
       expect(p.type).to eq('/Page')
-      expect(p.attributes[:pagenum]).to eq('1')
-      expect(p.page_number).to eq(1)
     end
     it 'returns nil if the page doesn\'t exist' do
@@ -51,12 +50,11 @@ describe Podoff::Document do
       expect(@d.page(9)).to eq(nil)
     end
-    it 'returns the page, even for a doc without pdftk_PageNum' do
+    it 'returns a page given an index (starts at 1) (2)' do
-      d = Podoff::Document.load('pdfs/t2.pdf')
+      d = Podoff::Document.load('pdfs/t2.pdf', 'utf-8')
       expect(d.page(1).ref).to eq('3 0')
-      expect(d.page(1).page_number).to eq(nil)
       expect(d.page(0)).to eq(nil)
       expect(d.page(2)).to eq(nil)
@@ -64,16 +62,14 @@ describe Podoff::Document do
     it 'returns pages from the last when the index is negative' do
-      expect(@d.page(-1).ref).to eq('33 0')
-      expect(@d.page(-1).page_number).to eq(3)
+      expect(@d.page(-1).ref).to eq('58 0')
     end
-    it 'returns pages from the last when the index is negative (no PageNum)' do
+    it 'returns pages from the last when the index is negative (2)' do
-      d = Podoff::Document.load('pdfs/t2.pdf')
+      d = Podoff::Document.load('pdfs/t2.pdf', 'utf-8')
       expect(d.page(-1).ref).to eq('3 0')
-      expect(d.page(-1).page_number).to eq(nil)
     end
   end
@@ -86,6 +82,8 @@ describe Podoff::Document do
       expect(d.class).to eq(Podoff::Document)
       expect(d.hash).not_to eq(@d.hash)
+      expect(d.encoding).to eq('iso-8859-1')
       expect(d.objs.hash).not_to eq(@d.objs.hash)
       expect(d.objs.values.first.hash).not_to eq(@d.objs.values.first.hash)
@@ -95,7 +93,7 @@ describe Podoff::Document do
       expect(d.objs.values.first.document).to equal(d)
       expect(@d.objs.values.first.document).to equal(@d)
-      expect(d.root).to eq('65 0')
+      expect(d.root).to eq('1 0')
     end
     it 'sports objs with properly recomputed attributes' do
@@ -112,7 +110,7 @@ describe Podoff::Document do
     before :each do
-      @d = Podoff.load('pdfs/t0.pdf')
+      @d = Podoff.load('pdfs/t0.pdf', 'utf-8')
     end
     describe '#add_base_font' do
@@ -132,9 +130,11 @@ describe Podoff::Document do
           '7 0 obj <</Type /Font /Subtype /Type1 /BaseFont /Helvetica>> endobj')
         s = @d.write(:string)
-        d = Podoff.parse(s)
+        d = Podoff.parse(s, 'utf-8')
+        expect(d.xref).to eq(686)
-        expect(d.xref).to eq(680)
+        expect(s).to be_a_valid_pdf
       end
       it 'doesn\'t mind a slash in front of the font name' do
@@ -175,9 +175,13 @@ endstream
 endobj
         }.strip)
-        d = Podoff.parse(@d.write(:string))
+        s = @d.write(:string)
+        expect(s).to be_a_valid_pdf
+        d = Podoff.parse(s, 'utf-8')
-        expect(d.xref).to eq(705)
+        expect(d.xref).to eq(711)
       end
       it 'accepts a block' do
@@ -202,10 +206,14 @@ endstream
 endobj
         }.strip)
-        d = Podoff.parse(@d.write(:string))
+        s = @d.write(:string)
-        expect(d.source.index('<</Length 97>>')).to eq(618)
-        expect(d.xref).to eq(757)
+        expect(s).to be_a_valid_pdf
+        d = Podoff.parse(s, 'utf-8')
+        expect(d.source.index('<</Length 97>>')).to eq(625)
+        expect(d.xref).to eq(763)
       end
       it 'returns the open stream when no arg given' do
@@ -250,12 +258,11 @@ endobj
       it 'recomputes the attributes correctly' do
-        d = Podoff.load('pdfs/qdocument0.pdf')
+        d = Podoff.load('pdfs/qdocument0.pdf', 'iso-8859-1')
         pa = d.re_add(d.page(1))
-        expect(pa.attributes).to eq(
-          { type: '/Page', contents: '151 0 R', pagenum: '1' })
+        expect(pa.attributes).to eq({ type: '/Page', contents: '151 0 R' })
       end
     end
   end
@@ -275,7 +282,7 @@ endobj
     it 'writes open streams as well' do
-      d = Podoff.load('pdfs/t0.pdf')
+      d = Podoff.load('pdfs/t0.pdf', 'utf-8')
       pa = d.re_add(d.page(1))
       st = d.add_stream
@@ -293,12 +300,12 @@ BT 10 20 Td (hello open stream) Tj ET
 endstream
 endobj
         }.strip)
-      ).to eq(722)
+      ).to eq(729)
     end
     it 'writes a proper xref table' do
-      d = Podoff.load('pdfs/t0.pdf')
+      d = Podoff.load('pdfs/t0.pdf', 'utf-8')
       pa = d.re_add(d.page(1))
       st = d.add_stream
@@ -307,21 +314,23 @@ endobj
       s = d.write(:string)
-      expect(s[808..-1].strip).to eq(%{
+      expect(s).to be_a_valid_pdf
+      expect(s[814..-1].strip).to eq(%{
 xref
 0 1
-0000000000 65535 f
+0000000000 65535 f
 3 1
-0000000611 00000 n
+0000000617 00000 n
 7 1
-0000000723 00000 n
+0000000729 00000 n
 trailer
 <<
-/Prev 414
-/Size 7
+/Prev 413
+/Size 8
 /Root 1 0 R
 >>
-startxref 809
+startxref 815
 %%EOF
       }.strip)
     end
@@ -331,7 +340,7 @@ startxref 809
     it 'rewrites a document in one go' do
-      d = Podoff.load('pdfs/t2.pdf')
+      d = Podoff.load('pdfs/t2.pdf', 'utf-8')
       s = d.rewrite(:string)
@@ -361,23 +370,45 @@ endstream
 endobj
 xref
 0 1
-0000000000 65535 f
+0000000000 65535 f
 1 7
-0000000010 00000 n
-0000000057 00000 n
-0000000112 00000 n
-0000000222 00000 n
-0000000261 00000 n
-0000000329 00000 n
-0000000420 00000 n
+0000000009 00000 n
+0000000056 00000 n
+0000000111 00000 n
+0000000221 00000 n
+0000000260 00000 n
+0000000328 00000 n
+0000000419 00000 n
 trailer
 <<
-/Size 7
+/Size 8
 /Root 1 0 R
 >>
-startxref 511
+startxref 510
 %%EOF
       }.strip)
+      expect(s).to be_a_valid_pdf
+    end
+  end
+  describe '#extract_refs' do
+    it 'extracts a ref' do
+      expect(
+        Podoff::Document.allocate.send(:extract_refs, '17 0 R')
+      ).to eq([ '17 0' ])
+      expect(
+        Podoff::Document.allocate.send(:extract_refs, ' 17 0 R')
+      ).to eq([ '17 0' ])
+    end
+    it 'extracts a list of ref' do
+      expect(
+        Podoff::Document.allocate.send(:extract_refs, '[17 0 R 6 0 R]')
+      ).to eq([ '17 0', '6 0' ])
     end
   end
 end
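Several document specs now assert that written output passes `qpdf --check` (`be_a_valid_pdf`). One property a reader relies on is that the final `startxref` value is the byte offset of the `xref` keyword; with the pre-1.2.0 `xref = f.pos + 1`, it pointed one byte past it. The check below is a hypothetical standalone helper, not part of podoff or its specs, and it only looks at the last `startxref ... %%EOF` of the string.

```ruby
# Hypothetical check, not part of podoff: does the final startxref value
# point exactly at the "xref" keyword? (Byte offsets, counted from 0.)
def startxref_ok?(pdf)
  bytes = pdf.b                                   # binary copy: index == byte offset
  m = bytes.match(/startxref\s+(\d+)\s+%%EOF\s*\z/)
  return false unless m
  bytes[m[1].to_i, 4] == 'xref'
end

head = "%PDF-1.4\n% \u00e9\n1 0 obj <<>> endobj\n"
tail = "xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1 >>\n"
p startxref_ok?(head + tail + "startxref #{head.bytesize}\n%%EOF\n")      # => true
p startxref_ok?(head + tail + "startxref #{head.bytesize + 1}\n%%EOF\n")  # => false
p startxref_ok?(head + tail + "startxref #{head.length}\n%%EOF\n")        # => false
```

The last line shows why character counts are not enough: the single non-ASCII character shifts every later byte offset by one.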

data/spec/obj_spec.rb CHANGED

@@ -12,7 +12,7 @@ describe Podoff::Obj do
   before :all do
-    @d = Podoff.load('pdfs/udocument0.pdf')
+    @d = Podoff.load('pdfs/udocument0.pdf', 'iso-8859-1')
   end
   describe '#document' do
@@ -30,7 +30,8 @@ describe Podoff::Obj do
       o = @d.objs['20 0']
       expect(o.source).to eq(%{
-20 0 obj [21 0 R]
+20 0 obj
+<< /DA (/Calibri,Bold 10 Tf 0 g) /F 4 /FT /Tx /MK << >> /P 58 0 R /Rect [ 448.723 652.574 490.603 667.749 ] /Subtype /Widget /T (State) /TU (State) /Type /Annot >>
 endobj
       }.strip)
     end
@@ -86,12 +87,12 @@ endobj
     it 'returns the type of the obj' do
-      expect(@d.objs['23 0'].type).to eq('/Font')
+      expect(@d.objs['12 0'].type).to eq('/Font')
     end
     it 'returns nil if there is no type' do
-      expect(@d.objs['17 0'].type).to eq(nil)
+      expect(@d.objs['59 0'].type).to eq(nil)
     end
     it 'works on open streams' do
@@ -150,7 +151,7 @@ endobj
     before :each do
-      @d = Podoff.load('pdfs/udocument0.pdf')
+      @d = Podoff.load('pdfs/udocument0.pdf', 'iso-8859-1')
     end
     describe '#insert_contents' do
@@ -178,7 +179,7 @@ endobj
         pa.insert_contents(st)
-        expect(pa.source).to match(/\/Contents \[3 0 R #{st.ref} R\]\n/)
+        expect(pa.source).to match(/\/Contents \[151 0 R #{st.ref} R\]/)
       end
       it 'accepts an obj ref' do
@@ -189,7 +190,7 @@ endobj
         pa.insert_contents(st.ref)
-        expect(pa.source).to match(/\/Contents \[3 0 R #{st.ref} R\]\n/)
+        expect(pa.source).to match(/\/Contents \[151 0 R #{st.ref} R\]/)
       end
     end
@@ -237,14 +238,14 @@ endobj
       it 'adds to a list of references' do
-        d = Podoff.load('pdfs/qdocument0.pdf')
+        d = Podoff.load('pdfs/qdocument0.pdf', 'iso-8859-1')
         o = d.re_add('56 0')
         o.send(:add_to_attribute, :contents, '9999 0')
         expect(o.attributes).to eq(
-          { type: '/Page', contents: '[151 0 R 9999 0 R]', pagenum: '1' })
+          { type: '/Page', contents: '[151 0 R 9999 0 R]' })
       end
     end
   end
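The obj spec shows `add_to_attribute` turning a single `/Contents` ref (`151 0 R`) into the array `[151 0 R 9999 0 R]`. A standalone sketch of that string transformation follows. `add_content_ref` is a hypothetical helper, not podoff's private method, and it only handles three shapes: no value, a single ref, or an array of refs.

```ruby
# Hypothetical helper (not podoff's add_to_attribute): appends "N G R"
# to a /Contents value that is absent, a single ref, or an array of refs.
def add_content_ref(contents, ref)
  entry = "#{ref} R"
  case contents
  when nil, ''
    entry
  when /\A\s*\[(.*)\]\s*\z/m
    items = [$1.strip, entry].reject(&:empty?)
    "[#{items.join(' ')}]"
  else
    "[#{contents.strip} #{entry}]"
  end
end

p add_content_ref('151 0 R', '9999 0')           # => "[151 0 R 9999 0 R]"
p add_content_ref('[151 0 R 9999 0 R]', '12 0')  # => "[151 0 R 9999 0 R 12 0 R]"
p add_content_ref(nil, '7 0')                    # => "7 0 R"
```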

data/spec/spec_helper.rb CHANGED

@@ -10,3 +10,69 @@ require 'ostruct'
 require 'podoff'
+RSpec::Matchers.define :be_encoded_as do |encoding|
+  match do |path|
+    fail ArgumentError.new("expecting a path (String) not a #{path.class}") \
+      unless path.is_a?(String)
+    $vic_r =
+      `(vim -c 'execute \"silent !echo \" . &fileencoding . " > _enc.txt" | q' #{path} > /dev/null 2>&1); cat _enc.txt; rm _enc.txt`.strip.downcase
+    $vic_r == encoding.downcase
+  end
+  failure_message do |path|
+    "expected #{encoding.downcase.inspect}, got #{$vic_r.to_s.inspect}"
+  end
+end
+RSpec::Matchers.define :be_a_valid_pdf do
+  match do |o|
+    path =
+      if /\A%PDF-\d/.match(o)
+        File.open('tmp/_under_check.pdf', 'wb') { |f| f.write(o) }
+        'tmp/_under_check.pdf'
+      else
+        o
+      end
+    file_cmd =
+      /darwin/.match(RUBY_PLATFORM) ? 'file -I' : 'file -i'
+    vim_cmd =
+      "vim -c 'execute \"silent !echo \" . &fileencoding | q'"
+    cmd = [
+      "echo '* vim :'",
+      "#{vim_cmd} #{path}",
+      "echo '* #{file_cmd} :'",
+      "#{file_cmd} #{path}",
+      "echo",
+      "qpdf --check #{path}"
+    ]
+    $qpdf_r = `(#{cmd.join('; ')}) 2>&1`
+      `#{file_cmd} #{path}; echo; qpdf --check #{path} 2>&1`
+    $qpdf_r = "#{$qpdf_r}\nexit: #{$?.exitstatus}"
+#puts "." * 80
+#puts $qpdf_r
+    $qpdf_r.match(/exit: 0$/)
+  end
+  failure_message do |o|
+    %{
+--- qpdf ---------------------------------------------------------------------->
+#{$qpdf_r}
+<-- qpdf -----------------------------------------------------------------------
+    }.strip
+  end
+end
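The `be_encoded_as` matcher shells out to vim and reads back `&fileencoding`, so the fixture checks depend on vim being installed. A pure-Ruby heuristic in the same spirit is sketched below. `guess_encoding` is hypothetical, not part of podoff or its specs, and it only distinguishes the two encodings the fixtures use: if the file's bytes form valid UTF-8 it reports `utf-8`, otherwise it falls back to `latin1`, in which every byte sequence is valid. Compressed streams are binary, so PDFs with compressed content will usually come out as `latin1`; this is a rough check, not a reimplementation of vim's detection.

```ruby
# Hypothetical heuristic (not podoff's be_encoded_as matcher).
def guess_encoding(path)
  bytes = File.binread(path)                  # raw bytes, ASCII-8BIT
  utf8 = bytes.force_encoding(Encoding::UTF_8)
  utf8.valid_encoding? ? 'utf-8' : 'latin1'   # latin1 accepts any byte sequence
end
```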

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: podoff
 version: !ruby/object:Gem::Version
-  version: 1.1.1
+  version: 1.2.0
   prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-10-25 00:00:00.000000000 Z
+date: 2015-11-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -52,6 +52,7 @@ extra_rdoc_files: []
 files:
 - Rakefile
 - lib/podoff.rb
+- spec/alpha_spec.rb
 - spec/core_spec.rb
 - spec/document_spec.rb
 - spec/obj_spec.rb
@@ -61,6 +62,7 @@ files:
 - podoff.gemspec
 - CHANGELOG.txt
 - LICENSE.txt
+- out.txt
 - todo.txt
 - README.md
 homepage: http://github.com/jmettraux/podoff