pdf-reader 0.7.3 → 0.7.4

Files changed (6)

data/CHANGELOG CHANGED

@@ -1,4 +1,9 @@
-v0.7.3 (UNRELESED)
+v0.7.4 (7th August 2008)
+- Raise a MalformedPDFError if a content stream contains an unterminated string
+- Fix an bug that was causing an endless loop on some OSX systems
+  - valid strings were incorrectly thought to be unterminated
+v0.7.3 (11th June 2008)
 - Add a high level way to get direct access to a PDF object, including a new executable: pdf_object
 - Fix a hard loop bug caused by a content stream that is missing a final operator
 - Significantly simplified the internal code for encoding conversions

data/README.rdoc CHANGED

@@ -18,7 +18,7 @@ The recommended installation method is via Rubygems.
 PDF::Reader is designed with a callback-style architecture. The basic concept
 is to build a receiver class and pass that into PDF::Reader along with the PDF
-to process.
+to process.
 As PDF::Reader walks the file and encounters various objects (pages, text,
 images, shapes, etc) it will call methods on the receiver class.  What those
@@ -37,22 +37,22 @@ text will be converted to UTF-8 before it is passed back from PDF::Reader.
 = Exceptions
-There are two key exceptions that you will need to watch out for when processing a
+There are two key exceptions that you will need to watch out for when processing a
 PDF file:
-MalformedPDFError - The PDF appears to be corrupt in some way. If you believe the
-file should be valid, or that a corrupt file didn't raise an exception, please
+MalformedPDFError - The PDF appears to be corrupt in some way. If you believe the
+file should be valid, or that a corrupt file didn't raise an exception, please
 forward a copy of the file to the maintainers and we can attempt improve the code.
-UnsupportedFeatureError - The PDF uses a feature that PDF::Reader doesn't currently
-support. Again, we welcome submissions of PDF files that exhibit these features to help
+UnsupportedFeatureError - The PDF uses a feature that PDF::Reader doesn't currently
+support. Again, we welcome submissions of PDF files that exhibit these features to help
 us with future code improvements.
 MalformedPDFError has some subclasses if you want to detect finer grained issues. If you
 don't, 'rescue MalformedPDFError' will catch all the subclassed errors as well.
 Any other exceptions should be considered bugs in either PDF::Reader (please
-report it!) your receiver (please don't report it!).
+report it!) or your receiver (please don't report it!).
 = Maintainers
@@ -80,9 +80,9 @@ A simple app to count the number of pages in a PDF File.
     attr_accessor :page_count
     def initialize
-      @page_count = 0
+      @page_count = 0
     end
     # Called when page parsing ends
     def end_page
       @page_count += 1
@@ -97,7 +97,7 @@ A simple app to count the number of pages in a PDF File.
 WARNING: this will generate a *lot* of output, so you probably want to pipe
 it through less or to a text file.
   require 'rubygems'
   require 'pdf/reader'
@@ -107,7 +107,42 @@ it through less or to a text file.
     puts cb
   end
-== Extract metadata only
+== Extract all text from a single PDF
+  class PageTextReceiver
+    attr_accessor :content
+    def initialize
+      @content = []
+    end
+    # Called when page parsing starts
+    def begin_page(arg = nil)
+      @content << ""
+    end
+    # record text that is drawn on the page
+    def show_text(string, *params)
+      @content.last << string.strip
+    end
+    # there's a few text callbacks, so make sure we process them all
+    alias :super_show_text :show_text
+    alias :move_to_next_line_and_show_text :show_text
+    alias :set_spacing_next_line_show_text :show_text
+    # this final text callback takes slightly different arguments
+    def show_text_with_positioning(*params)
+      params = params.first
+      params.each { |str| show_text(str) if str.kind_of?(String)}
+    end
+  end
+  receiver = PageTextReceiver.new
+  pdf = PDF::Reader.file("somefile.pdf", receiver)
+  puts receiver.content.inspect
+== Extract metadata only
   require 'rubygems'
   require 'pdf/reader'
@@ -150,7 +185,7 @@ A simple app to display the number of pages in a PDF File.
   pdf = PDF::Reader.file("somefile.pdf", receiver, :pages => false)
   puts "#{receiver.pages} pages"
-== Basic RSpec of a generated PDF
+== Basic RSpec of a generated PDF
   require 'rubygems'
   require 'pdf/reader'
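
The "Extract all text from a single PDF" example added to the README in this release can be exercised without a PDF by calling the receiver's callbacks by hand. The sketch below is a trimmed copy of that receiver; the hand-written callback sequence and the sample strings are assumptions for illustration, not output from PDF::Reader itself. It uses String.new rather than a "" literal only so the sketch also runs where string literals are frozen.

```ruby
# Trimmed copy of the README's PageTextReceiver (aliases omitted).
class PageTextReceiver
  attr_accessor :content

  def initialize
    @content = []
  end

  # Start a fresh, mutable string for each page.
  def begin_page(arg = nil)
    @content << String.new
  end

  # Append the stripped text to the current page's string.
  def show_text(string, *params)
    @content.last << string.strip
  end

  # The first argument is an array mixing strings and numeric offsets;
  # only the strings are recorded.
  def show_text_with_positioning(*params)
    params = params.first
    params.each { |str| show_text(str) if str.kind_of?(String) }
  end
end

# Simulated callback sequence (hypothetical; normally PDF::Reader drives this).
receiver = PageTextReceiver.new
receiver.begin_page
receiver.show_text("Hello ")
receiver.show_text_with_positioning(["Wor", -20, "ld"])
receiver.begin_page
receiver.show_text("Page two")

puts receiver.content.inspect  # => ["HelloWorld", "Page two"]
```

Because show_text strips each fragment before appending, whitespace between fragments is dropped ("Hello " followed by "World" becomes "HelloWorld"). A receiver that needs word boundaries would have to handle spacing itself.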

data/Rakefile CHANGED

@@ -6,7 +6,7 @@ require 'rake/testtask'
 require "rake/gempackagetask"
 require 'spec/rake/spectask'
-PKG_VERSION = "0.7.3"
+PKG_VERSION = "0.7.4"
 PKG_NAME = "pdf-reader"
 PKG_FILE_NAME = "#{PKG_NAME}-#{PKG_VERSION}"

data/TODO CHANGED

@@ -1,4 +1,6 @@
 v0.8
+- optimise PDF::Reader::Reference#from_buffer
+  - ruby-prof shows the match() call in this function is a real killer
 - add extra callbacks
   - list implemented features
     - encrypted? tagged? bookmarks? annotated? optimised?

data/lib/pdf/reader/parser.rb CHANGED

@@ -118,11 +118,21 @@ class PDF::Reader
       while count != 0
         @buffer.ready_token(false, false)
-        i = @buffer.raw.index(/[\\\(\)]/)
+        # find the first occurance of ( ) [ \ or ]
+        #
+        # we used to use the following line, but it fails sometimes
+        # under OSX.
+        #   i = @buffer.raw.index(/[\\\(\)]/)
+        i = @buffer.raw.unpack("C*").index { |n| [40, 41, 91, 92, 93].include?(n) }
         if i.nil?
           str << @buffer.raw + "\n"
           @buffer.raw.replace("")
+          # if a content stream opens a string, but never closes it, we'll
+          # hit the end of the stream and still be appending stuff to the
+          # string. bad! This check prevents a hard loop.
+          raise MalformedPDFError, 'unterminated string in content stream' if @buffer.eof?
           next
         end
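
The replacement scan in this hunk can be tried in isolation. The sketch below wraps it in a hypothetical helper, first_delimiter_index, which is not part of pdf-reader; the byte list is copied from the diff. Note that the byte list also covers [ and ] (91 and 93), which the old regex did not match, so the two scans can return different offsets for the same input.

```ruby
# Byte values copied from the diff: ( ) [ \ ]
DELIMITER_BYTES = [40, 41, 91, 92, 93].freeze

# Hypothetical helper (not part of pdf-reader): returns the byte offset of
# the first delimiter in raw, or nil when the string contains none.
def first_delimiter_index(raw)
  raw.unpack("C*").index { |byte| DELIMITER_BYTES.include?(byte) }
end

# The regex used before 0.7.4, which matches only \ ( and ).
OLD_PATTERN = /[\\\(\)]/

puts first_delimiter_index("abc(def").inspect     # => 3
puts first_delimiter_index("plain text").inspect  # => nil

# The two scans disagree when a square bracket comes first.
puts first_delimiter_index("a[b(c").inspect       # => 1
puts "a[b(c".index(OLD_PATTERN).inspect           # => 3
```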

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: pdf-reader
 version: !ruby/object:Gem::Version
-  version: 0.7.3
+  version: 0.7.4
 platform: ruby
 authors:
 - Peter Jones
@@ -9,7 +9,7 @@ autorequire:
 bindir: bin
 cert_chain: []
-date: 2008-06-11 00:00:00 +10:00
+date: 2008-08-07 00:00:00 +10:00
 default_executable:
 dependencies: []
@@ -84,7 +84,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements: []
 rubyforge_project: pdf-reader
-rubygems_version: 1.1.1
+rubygems_version: 1.2.0
 signing_key:
 specification_version: 2
 summary: A library for accessing the content of PDF files