RubyGems - pdf-reader - Versions diffs - 1.0.0.rc1 → 1.0.0

pdf-reader 1.0.0.rc1 → 1.0.0

Files changed (10)

data/CHANGELOG +4 -0
data/README.rdoc +12 -17
data/TODO +6 -17
data/lib/pdf/reader.rb +7 -7
data/lib/pdf/reader/filter.rb +1 -1
data/lib/pdf/reader/form_xobject.rb +1 -1
data/lib/pdf/reader/page.rb +1 -1
data/lib/pdf/reader/page_text_receiver.rb +2 -14
data/lib/pdf/reader/standard_security_handler.rb +5 -3
metadata +23 -24

data/CHANGELOG CHANGED

@@ -1,3 +1,7 @@
+v1.0.0 (16th January 2012)
+- support a new encryption variation
+- bugfix in PageTextRender (thanks Paul Gallagher)
 v1.0.0.rc1 (19th December 2011)
 - performance optimisations (all by Bernerd Schaefer)
 - some improvements to text extraction from form xobjects

data/README.rdoc CHANGED

@@ -1,18 +1,3 @@
-= !PLEASE NOTE!
-All the examples below are for the latest (pre-release) version of the gem (0.11)
-If you have installed the gem via the rubygems with the command:
-    $ gem install pdf-reader
-Then the examples below *will not work* for you. Please check the examples that
-come with previous version of the gem (0.10).
-If you want to install the latest version of this gem use the command:
-    $ gem install pdf-reader --prerelease
 = Release Notes
 The PDF::Reader library implements a PDF parser conforming as much as possible
@@ -59,7 +44,8 @@ an IO stream:
     puts reader.info
 If you open a PDF with File#open or IO#open, I strongly recommend using "rb"
-mode to ensure the file isn't mangled by ruby being 'helpful'.
+mode to ensure the file isn't mangled by ruby being 'helpful'. This is
+particularly important on windows and MRI >= 1.9.2.
     File.open("somefile.pdf", "rb") do |io|
       reader = PDF::Reader.new(io)
@@ -111,6 +97,15 @@ to UTF-8 before it is passed back from PDF::Reader.
 Strings that contain binary data (like font blobs) will be marked as such on
 M17N aware VMs.
+= Former API
+Version 1.0.0 of PDF::Reader introduced a new page-based API that provides
+efficient and easy access to any page.
+The previous API is marked as deprecated but will continue to work for the
+time being. Eventually calls to the old API will begin triggering deprecation
+warnings before it is completely removed in version 2.0.0.
 = Exceptions
 There are two key exceptions that you will need to watch out for when processing a
@@ -119,7 +114,7 @@ PDF file:
 MalformedPDFError - The PDF appears to be corrupt in some way. If you believe the
 file should be valid, or that a corrupt file didn't raise an exception, please
 forward a copy of the file to the maintainers (preferably via the google group)
-and we can attempt to improve the code.
+and we will attempt to improve the code.
 UnsupportedFeatureError - The PDF uses a feature that PDF::Reader doesn't currently
 support. Again, we welcome submissions of PDF files that exhibit these features to help

data/TODO CHANGED

@@ -1,27 +1,19 @@
-v0.8
-- add extra callbacks
-  - list implemented features
-    - encrypted? tagged? bookmarks? annotated? optimised?
-- Allow more than just page content and metadata to be parsed (see spec section 3.6.1)
+This stuff would be great
+- improved access to document level objects and data
   - bookmarks?
   - outline?
   - articles?
   - viewer prefs?
-- Don't remove comment when tokenising in the middle of a string
+- Improve the speed of Encoding#to_utf8
 - Tweak encoding mappings to differentiate between bytes that are invalid for an encoding, and bytes that are unchanged.
   poppler seems to do this in a quite reasonable way. Original Encoding -> Glyph Names -> Unicode. As of 0.6 we go straight
   from the Original encoding to Unicode.
 - detect when a font's encoding is a CMap (generally used for pre-Unicode, multibyte asian encodings), and display a user friendly error
 - Improve interpretation of non content stream data (ie metadata). recognise dates, etc
-- Fix inheritance of page attributes. Resources has been done, but plenty of other attributes
-  are inheritable. See table 3.2.7 in the spec
-v0.9
-- Add a way to extract raster images
-  - see XObjects section of spec (section 4.7)
-- Add a way to extract font data?
-Sometime
+This might be useful, more research required
 - Support for CJK text (convert to UTF-8 like all other encodings. See Section 5.9 of the PDF spec)
   - Will require significantly improved handling of CMaps, including creating a bunch of predefined ones
@@ -30,10 +22,7 @@ Sometime
 - Ship some extra receivers in the standard package, particuarly ones that are useful for running
   rspec over generated PDF files
-- When we encounter Identity-H encoded text with no ToUnicode CMap, render the glyphs and treat them as images, as there's no
-  sensible way to convert them to unicode
-- Add support for additional filters: ASCIIHexDecode, ASCII85Decode, LZWDecode, RunLengthDecode, CCITTFaxDecode, JBIG2Decode, DCTDecode, JPXDecode, Crypt?
+- Add support for additional filters: CCITTFaxDecode, JBIG2Decode, DCTDecode, JPXDecode
 - Add support for additional encodings:
   - Identity-V(I *think* this relates to vertical text. Not sure how we'd support it sensibly)

data/lib/pdf/reader.rb CHANGED

@@ -159,7 +159,7 @@ module PDF
       yield PDF::Reader.new(input, opts)
     end
-    # DEPRECATED: this method was deprecated in version 0.11.0 and will
+    # DEPRECATED: this method was deprecated in version 1.0.0 and will
     #             eventually be removed
     #
     #
@@ -171,7 +171,7 @@ module PDF
       end
     end
-    # DEPRECATED: this method was deprecated in version 0.11.0 and will
+    # DEPRECATED: this method was deprecated in version 1.0.0 and will
     #             eventually be removed
     #
     # Parse the given string, sending events to the given receiver.
@@ -182,7 +182,7 @@ module PDF
       end
     end
-    # DEPRECATED: this method was deprecated in version 0.11.0 and will
+    # DEPRECATED: this method was deprecated in version 1.0.0 and will
     #             eventually be removed
     #
     # Parse the file with the given name, returning an unmarshalled ruby version of
@@ -194,7 +194,7 @@ module PDF
       }
     end
-    # DEPRECATED: this method was deprecated in version 0.11.0 and will
+    # DEPRECATED: this method was deprecated in version 1.0.0 and will
     #             eventually be removed
     #
     # Parse the given string, returning an unmarshalled ruby version of represents
@@ -245,7 +245,7 @@ module PDF
     end
-    # DEPRECATED: this method was deprecated in version 0.11.0 and will
+    # DEPRECATED: this method was deprecated in version 1.0.0 and will
     #             eventually be removed
     #
     # Given an IO object that contains PDF data, parse it.
@@ -263,7 +263,7 @@ module PDF
       self
     end
-    # DEPRECATED: this method was deprecated in version 0.11.0 and will
+    # DEPRECATED: this method was deprecated in version 1.0.0 and will
     #             eventually be removed
     #
     # Given an IO object that contains PDF data, return the contents of a single object
@@ -276,7 +276,7 @@ module PDF
     private
-    # recursively convert strings from outside a content stream intop UTF-8
+    # recursively convert strings from outside a content stream into UTF-8
     #
     def doc_strings_to_utf8(obj)
       case obj

data/lib/pdf/reader/filter.rb CHANGED

@@ -272,7 +272,7 @@ class PDF::Reader
         row += 1
       end
-      pixels.map { |row| row.flatten.pack("C*") }.join("")
+      pixels.map { |bytes| bytes.flatten.pack("C*") }.join("")
     end
   end
 end

data/lib/pdf/reader/form_xobject.rb CHANGED

@@ -76,7 +76,7 @@ module PDF
             params << token
           end
         end
-      rescue EOFError => e
+      rescue EOFError
         raise MalformedPDFError, "End Of File while processing a content stream"
       end
     end

data/lib/pdf/reader/page.rb CHANGED

@@ -133,7 +133,7 @@ module PDF
             params << token
           end
         end
-      rescue EOFError => e
+      rescue EOFError
         raise MalformedPDFError, "End Of File while processing a content stream"
       end

data/lib/pdf/reader/page_text_receiver.rb CHANGED

@@ -1,12 +1,6 @@
 # coding: utf-8
 require 'matrix'
-require 'yaml'
-begin
-  require 'psych'
-rescue LoadError
-end
 module PDF
   class Reader
@@ -32,7 +26,7 @@ module PDF
         @font_stack    = [build_fonts(page.fonts)]
         @xobject_stack = [page.xobjects]
         @content = {}
-        @stack   = [DEFAULT_GRAPHICS_STATE]
+        @stack   = [DEFAULT_GRAPHICS_STATE.dup]
       end
       def content
@@ -235,8 +229,6 @@ module PDF
       # underlying device space.
       #
       def transform(point, z = 1)
-        trm = text_rendering_matrix
         point.transform(text_rendering_matrix, z)
       end
@@ -286,7 +278,7 @@ module PDF
       end
       # private class for representing points on a cartesian plain. Used
-      # to simplify maths in the MinPpi class.
+      # to simplify maths.
       #
       class Point < Struct.new(:x, :y)
         def transform(trm, z)
@@ -295,10 +287,6 @@ module PDF
             (trm[0,1] * x) + (trm[1,1] * y) + (trm[2,1] * z)
           )
         end
-        def distance(point)
-          Math.hypot(point.x - @x, point.y - @y)
-        end
       end
     end
   end
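
Note on the DEFAULT_GRAPHICS_STATE.dup change above: the diff does not show how the constant is defined, so the sketch below assumes it is a mutable Hash and uses a made-up stand-in with a single key. It illustrates why pushing the constant itself onto a stack lets later writes leak back into the shared default, while pushing a copy does not.

```ruby
# Hypothetical helper, not part of pdf-reader. Builds a fresh stand-in
# default, seeds a stack with either the default itself or a copy, writes
# through the stack, and reports what the default holds afterwards.
def char_spacing_after_write(copy_first)
  default = { :char_spacing => 0 }            # stand-in for the real constant
  stack   = [copy_first ? default.dup : default]
  stack.last[:char_spacing] = 5               # mutate the top of the stack
  default[:char_spacing]
end

puts char_spacing_after_write(false)  # rc1 style: default was modified
puts char_spacing_after_write(true)   # 1.0.0 style: default is untouched
```

Keep in mind that Object#dup is a shallow copy: any nested mutable values inside the hash are still shared between the copy and the original.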

data/lib/pdf/reader/standard_security_handler.rb CHANGED

@@ -79,7 +79,8 @@ class PDF::Reader
       objKey = @encrypt_key.dup
       (0..2).each { |e| objKey << (ref.id >> e*8 & 0xFF ) }
       (0..1).each { |e| objKey << (ref.gen >> e*8 & 0xFF ) }
-      rc4 = RC4.new( Digest::MD5.digest(objKey) )
+      length = objKey.length < 16 ? objKey.length : 16
+      rc4 = RC4.new( Digest::MD5.digest(objKey)[(0...length)] )
       rc4.decrypt(buf)
     end
@@ -144,10 +145,11 @@ class PDF::Reader
         out = Digest::MD5.digest(PassPadBytes.pack("C*") + @file_id)
         #zero doesn't matter -> so from 0-19
         20.times{ |i| out=RC4.new(xor_each_byte(keyBegins, i)).decrypt(out) }
+        pass = @user_key[(0...16)] == out
       else
-        out = RC4.new(keyBegins).encrypt(PassPadBytes.pack("C*"))
+        pass = RC4.new(keyBegins).encrypt(PassPadBytes.pack("C*")) == @user_key
       end
-      @user_key[(0...16)] == out ? keyBegins : nil
+      pass ? keyBegins : nil
     end
     def make_file_key( user_pass )
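
Note on the key-length change in the first hunk above: the new code keeps at most 16 bytes of the MD5 digest, and fewer when the file key plus the 5 object-identifier bytes is shorter than 16 bytes. This appears to match the per-object key rule in the PDF specification's RC4 encryption algorithm. The sketch below reproduces only the key derivation with the standard library; the helper name object_rc4_key and the sample key bytes are hypothetical, and no RC4 decryption is performed.

```ruby
require 'digest/md5'

# Hypothetical helper, not part of pdf-reader: derive the per-object key
# from a file encryption key plus the object id and generation number.
def object_rc4_key(file_key, obj_id, obj_gen)
  buf = file_key.dup.force_encoding(Encoding::BINARY)
  # low-order 3 bytes of the object id, least significant byte first
  buf << [obj_id & 0xFF, (obj_id >> 8) & 0xFF, (obj_id >> 16) & 0xFF].pack("C*")
  # low-order 2 bytes of the generation number, least significant byte first
  buf << [obj_gen & 0xFF, (obj_gen >> 8) & 0xFF].pack("C*")
  # keep (key length + 5) bytes of the digest, capped at 16
  length = [buf.bytesize, 16].min
  Digest::MD5.digest(buf)[0, length]
end

short_key = "\x01\x02\x03\x04\x05".b   # made-up 40-bit file key
long_key  = ("\x01" * 16).b            # made-up 128-bit file key

puts object_rc4_key(short_key, 12, 0).bytesize
puts object_rc4_key(long_key, 12, 0).bytesize
```

With a 5-byte file key the derived key is 10 bytes long, which the rc1 code would have passed to RC4 as a full 16-byte digest instead.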

metadata CHANGED

@@ -1,19 +1,19 @@
 --- !ruby/object:Gem::Specification
 name: pdf-reader
 version: !ruby/object:Gem::Version
-  version: 1.0.0.rc1
-  prerelease: 6
+  version: 1.0.0
+  prerelease:
 platform: ruby
 authors:
 - James Healy
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2011-12-19 00:00:00.000000000 Z
+date: 2012-01-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
-  requirement: &19650680 !ruby/object:Gem::Requirement
+  requirement: &24844240 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,10 +21,10 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *19650680
+  version_requirements: *24844240
 - !ruby/object:Gem::Dependency
   name: roodi
-  requirement: &19650220 !ruby/object:Gem::Requirement
+  requirement: &24843780 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -32,10 +32,10 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *19650220
+  version_requirements: *24843780
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &19649720 !ruby/object:Gem::Requirement
+  requirement: &24843280 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -43,10 +43,10 @@ dependencies:
         version: '2.3'
   type: :development
   prerelease: false
-  version_requirements: *19649720
+  version_requirements: *24843280
 - !ruby/object:Gem::Dependency
   name: ZenTest
-  requirement: &19649220 !ruby/object:Gem::Requirement
+  requirement: &24842780 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -54,10 +54,10 @@ dependencies:
         version: 4.4.2
   type: :development
   prerelease: false
-  version_requirements: *19649220
+  version_requirements: *24842780
 - !ruby/object:Gem::Dependency
   name: Ascii85
-  requirement: &19648740 !ruby/object:Gem::Requirement
+  requirement: &24842320 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -65,10 +65,10 @@ dependencies:
         version: 1.0.0
   type: :runtime
   prerelease: false
-  version_requirements: *19648740
+  version_requirements: *24842320
 - !ruby/object:Gem::Dependency
   name: ruby-rc4
-  requirement: &19648280 !ruby/object:Gem::Requirement
+  requirement: &24841940 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -76,7 +76,7 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *19648280
+  version_requirements: *24841940
 description: The PDF::Reader library implements a PDF parser conforming as much as
   possible to the PDF specification from Adobe
 email:
@@ -152,13 +152,12 @@ files:
 - bin/pdf_callbacks
 homepage: http://github.com/yob/pdf-reader
 licenses: []
-post_install_message: ! "\n  ********************************************\n\n  This
-  is a beta release of PDF::Reader to gather feedback on the proposed\n  API changes.\n\n
-  \ The old API is marked as deprecated but will continue to work with no\n  visible
-  warnings for now.\n\n  The new API is documented in the README and in rdoc for the
-  PDF::Reader,\n  PDF::Reader::Page and PDF::Reader::ObjectHash classes.\n\n  Do not
-  use this in production, stick to stable releases for that. If you do\n  take the
-  new API for a spin, please send any feedback my way.\n\n  ********************************************\n\n"
+post_install_message: ! "\n  ********************************************\n\n  v1.0.0
+  of PDF::Reader introduced a new page-based API. There are extensive\n  examples
+  showing how to use it in the README and examples directory.\n\n  For detailed documentation,
+  check the rdocs for the PDF::Reader,\n  PDF::Reader::Page and PDF::Reader::ObjectHash
+  classes.\n\n  The old API is marked as deprecated but will continue to work with
+  no\n  visible warnings for now.\n\n  ********************************************\n\n"
 rdoc_options:
 - --title
 - PDF::Reader Documentation
@@ -176,9 +175,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
-  - - ! '>'
+  - - ! '>='
     - !ruby/object:Gem::Version
-      version: 1.3.1
+      version: '0'
 requirements: []
 rubyforge_project:
 rubygems_version: 1.8.11