pdf-reader 0.7.5 → 0.7.6

data/CHANGELOG CHANGED
@@ -1,3 +1,14 @@
+ v0.7.6 (28th August 2009)
+ - Various bug fixes that increase the files we can successfully parse
+ - Treat float and integer tokens differently (thanks Neil)
+ - Correctly handle PDFs where the Kids element of a Pages dict is an indirect
+ reference (thanks Rob Holland)
+ - Fix conversion of PDF strings to Ruby strings on 1.8.6 (thanks Andrès Koetsier)
+ - Fix decoding with ASCII85 and ASCIIHex filters (thanks Andrès Koetsier)
+ - Fix extracting inline images from content streams (thanks Andrès Koetsier)
+ - Fix extracting [ ] from content streams (thanks Christian Rishøj)
+ - Fix conversion of text to UTF8 when the cmap uses bfrange (thanks Federico Gonzalez Lutteroth)
+
  v0.7.5 (27th August 2008)
  - Fix a 1.8.7ism
 
@@ -0,0 +1,21 @@
1
+ Copyright (c) 2009 Peter Jones
2
+ Copyright (c) 2009 James Healy
3
+
4
+ Permission is hereby granted, free of charge, to any person obtaining
5
+ a copy of this software and associated documentation files (the
6
+ "Software"), to deal in the Software without restriction, including
7
+ without limitation the rights to use, copy, modify, merge, publish,
8
+ distribute, sublicense, and/or sell copies of the Software, and to
9
+ permit persons to whom the Software is furnished to do so, subject to
10
+ the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be
13
+ included in all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
16
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
17
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
18
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
19
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
20
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
21
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -5,9 +5,22 @@ It provides programmatic access to the contents of a PDF file with a high
5
5
  degree of flexibility.
6
6
 
7
7
  The PDF 1.7 specification is a weighty document and not all aspects are
8
- currently supported. We welcome submission of PDF files that exhibit
8
+ currently supported. I welcome submission of PDF files that exhibit
9
9
  unsupported aspects of the spec to assist with improving out support.
10
10
 
11
+ = Development Status
12
+
13
+ I adopted this library in 2007 when I was learning the fundamentals of the PDF
14
+ spec. I do not currently use it in my day to day work and I just don't have the
15
+ spare time to dedicate to adding new features.
16
+
17
+ The code as it is works fairly well, and I offer it "as is". All patches, bug
18
+ reports and sample PDFs are welcome - I will work on them when I can. If anyone
19
+ is interested in adding features to PDF::Reader in their own effort to learn
20
+ the PDF file format, I'll happy offer help qand support.
21
+
22
+ I STRONGLY RECOMMEND NOT USING PDF::READER FOR YOUR PRODUCTION CODE.
23
+
11
24
  = Installation
12
25
 
13
26
  The recommended installation method is via Rubygems.
@@ -42,7 +55,8 @@ PDF file:
42
55
 
43
56
  MalformedPDFError - The PDF appears to be corrupt in some way. If you believe the
44
57
  file should be valid, or that a corrupt file didn't raise an exception, please
45
- forward a copy of the file to the maintainers and we can attempt improve the code.
58
+ forward a copy of the file to the maintainers (preferably via the google group)
59
+ and we can attempt to improve the code.
46
60
 
47
61
  UnsupportedFeatureError - The PDF uses a feature that PDF::Reader doesn't currently
48
62
  support. Again, we welcome submissions of PDF files that exhibit these features to help
@@ -56,12 +70,18 @@ report it!) or your receiver (please don't report it!).
56
70
 
57
71
  = Maintainers
58
72
 
59
- - Peter Jones <mailto:pjones@pmade.com>
60
73
  - James Healy <mailto:jimmy@deefa.com>
61
74
 
75
+ = Licensing
76
+
77
+ This library is distributed under the terms of the MIT License. See the included file for
78
+ more detail.
79
+
62
80
  = Mailing List
63
81
 
64
- Any questions or feedback should be sent to the PDF::Reader google group.
82
+ Any questions or feedback should be sent to the PDF::Reader google group. It's
83
+ better that any answers be available for others instead of hiding in someone's
84
+ inbox.
65
85
 
66
86
  http://groups.google.com/group/pdf-reader
67
87
 
@@ -77,21 +97,21 @@ A simple app to count the number of pages in a PDF File.
77
97
  require 'pdf/reader'
78
98
 
79
99
  class PageReceiver
80
- attr_accessor :page_count
100
+ attr_accessor :counter
81
101
 
82
102
  def initialize
83
- @page_count = 0
103
+ @counter = 0
84
104
  end
85
105
 
86
106
  # Called when page parsing ends
87
107
  def end_page
88
- @page_count += 1
108
+ @counter += 1
89
109
  end
90
110
  end
91
111
 
92
112
  receiver = PageReceiver.new
93
113
  pdf = PDF::Reader.file("somefile.pdf", receiver)
94
- puts "#{receiver.page_count} pages"
114
+ puts "#{receiver.counter} pages"
95
115
 
96
116
  == List all callbacks generated by a single PDF
97
117
 
@@ -242,7 +262,7 @@ A simple app to display the number of pages in a PDF File.
242
262
 
243
263
  = Known Limitations
244
264
 
245
- The order of the callbacks is unpredicable, and is dependent on the internal
265
+ The order of the callbacks is unpredictable, and is dependent on the internal
246
266
  layout of the file, not the order objects are displayed to the user. As a
247
267
  consequence of this it is highly unlikely that text will be completely in
248
268
  order.
data/Rakefile CHANGED
@@ -6,7 +6,7 @@ require 'rake/testtask'
6
6
  require "rake/gempackagetask"
7
7
  require 'spec/rake/spectask'
8
8
 
9
- PKG_VERSION = "0.7.5"
9
+ PKG_VERSION = "0.7.6"
10
10
  PKG_NAME = "pdf-reader"
11
11
  PKG_FILE_NAME = "#{PKG_NAME}-#{PKG_VERSION}"
12
12
 
@@ -47,8 +47,7 @@ Rake::RDocTask.new("doc") do |rdoc|
47
47
  rdoc.rdoc_files.include('README.rdoc')
48
48
  rdoc.rdoc_files.include('TODO')
49
49
  rdoc.rdoc_files.include('CHANGELOG')
50
- #rdoc.rdoc_files.include('COPYING')
51
- #rdoc.rdoc_files.include('LICENSE')
50
+ rdoc.rdoc_files.include('MIT-LICENSE')
52
51
  rdoc.rdoc_files.include('lib/**/*.rb')
53
52
  rdoc.options << "--inline-source"
54
53
  end
@@ -70,7 +69,7 @@ spec = Gem::Specification.new do |spec|
70
69
  spec.executables << "pdf_text"
71
70
  spec.executables << "pdf_list_callbacks"
72
71
  spec.has_rdoc = true
73
- spec.extra_rdoc_files = %w{README.rdoc TODO CHANGELOG}
72
+ spec.extra_rdoc_files = %w{README.rdoc TODO CHANGELOG MIT-LICENSE }
74
73
  spec.rdoc_options << '--title' << 'PDF::Reader Documentation' <<
75
74
  '--main' << 'README.rdoc' << '-q'
76
75
  spec.author = "Peter Jones"
@@ -78,6 +77,7 @@ spec = Gem::Specification.new do |spec|
78
77
  spec.rubyforge_project = "pdf-reader"
79
78
  spec.homepage = "http://software.pmade.com/pdfreader"
80
79
  spec.description = "The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe"
80
+ spec.add_dependency('Ascii85', '>=0.9')
81
81
  end
82
82
 
83
83
  # package the library into a gem
@@ -1,5 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
 
3
+ require 'rubygems'
4
+
3
5
  $LOAD_PATH.unshift(File.dirname(__FILE__) + "/../lib")
4
6
 
5
7
  require 'pdf/reader'
@@ -1,5 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
 
3
+ require 'rubygems'
4
+
3
5
  $LOAD_PATH.unshift(File.dirname(__FILE__) + "/../lib")
4
6
 
5
7
  USAGE = "USAGE: " + File.basename(__FILE__) + " <file> <object id> [generation]"
@@ -1,5 +1,6 @@
1
1
  #!/usr/bin/env ruby
2
2
 
3
+ require 'rubygems'
3
4
  $LOAD_PATH.unshift(File.dirname(__FILE__) + "/../lib")
4
5
 
5
6
  require 'pdf/reader'
@@ -24,6 +24,7 @@
24
24
  ################################################################################
25
25
 
26
26
  require 'stringio'
27
+ require 'ascii85'
27
28
 
28
29
  module PDF
29
30
  ################################################################################
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -41,7 +41,7 @@ class PDF::Reader
41
41
  self
42
42
  end
43
43
  ################################################################################
44
- # reads the requested number of bytes from the underlying IO stream.
44
+ # reads the requested number of bytes from the underlying IO stream.
45
45
  #
46
46
  # length should be a positive integer.
47
47
  def read (length)
@@ -56,13 +56,22 @@ class PDF::Reader
  out
  end
  ################################################################################
- # Reads from the buffer until the specified token is found, or the end of the buffer
+ # Reads from the buffer until the specified token is found, or the end of the buffer
  #
  # bytes - the bytes to search for.
  def read_until(bytes)
  out = ""
  size = bytes.size
-
+
+ if @buffer && !@buffer.empty?
+ if @buffer.include?(bytes)
+ offset = @buffer.index(bytes) + size
+ return head(offset)
+ else
+ out << head(@buffer.size)
+ end
+ end
+
  loop do
  out << @io.read(1)
  if out[-1 * size,size].eql?(bytes)
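
Reviewer note: the hunk above makes read_until consult the already-buffered bytes before falling back to byte-at-a-time reads from the IO. The following is a minimal standalone sketch of that idea, not the library's code. MiniBuffer and its head method are hypothetical stand-ins, and the sketch assumes head removes and returns the first n buffered bytes, which this diff does not show.

```ruby
require 'stringio'

# Hypothetical stand-in for the buffer class; names are illustrative only.
class MiniBuffer
  def initialize(io, buffered = "")
    @io = io
    @buffer = buffered.dup
  end

  # Assumption: remove and return the first n bytes of the internal buffer.
  def head(n)
    @buffer.slice!(0, n)
  end

  # Read up to and including +bytes+, consuming buffered data first.
  def read_until(bytes)
    out = String.new
    size = bytes.size

    unless @buffer.empty?
      idx = @buffer.index(bytes)
      return head(idx + size) if idx   # token is fully inside the buffer
      out << head(@buffer.size)        # otherwise drain the buffer into out
    end

    # Fall back to the IO, checking the tail of out after every byte so a
    # token that straddles the buffer/IO boundary is still found.
    until @io.eof?
      out << @io.read(1)
      break if out[-size, size] == bytes
    end
    out
  end
end
```

Usage: MiniBuffer.new(StringIO.new(" world EI rest"), "hello").read_until("EI") drains the buffered "hello" first and then continues reading from the IO until the "EI" marker has been consumed.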
@@ -74,7 +83,7 @@ class PDF::Reader
74
83
  out
75
84
  end
76
85
  ################################################################################
77
- # returns true if the underlying IO object is at end and the internal buffer
86
+ # returns true if the underlying IO object is at end and the internal buffer
78
87
  # is empty
79
88
  def eof?
80
89
  ready_token
@@ -89,6 +98,10 @@ class PDF::Reader
  @io.pos
  end
  ################################################################################
+ def pos_without_buf
+ @io.pos - @buffer.to_s.size
+ end
+ ################################################################################
  # PDF files are processed by tokenising the content into a series of objects and commands.
  # This prepares the buffer for use by reading the next line of tokens into memory.
  def ready_token (with_strip=true, skip_blanks=true)
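
Reviewer note: pos_without_buf subtracts the size of the read-ahead buffer from the IO position, presumably so that a saved position points at the first unconsumed byte rather than at wherever the IO has read ahead to. A small sketch of the arithmetic, using a hypothetical helper name (logical_position) and a StringIO in place of a real file:

```ruby
require 'stringio'

# Hypothetical helper: position of the first byte not yet consumed,
# given an IO that has read ahead into +buffer+.
def logical_position(io, buffer)
  io.pos - buffer.to_s.size
end

io = StringIO.new("1 0 obj\n<< /Type /Page >>\nendobj\n")
buffer = io.read(8)      # read-ahead of 8 bytes, so io.pos is now 8
buffer.slice!(0, 1)      # one byte consumed, 7 bytes remain buffered
position = logical_position(io, buffer)
```

With 8 bytes read and 7 still buffered, position is 1, while io.pos alone would report 8.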
@@ -105,10 +118,10 @@ class PDF::Reader
105
118
  # return the next token from the underlying IO stream
106
119
  def token
107
120
  ready_token
108
-
121
+
109
122
  i = @buffer.index(/[\[\]()<>{}\s\/]/) || @buffer.size
110
123
 
111
- token_chars =
124
+ token_chars =
112
125
  if i == 0 and @buffer[i,2] == "<<" then 2
113
126
  elsif i == 0 and @buffer[i,2] == ">>" then 2
114
127
  elsif i == 0 then 1
@@ -148,7 +161,7 @@ class PDF::Reader
148
161
  data = @io.read(1024)
149
162
 
150
163
  # the PDF 1.7 spec (section #3.4) says that EOL markers can be either \r, \n, or both.
151
- # To ensure we find the xref offset correctly, change all possible options to a
164
+ # To ensure we find the xref offset correctly, change all possible options to a
152
165
  # standard format
153
166
  data = data.gsub("\r\n","\n").gsub("\n\r","\n").gsub("\r","\n")
154
167
  lines = data.split(/\n/).reverse
@@ -69,14 +69,12 @@ class PDF::Reader
  start_code = "0x#{start_code}".hex
  end_code = "0x#{end_code}".hex
  dst = "0x#{dst}".hex
- incr = 0
 
  # add all values in the range to our mapping
- (start_code..end_code).each do |val|
- @map[val] = dst + incr
- incr += 1
+ (start_code..end_code).each_with_index do |val, idx|
+ @map[val] = dst + idx
  # ensure a single range does not exceed 255 chars
- raise PDF::Reader::MalformedPDFError, "a CMap bfrange cann't exceed 255 chars" if incr > 255
+ raise PDF::Reader::MalformedPDFError, "a CMap bfrange cann't exceed 255 chars" if idx > 255
  end
  end
  end
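
Reviewer note: the bfrange hunk replaces a manually incremented counter with each_with_index, so the offset added to dst is always the zero-based position within the range. A standalone sketch of the expansion follows. expand_bfrange is a hypothetical name, the sketch raises ArgumentError rather than the library's MalformedPDFError, and it checks the range width once up front instead of inside the loop.

```ruby
# Expand one bfrange entry into a { source code => destination codepoint } hash.
# All three arguments are hex strings without a "0x" prefix.
def expand_bfrange(start_code, end_code, dst)
  first = start_code.hex
  last  = end_code.hex
  base  = dst.hex

  # Same limit as the in-loop check in the diff (idx > 255), tested once.
  raise ArgumentError, "a bfrange can't exceed 255 chars" if last - first > 255

  map = {}
  (first..last).each_with_index do |code, idx|
    map[code] = base + idx   # idx is 0 for the first code in the range
  end
  map
end
```

For example, expand_bfrange("0000", "0002", "0041") maps codes 0, 1 and 2 to 0x41, 0x42 and 0x43.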
@@ -293,7 +293,7 @@ class PDF::Reader
  if page[:Type] == :Pages
  callback(:begin_page_container, [page])
  walk_resources(@xref.object(res)) if res
- page[:Kids].each {|child| walk_pages(@xref.object(child))}
+ @xref.object(page[:Kids]).each {|child| walk_pages(@xref.object(child))}
  callback(:end_page_container)
  elsif page[:Type] == :Page
  callback(:begin_page, [page])
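
Reviewer note: the one-line change above resolves page[:Kids] through the xref before iterating, which matters when Kids is an indirect reference rather than an inline array. The sketch below illustrates that pattern with hypothetical minimal types (Reference, MiniXref, count_pages); none of them are the library's classes. The guard in MiniXref#object mirrors the `return ref unless ref.kind_of?(Reference)` line visible in the xref hunk of this diff.

```ruby
# Hypothetical minimal types, for illustration only.
Reference = Struct.new(:id, :gen)

class MiniXref
  def initialize(objects)
    @objects = objects   # { [id, gen] => object }
  end

  # Direct objects pass through untouched; references are looked up.
  def object(ref)
    return ref unless ref.kind_of?(Reference)
    @objects.fetch([ref.id, ref.gen])
  end
end

# Count leaf pages, resolving Kids whether it is inline or indirect.
def count_pages(xref, node)
  node = xref.object(node)
  case node[:Type]
  when :Pages then xref.object(node[:Kids]).sum { |kid| count_pages(xref, kid) }
  when :Page  then 1
  else 0
  end
end
```

Because object returns non-references unchanged, the same call handles both the inline and the indirect form of Kids.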
@@ -41,33 +41,33 @@ class PDF::Reader
41
41
  end
42
42
 
43
43
  case enc
44
- when nil then
44
+ when nil then
45
45
  load_mapping File.dirname(__FILE__) + "/encodings/standard.txt"
46
46
  @unpack = "C*"
47
- when "Identity-H".to_sym then
47
+ when "Identity-H".to_sym then
48
48
  @unpack = "n*"
49
49
  @to_unicode_required = true
50
- when :MacRomanEncoding then
50
+ when :MacRomanEncoding then
51
51
  load_mapping File.dirname(__FILE__) + "/encodings/mac_roman.txt"
52
52
  @unpack = "C*"
53
- when :MacExpertEncoding then
53
+ when :MacExpertEncoding then
54
54
  load_mapping File.dirname(__FILE__) + "/encodings/mac_expert.txt"
55
55
  @unpack = "C*"
56
- when :PDFDocEncoding then
56
+ when :PDFDocEncoding then
57
57
  load_mapping File.dirname(__FILE__) + "/encodings/pdf_doc.txt"
58
58
  @unpack = "C*"
59
- when :StandardEncoding then
59
+ when :StandardEncoding then
60
60
  load_mapping File.dirname(__FILE__) + "/encodings/standard.txt"
61
61
  @unpack = "C*"
62
- when :SymbolEncoding then
62
+ when :SymbolEncoding then
63
63
  load_mapping File.dirname(__FILE__) + "/encodings/symbol.txt"
64
64
  @unpack = "C*"
65
- when :UTF16Encoding then
65
+ when :UTF16Encoding then
66
66
  @unpack = "n*"
67
- when :WinAnsiEncoding then
67
+ when :WinAnsiEncoding then
68
68
  load_mapping File.dirname(__FILE__) + "/encodings/win_ansi.txt"
69
69
  @unpack = "C*"
70
- when :ZapfDingbatsEncoding then
70
+ when :ZapfDingbatsEncoding then
71
71
  load_mapping File.dirname(__FILE__) + "/encodings/zapf_dingbats.txt"
72
72
  @unpack = "C*"
73
73
  else raise UnsupportedFeatureError, "#{enc} is not currently a supported encoding"
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -46,7 +46,7 @@ class PDF::Reader
46
46
  def output_parent (obj)
47
47
  case obj
48
48
  when Hash
49
- obj.each do |k,v|
49
+ obj.each do |k,v|
50
50
  print "#{k}"; output_child(v); print "\n"
51
51
  Explore::const_set(k, k) if !Explore.const_defined?(k)
52
52
  end
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -34,17 +34,30 @@ class PDF::Reader
  # in the future.
  class Filter
  ################################################################################
- # creates a new filter for decoding content
- def initialize (name, options)
+ # creates a new filter for decoding content.
+ #
+ # Filters that are only used to encode image data are accepted, but the data is
+ # returned untouched. At this stage PDF::Reader has no need to decode images.
+ #
+ def initialize (name, options = nil)
  @options = options
 
  case name.to_sym
+ when :ASCII85Decode then @filter = :ascii85
+ when :ASCIIHexDecode then @filter = :asciihex
+ when :CCITTFaxDecode then @filter = nil
+ when :DCTDecode then @filter = nil
  when :FlateDecode then @filter = :flate
- #else raise UnsupportedFeatureError, "Unknown filter: #{name}"
+ when :JBIG2Decode then @filter = nil
+ else raise UnsupportedFeatureError, "Unknown filter: #{name}"
  end
  end
  ################################################################################
  # attempts to decode the specified data with the current filter
+ #
+ # Filters that are only used to encode image data are accepted, but the data is
+ # returned untouched. At this stage PDF::Reader has no need to decode images.
+ #
  def filter (data)
  # leave the data untouched if we don't support the required filter
  return data if @filter.nil?
@@ -53,6 +66,30 @@ class PDF::Reader
  self.send(@filter, data)
  end
  ################################################################################
+ # Decode the specified data using the Ascii85 algorithm. Relies on the AScii85
+ # rubygem.
+ #
+ def ascii85(data)
+ data = "<~#{data}" unless data.to_s[0,2] == "<~"
+ Ascii85::decode(data)
+ rescue Exception => e
+ # Oops, there was a problem decoding the stream
+ raise MalformedPDFError, "Error occured while decoding an ASCII85 stream (#{e.class.to_s}: #{e.to_s})"
+ end
+ ################################################################################
+ # Decode the specified data using the AsciiHex algorithm.
+ #
+ def asciihex(data)
+ data.chop! if data[-1,1] == ">"
+ data = data[1,data.size] if data[0,1] == "<"
+ data.gsub!(/[^A-Fa-f0-9]/,"")
+ data << "0" if data.size % 2 == 1
+ data.scan(/.{2}/).map { |s| s.hex.chr }.join("")
+ rescue Exception => e
+ # Oops, there was a problem decoding the stream
+ raise MalformedPDFError, "Error occured while decoding an ASCIIHex stream (#{e.class.to_s}: #{e.to_s})"
+ end
+ ################################################################################
  # Decode the specified data with the Zlib compression algorithm
  def flate (data)
  begin
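
Reviewer note: the new asciihex method strips the angle-bracket delimiters, discards anything that is not a hex digit, pads an odd digit count with a trailing 0, and converts each digit pair to a byte. Below is a standalone sketch of the same steps. decode_ascii_hex is a hypothetical name; unlike the method in the diff it does not mutate its argument, and it uses Array#pack("H*") in place of scan/map/hex.chr.

```ruby
# Decode the body of an ASCIIHexDecode stream into a binary string.
def decode_ascii_hex(data)
  hex = data.dup
  hex = hex[0...-1] if hex.end_with?(">")    # drop the end-of-data marker
  hex = hex[1..-1]  if hex.start_with?("<")  # drop a leading delimiter
  hex = hex.gsub(/[^A-Fa-f0-9]/, "")         # ignore whitespace and other noise
  hex << "0" if hex.size.odd?                # odd digit count: pad with 0
  [hex].pack("H*")                           # two hex digits per output byte
end
```

For instance, decode_ascii_hex("48 65 6c6C 6F>") yields the bytes of "Hello"; the input string is left unmodified, which avoids surprising a caller that reuses the stream data.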
@@ -63,7 +100,7 @@ class PDF::Reader
63
100
  # If that fails, then use an undocumented 'feature' to attempt to inflate
64
101
  # the data as a raw RFC1951 stream.
65
102
  #
66
- # See
103
+ # See
67
104
  # - http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/243545
68
105
  # - http://www.gzip.org/zlib/zlib_faq.html#faq38
69
106
  Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(data)
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -28,12 +28,12 @@ class PDF::Reader
28
28
  attr_accessor :label, :subtype, :encoding, :descendantfonts, :tounicode
29
29
  attr_reader :basefont
30
30
 
31
- # returns a hash that maps glyph names to unicode codepoints. The mapping is based on
31
+ # returns a hash that maps glyph names to unicode codepoints. The mapping is based on
32
32
  # a text file supplied by Adobe at:
33
33
  # http://www.adobe.com/devnet/opentype/archives/glyphlist.txt
34
34
  def self.glyphnames
35
35
  @@glyphs ||= {}
36
-
36
+
37
37
  if @@glyphs.empty?
38
38
  RUBY_VERSION >= "1.9" ? mode = "r:BINARY" : mode = "r"
39
39
  File.open(File.dirname(__FILE__) + "/glyphlist.txt",mode) do |f|
@@ -51,9 +51,9 @@ class PDF::Reader
51
51
  # setup a default encoding for the selected font. It can always be overridden
52
52
  # with encoding= if required
53
53
  case font
54
- when "Symbol" then
54
+ when "Symbol" then
55
55
  self.encoding = PDF::Reader::Encoding.new("SymbolEncoding")
56
- when "ZapfDingbats" then
56
+ when "ZapfDingbats" then
57
57
  self.encoding = PDF::Reader::Encoding.new("ZapfDingbatsEncoding")
58
58
  end
59
59
  @basefont = font
@@ -64,7 +64,7 @@ class PDF::Reader
64
64
 
65
65
  if params.class == String
66
66
  # translate the bytestram into a UTF-8 string.
67
- # If an encoding hasn't been specified, assume the text using this
67
+ # If an encoding hasn't been specified, assume the text using this
68
68
  # font is in Adobe Standard Encoding.
69
69
  (encoding || PDF::Reader::Encoding.new(:StandardEncoding)).to_utf8(params, tounicode)
70
70
  elsif params.class == Array
@@ -61,7 +61,8 @@ class PDF::Reader
  when ">>", "]", ">" then return Token.new(token)
  else
  if operators.has_key?(token) then return Token.new(token)
- else return token.to_f
+ elsif token =~ /\d*\.\d/ then return token.to_f
+ else return token.to_i
  end
  end
  end
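
Reviewer note: before this change every non-operator token was converted with to_f, so an integer such as 12 came back as 12.0. The new branch only uses to_f when the token contains a decimal point followed by a digit. A tiny sketch of that rule, using the same regular expression as the diff and a hypothetical method name:

```ruby
# Convert a numeric token to a Float if it has a fractional part,
# otherwise to an Integer.
def numeric_token(token)
  token =~ /\d*\.\d/ ? token.to_f : token.to_i
end
```

Note that a token written as "4." has no digit after the point, so under this pattern it takes the integer branch and becomes 4.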
@@ -99,7 +100,7 @@ class PDF::Reader
99
100
  # Reads a PDF hex string from the buffer and converts it to a Ruby String
100
101
  def hex_string
101
102
  str = ""
102
-
103
+
103
104
  loop do
104
105
  token = @buffer.token
105
106
  break if token == ">"
@@ -122,14 +123,15 @@ class PDF::Reader
  # find the first occurance of ( ) [ \ or ]
  #
  # I originally just used the regexp form of index(), but it seems to be
- # buggy on some OSX systems (returns nil when there is a match). The
- # block form of index() is more reliable, but only works on 1.8.7 or
- # greater.
+ # buggy on some OSX systems (returns nil when there is a match). This
+ # version is more reliable and was suggested by Andrès Koetsier.
  #
- if RUBY_VERSION >= "1.8.7"
- i = @buffer.raw.unpack("C*").index { |n| [40, 41, 91, 92, 93].include?(n) }
- else
- i = @buffer.raw.index(/[\\\(\)]/)
+ i = nil
+ @buffer.raw.unpack("C*").each_with_index do |charint, idx|
+ if [40, 41, 92].include?(charint)
+ i = idx
+ break
+ end
  end
 
  if i.nil?
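
Reviewer note: the replacement loop scans the raw bytes for the first "(", ")" or "\" (byte values 40, 41 and 92) without relying on the block form of index, which the removed comment says is only available from Ruby 1.8.7. The square brackets (91 and 93) that the old 1.8.7 branch also matched are no longer in the list. A standalone sketch of the scan, with a hypothetical method name:

```ruby
# Return the byte index of the first "(", ")" or "\" in +raw+, or nil.
def first_delimiter_index(raw)
  raw.unpack("C*").each_with_index do |byte, idx|
    return idx if [40, 41, 92].include?(byte)
  end
  nil
end
```

Working on unpacked byte values keeps the scan independent of the string's encoding, which seems to be the point of avoiding the regexp form here.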
@@ -201,7 +203,7 @@ class PDF::Reader
201
203
  def stream (dict)
202
204
  raise MalformedPDFError, "PDF malformed, missing stream length" unless dict.has_key?(:Length)
203
205
  data = @buffer.read(@xref.object(dict[:Length]))
204
-
206
+
205
207
  Error.str_assert(parse_token, "endstream")
206
208
  Error.str_assert(parse_token, "endobj")
207
209
 
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -25,7 +25,7 @@
25
25
 
26
26
  class PDF::Reader
27
27
  ################################################################################
28
- # An internal PDF::Reader class that represents an indirect reference to a PDF Object
28
+ # An internal PDF::Reader class that represents an indirect reference to a PDF Object
29
29
  class Reference
30
30
  ################################################################################
31
31
  # check if the next token in the buffer is a reference, and return a PDF::Reader::Reference
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -72,7 +72,7 @@ class PDF::Reader
  # If the object is a stream, that is returned as well
  def object (ref, save_pos = true)
  return ref unless ref.kind_of?(Reference)
- pos = @buffer.pos if save_pos
+ pos = @buffer.pos_without_buf if save_pos
  obj = Parser.new(@buffer.seek(offset_for(ref)), self).object(ref.id, ref.gen)
  @buffer.seek(pos) if save_pos
  return obj
@@ -132,7 +132,7 @@ class PDF::Reader
132
132
  # ref - a PDF::Reader::Reference object containing an object ID and revision number
133
133
  def offset_for (ref)
134
134
  @xref[ref.id][ref.gen]
135
- rescue
135
+ rescue
136
136
  raise InvalidObjectError, "Object #{ref.id}, Generation #{ref.gen} is invalid"
137
137
  end
138
138
  ################################################################################
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: pdf-reader
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.7.5
4
+ version: 0.7.6
5
5
  platform: ruby
6
6
  authors:
7
7
  - Peter Jones
@@ -9,10 +9,19 @@ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
11
 
12
- date: 2008-08-27 00:00:00 +10:00
12
+ date: 2009-08-28 00:00:00 +10:00
13
13
  default_executable:
14
- dependencies: []
15
-
14
+ dependencies:
15
+ - !ruby/object:Gem::Dependency
16
+ name: Ascii85
17
+ type: :runtime
18
+ version_requirement:
19
+ version_requirements: !ruby/object:Gem::Requirement
20
+ requirements:
21
+ - - ">="
22
+ - !ruby/object:Gem::Version
23
+ version: "0.9"
24
+ version:
16
25
  description: The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe
17
26
  email: pjones@pmade.com
18
27
  executables:
@@ -25,10 +34,9 @@ extra_rdoc_files:
25
34
  - README.rdoc
26
35
  - TODO
27
36
  - CHANGELOG
37
+ - MIT-LICENSE
28
38
  files:
29
- - lib/pdf
30
39
  - lib/pdf/reader.rb
31
- - lib/pdf/reader
32
40
  - lib/pdf/reader/buffer.rb
33
41
  - lib/pdf/reader/cmap.rb
34
42
  - lib/pdf/reader/content.rb
@@ -44,7 +52,6 @@ files:
44
52
  - lib/pdf/reader/register_receiver.rb
45
53
  - lib/pdf/reader/text_receiver.rb
46
54
  - lib/pdf/reader/token.rb
47
- - lib/pdf/reader/encodings
48
55
  - lib/pdf/reader/encodings/mac_expert.txt
49
56
  - lib/pdf/reader/encodings/mac_roman.txt
50
57
  - lib/pdf/reader/encodings/pdf_doc.txt
@@ -58,8 +65,11 @@ files:
58
65
  - README.rdoc
59
66
  - TODO
60
67
  - CHANGELOG
68
+ - MIT-LICENSE
61
69
  has_rdoc: true
62
70
  homepage: http://software.pmade.com/pdfreader
71
+ licenses: []
72
+
63
73
  post_install_message:
64
74
  rdoc_options:
65
75
  - --title
@@ -84,9 +94,9 @@ required_rubygems_version: !ruby/object:Gem::Requirement
84
94
  requirements: []
85
95
 
86
96
  rubyforge_project: pdf-reader
87
- rubygems_version: 1.2.0
97
+ rubygems_version: 1.3.4
88
98
  signing_key:
89
- specification_version: 2
99
+ specification_version: 3
90
100
  summary: A library for accessing the content of PDF files
91
101
  test_files: []
92
102