pdf-reader 0.7.5 → 0.7.6

data/CHANGELOG CHANGED
@@ -1,3 +1,14 @@
1
+ v0.7.6 (28th August 2009)
2
+ - Various bug fixes that increase the number of files we can successfully parse
3
+ - Treat float and integer tokens differently (thanks Neil)
4
+ - Correctly handle PDFs where the Kids element of a Pages dict is an indirect
5
+ reference (thanks Rob Holland)
6
+ - Fix conversion of PDF strings to Ruby strings on 1.8.6 (thanks Andrès Koetsier)
7
+ - Fix decoding with ASCII85 and ASCIIHex filters (thanks Andrès Koetsier)
8
+ - Fix extracting inline images from content streams (thanks Andrès Koetsier)
9
+ - Fix extracting [ ] from content streams (thanks Christian Rishøj)
10
+ - Fix conversion of text to UTF8 when the cmap uses bfrange (thanks Federico Gonzalez Lutteroth)
11
+
1
12
  v0.7.5 (27th August 2008)
2
13
  - Fix a 1.8.7ism
3
14
 
@@ -0,0 +1,21 @@
1
+ Copyright (c) 2009 Peter Jones
2
+ Copyright (c) 2009 James Healy
3
+
4
+ Permission is hereby granted, free of charge, to any person obtaining
5
+ a copy of this software and associated documentation files (the
6
+ "Software"), to deal in the Software without restriction, including
7
+ without limitation the rights to use, copy, modify, merge, publish,
8
+ distribute, sublicense, and/or sell copies of the Software, and to
9
+ permit persons to whom the Software is furnished to do so, subject to
10
+ the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be
13
+ included in all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
16
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
17
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
18
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
19
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
20
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
21
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -5,9 +5,22 @@ It provides programmatic access to the contents of a PDF file with a high
5
5
  degree of flexibility.
6
6
 
7
7
  The PDF 1.7 specification is a weighty document and not all aspects are
8
- currently supported. We welcome submission of PDF files that exhibit
8
+ currently supported. I welcome submission of PDF files that exhibit
9
9
  unsupported aspects of the spec to assist with improving our support.
10
10
 
11
+ = Development Status
12
+
13
+ I adopted this library in 2007 when I was learning the fundamentals of the PDF
14
+ spec. I do not currently use it in my day-to-day work and I just don't have the
15
+ spare time to dedicate to adding new features.
16
+
17
+ The code as it is works fairly well, and I offer it "as is". All patches, bug
18
+ reports and sample PDFs are welcome - I will work on them when I can. If anyone
19
+ is interested in adding features to PDF::Reader in their own effort to learn
20
+ the PDF file format, I'll happily offer help and support.
21
+
22
+ I STRONGLY RECOMMEND NOT USING PDF::READER FOR YOUR PRODUCTION CODE.
23
+
11
24
  = Installation
12
25
 
13
26
  The recommended installation method is via Rubygems.
@@ -42,7 +55,8 @@ PDF file:
42
55
 
43
56
  MalformedPDFError - The PDF appears to be corrupt in some way. If you believe the
44
57
  file should be valid, or that a corrupt file didn't raise an exception, please
45
- forward a copy of the file to the maintainers and we can attempt improve the code.
58
+ forward a copy of the file to the maintainers (preferably via the google group)
59
+ and we can attempt to improve the code.
46
60
 
47
61
  UnsupportedFeatureError - The PDF uses a feature that PDF::Reader doesn't currently
48
62
  support. Again, we welcome submissions of PDF files that exhibit these features to help
@@ -56,12 +70,18 @@ report it!) or your receiver (please don't report it!).
56
70
 
57
71
  = Maintainers
58
72
 
59
- - Peter Jones <mailto:pjones@pmade.com>
60
73
  - James Healy <mailto:jimmy@deefa.com>
61
74
 
75
+ = Licensing
76
+
77
+ This library is distributed under the terms of the MIT License. See the included
77
+ MIT-LICENSE file for more detail.
79
+
62
80
  = Mailing List
63
81
 
64
- Any questions or feedback should be sent to the PDF::Reader google group.
82
+ Any questions or feedback should be sent to the PDF::Reader google group. It's
83
+ better that any answers be available for others instead of hiding in someone's
84
+ inbox.
65
85
 
66
86
  http://groups.google.com/group/pdf-reader
67
87
 
@@ -77,21 +97,21 @@ A simple app to count the number of pages in a PDF File.
77
97
  require 'pdf/reader'
78
98
 
79
99
  class PageReceiver
80
- attr_accessor :page_count
100
+ attr_accessor :counter
81
101
 
82
102
  def initialize
83
- @page_count = 0
103
+ @counter = 0
84
104
  end
85
105
 
86
106
  # Called when page parsing ends
87
107
  def end_page
88
- @page_count += 1
108
+ @counter += 1
89
109
  end
90
110
  end
91
111
 
92
112
  receiver = PageReceiver.new
93
113
  pdf = PDF::Reader.file("somefile.pdf", receiver)
94
- puts "#{receiver.page_count} pages"
114
+ puts "#{receiver.counter} pages"
95
115
 
96
116
  == List all callbacks generated by a single PDF
97
117
 
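To try the renamed `counter` accessor without a PDF file, the callbacks can be fired by hand. This sketch assumes PDF::Reader only sends a callback when the receiver responds to it, so `PageReceiver` can ignore everything except `end_page`; `drive_callbacks` is a hypothetical stand-in for the parser, not a library method.

```ruby
# The receiver from the page-count example above.
class PageReceiver
  attr_accessor :counter

  def initialize
    @counter = 0
  end

  # Called when page parsing ends
  def end_page
    @counter += 1
  end
end

# Hypothetical stand-in for the parser: sends each named callback to the
# receiver, skipping any the receiver does not implement (an assumption
# about how PDF::Reader dispatches callbacks).
def drive_callbacks(receiver, names)
  names.each do |name|
    receiver.send(name) if receiver.respond_to?(name)
  end
  receiver
end

events = [:begin_page, :end_page, :begin_page, :end_page, :end_page_container]
receiver = drive_callbacks(PageReceiver.new, events)
puts "#{receiver.counter} pages"   # prints "2 pages"
```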
@@ -242,7 +262,7 @@ A simple app to display the number of pages in a PDF File.
242
262
 
243
263
  = Known Limitations
244
264
 
245
- The order of the callbacks is unpredicable, and is dependent on the internal
265
+ The order of the callbacks is unpredictable, and is dependent on the internal
246
266
  layout of the file, not the order objects are displayed to the user. As a
247
267
  consequence of this it is highly unlikely that text will be completely in
248
268
  order.
data/Rakefile CHANGED
@@ -6,7 +6,7 @@ require 'rake/testtask'
6
6
  require "rake/gempackagetask"
7
7
  require 'spec/rake/spectask'
8
8
 
9
- PKG_VERSION = "0.7.5"
9
+ PKG_VERSION = "0.7.6"
10
10
  PKG_NAME = "pdf-reader"
11
11
  PKG_FILE_NAME = "#{PKG_NAME}-#{PKG_VERSION}"
12
12
 
@@ -47,8 +47,7 @@ Rake::RDocTask.new("doc") do |rdoc|
47
47
  rdoc.rdoc_files.include('README.rdoc')
48
48
  rdoc.rdoc_files.include('TODO')
49
49
  rdoc.rdoc_files.include('CHANGELOG')
50
- #rdoc.rdoc_files.include('COPYING')
51
- #rdoc.rdoc_files.include('LICENSE')
50
+ rdoc.rdoc_files.include('MIT-LICENSE')
52
51
  rdoc.rdoc_files.include('lib/**/*.rb')
53
52
  rdoc.options << "--inline-source"
54
53
  end
@@ -70,7 +69,7 @@ spec = Gem::Specification.new do |spec|
70
69
  spec.executables << "pdf_text"
71
70
  spec.executables << "pdf_list_callbacks"
72
71
  spec.has_rdoc = true
73
- spec.extra_rdoc_files = %w{README.rdoc TODO CHANGELOG}
72
+ spec.extra_rdoc_files = %w{README.rdoc TODO CHANGELOG MIT-LICENSE }
74
73
  spec.rdoc_options << '--title' << 'PDF::Reader Documentation' <<
75
74
  '--main' << 'README.rdoc' << '-q'
76
75
  spec.author = "Peter Jones"
@@ -78,6 +77,7 @@ spec = Gem::Specification.new do |spec|
78
77
  spec.rubyforge_project = "pdf-reader"
79
78
  spec.homepage = "http://software.pmade.com/pdfreader"
80
79
  spec.description = "The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe"
80
+ spec.add_dependency('Ascii85', '>=0.9')
81
81
  end
82
82
 
83
83
  # package the library into a gem
@@ -1,5 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
 
3
+ require 'rubygems'
4
+
3
5
  $LOAD_PATH.unshift(File.dirname(__FILE__) + "/../lib")
4
6
 
5
7
  require 'pdf/reader'
@@ -1,5 +1,7 @@
1
1
  #!/usr/bin/env ruby
2
2
 
3
+ require 'rubygems'
4
+
3
5
  $LOAD_PATH.unshift(File.dirname(__FILE__) + "/../lib")
4
6
 
5
7
  USAGE = "USAGE: " + File.basename(__FILE__) + " <file> <object id> [generation]"
@@ -1,5 +1,6 @@
1
1
  #!/usr/bin/env ruby
2
2
 
3
+ require 'rubygems'
3
4
  $LOAD_PATH.unshift(File.dirname(__FILE__) + "/../lib")
4
5
 
5
6
  require 'pdf/reader'
@@ -24,6 +24,7 @@
24
24
  ################################################################################
25
25
 
26
26
  require 'stringio'
27
+ require 'ascii85'
27
28
 
28
29
  module PDF
29
30
  ################################################################################
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -41,7 +41,7 @@ class PDF::Reader
41
41
  self
42
42
  end
43
43
  ################################################################################
44
- # reads the requested number of bytes from the underlying IO stream.
44
+ # reads the requested number of bytes from the underlying IO stream.
45
45
  #
46
46
  # length should be a positive integer.
47
47
  def read (length)
@@ -56,13 +56,22 @@ class PDF::Reader
56
56
  out
57
57
  end
58
58
  ################################################################################
59
- # Reads from the buffer until the specified token is found, or the end of the buffer
59
+ # Reads from the buffer until the specified token is found, or the end of the buffer
60
60
  #
61
61
  # bytes - the bytes to search for.
62
62
  def read_until(bytes)
63
63
  out = ""
64
64
  size = bytes.size
65
-
65
+
66
+ if @buffer && !@buffer.empty?
67
+ if @buffer.include?(bytes)
68
+ offset = @buffer.index(bytes) + size
69
+ return head(offset)
70
+ else
71
+ out << head(@buffer.size)
72
+ end
73
+ end
74
+
66
75
  loop do
67
76
  out << @io.read(1)
68
77
  if out[-1 * size,size].eql?(bytes)
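The new early-return branch in `read_until` first looks for the token in bytes the tokeniser has already pulled into `@buffer`, and only then falls back to byte-at-a-time reads from `@io`. The sketch below models that two-stage lookup with a hypothetical `TinyBuffer` class (not part of PDF::Reader). The end-of-stream handling is an assumption, since the rest of the loop is not shown in this hunk.

```ruby
require 'stringio'

# Hypothetical, simplified model of the new Buffer#read_until logic; not the
# PDF::Reader class. Some bytes may already sit in the read-ahead buffer, the
# rest are still in the IO object.
class TinyBuffer
  def initialize(io, buffered = "")
    @io = io
    @buffer = buffered.dup
  end

  # Returns everything up to and including +bytes+, or all remaining data if
  # +bytes+ never appears (assumed EOF behaviour).
  def read_until(bytes)
    out = String.new
    # 1. Try to satisfy the request from the read-ahead buffer.
    unless @buffer.empty?
      idx = @buffer.index(bytes)
      return head(idx + bytes.size) if idx
      out << head(@buffer.size)   # token not fully buffered: keep what we have
    end
    # 2. Fall back to reading the IO one byte at a time.
    loop do
      byte = @io.read(1)
      break if byte.nil?          # end of stream
      out << byte
      break if out.end_with?(bytes)
    end
    out
  end

  private

  # Removes and returns the first +n+ bytes of the read-ahead buffer.
  def head(n)
    @buffer.slice!(0, n)
  end
end

# The token may straddle the buffer and the IO: "endstre" + "am".
buf = TinyBuffer.new(StringIO.new("am\nrest"), "xx endstre")
puts buf.read_until("endstream").inspect  # "xx endstream"
puts buf.read_until("rest").inspect       # "\nrest"
```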
@@ -74,7 +83,7 @@ class PDF::Reader
74
83
  out
75
84
  end
76
85
  ################################################################################
77
- # returns true if the underlying IO object is at end and the internal buffer
86
+ # returns true if the underlying IO object is at end and the internal buffer
78
87
  # is empty
79
88
  def eof?
80
89
  ready_token
@@ -89,6 +98,10 @@ class PDF::Reader
89
98
  @io.pos
90
99
  end
91
100
  ################################################################################
101
+ def pos_without_buf
102
+ @io.pos - @buffer.to_s.size
103
+ end
104
+ ################################################################################
92
105
  # PDF files are processed by tokenising the content into a series of objects and commands.
93
106
  # This prepares the buffer for use by reading the next line of tokens into memory.
94
107
  def ready_token (with_strip=true, skip_blanks=true)
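`pos_without_buf` subtracts the bytes still waiting in the read-ahead buffer from the raw IO position, and `XRef#object` now saves that value before jumping to another object. The sketch uses a plain `StringIO` and local variables rather than PDF::Reader classes, and it assumes that seeking also discards the stale read-ahead buffer. It shows why restoring the raw `@io.pos` would resume past tokens the parser has not consumed yet.

```ruby
require 'stringio'

io = StringIO.new("<< /Type /Pages /Kids 4 0 R >>")

# The tokeniser reads ahead: the IO now sits at byte 16.
buffer = io.read(16)            # "<< /Type /Pages "
# "<< /Type " is consumed as tokens; "/Pages " is still buffered.
buffer.slice!(0, 9)

raw_pos     = io.pos                 # 16: where the IO really is
logical_pos = io.pos - buffer.size   # 9: same idea as pos_without_buf

# Resolving an indirect reference seeks elsewhere in the file...
io.seek(0)
io.read(2)
# ...then restoring the logical position (discarding the read-ahead)
# resumes at the first unconsumed token instead of skipping "/Pages ".
io.seek(logical_pos)
buffer = String.new
resumed = io.read(6)

puts "raw #{raw_pos}, logical #{logical_pos}, resumed #{resumed.inspect}"
```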
@@ -105,10 +118,10 @@ class PDF::Reader
105
118
  # return the next token from the underlying IO stream
106
119
  def token
107
120
  ready_token
108
-
121
+
109
122
  i = @buffer.index(/[\[\]()<>{}\s\/]/) || @buffer.size
110
123
 
111
- token_chars =
124
+ token_chars =
112
125
  if i == 0 and @buffer[i,2] == "<<" then 2
113
126
  elsif i == 0 and @buffer[i,2] == ">>" then 2
114
127
  elsif i == 0 then 1
@@ -148,7 +161,7 @@ class PDF::Reader
148
161
  data = @io.read(1024)
149
162
 
150
163
  # the PDF 1.7 spec (section #3.4) says that EOL markers can be either \r, \n, or both.
151
- # To ensure we find the xref offset correctly, change all possible options to a
164
+ # To ensure we find the xref offset correctly, change all possible options to a
152
165
  # standard format
153
166
  data = data.gsub("\r\n","\n").gsub("\n\r","\n").gsub("\r","\n")
154
167
  lines = data.split(/\n/).reverse
@@ -69,14 +69,12 @@ class PDF::Reader
69
69
  start_code = "0x#{start_code}".hex
70
70
  end_code = "0x#{end_code}".hex
71
71
  dst = "0x#{dst}".hex
72
- incr = 0
73
72
 
74
73
  # add all values in the range to our mapping
75
- (start_code..end_code).each do |val|
76
- @map[val] = dst + incr
77
- incr += 1
74
+ (start_code..end_code).each_with_index do |val, idx|
75
+ @map[val] = dst + idx
78
76
  # ensure a single range does not exceed 255 chars
79
- raise PDF::Reader::MalformedPDFError, "a CMap bfrange cann't exceed 255 chars" if incr > 255
77
+ raise PDF::Reader::MalformedPDFError, "a CMap bfrange can't exceed 255 chars" if idx > 255
80
78
  end
81
79
  end
82
80
  end
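Switching to `each_with_index` removes the hand-maintained `incr` counter: each source code in the range maps to `dst` plus its zero-based offset. A standalone sketch of that loop follows; `expand_bfrange` is a hypothetical helper, and it raises `ArgumentError` instead of `PDF::Reader::MalformedPDFError` so it runs without the library. Because the index is zero-based, the `idx > 255` guard admits 256 codes per range, whereas the old `incr` check raised on the 256th.

```ruby
# Hypothetical helper mirroring the bfrange loop: maps every source code in
# start..end to consecutive destination codepoints starting at dst.
def expand_bfrange(start_hex, end_hex, dst_hex, map = {})
  start_code = start_hex.hex
  end_code   = end_hex.hex
  dst        = dst_hex.hex
  (start_code..end_code).each_with_index do |val, idx|
    map[val] = dst + idx
    # idx is zero-based, so at most 256 codes pass this check
    raise ArgumentError, "a CMap bfrange can't exceed 256 codes" if idx > 255
  end
  map
end

# Maps 0x41..0x43 ("A".."C") to 0x61..0x63 ("a".."c").
p expand_bfrange("0041", "0043", "0061")
```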
@@ -293,7 +293,7 @@ class PDF::Reader
293
293
  if page[:Type] == :Pages
294
294
  callback(:begin_page_container, [page])
295
295
  walk_resources(@xref.object(res)) if res
296
- page[:Kids].each {|child| walk_pages(@xref.object(child))}
296
+ @xref.object(page[:Kids]).each {|child| walk_pages(@xref.object(child))}
297
297
  callback(:end_page_container)
298
298
  elsif page[:Type] == :Page
299
299
  callback(:begin_page, [page])
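The `walk_pages` fix passes `page[:Kids]` through `@xref.object` before iterating, so a Pages dictionary whose Kids entry is an indirect reference is resolved to its array first. The sketch below reproduces that with hypothetical `FakeReference` and `FakeXRef` stand-ins; only the "return non-references unchanged" behaviour is taken from `XRef#object` in this diff.

```ruby
# Hypothetical stand-in for PDF::Reader::Reference (object id + generation).
class FakeReference
  attr_reader :id, :gen

  def initialize(id, gen)
    @id = id
    @gen = gen
  end
end

# Hypothetical stand-in for PDF::Reader::XRef: resolves references and passes
# any other value through unchanged, as XRef#object does.
class FakeXRef
  def initialize(objects)
    @objects = objects # { [id, gen] => object }
  end

  def object(ref)
    ref.is_a?(FakeReference) ? @objects.fetch([ref.id, ref.gen]) : ref
  end
end

xref = FakeXRef.new({
  [4, 0] => [FakeReference.new(5, 0), FakeReference.new(6, 0)], # the Kids array
  [5, 0] => { :Type => :Page },
  [6, 0] => { :Type => :Page }
})
pages = { :Type => :Pages, :Kids => FakeReference.new(4, 0) } # Kids is indirect

# 0.7.5 called pages[:Kids].each directly, which raises NoMethodError here
# because a reference is not an Array. 0.7.6 resolves Kids first.
kids = xref.object(pages[:Kids]).map { |child| xref.object(child) }
puts kids.size # 2
```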
@@ -41,33 +41,33 @@ class PDF::Reader
41
41
  end
42
42
 
43
43
  case enc
44
- when nil then
44
+ when nil then
45
45
  load_mapping File.dirname(__FILE__) + "/encodings/standard.txt"
46
46
  @unpack = "C*"
47
- when "Identity-H".to_sym then
47
+ when "Identity-H".to_sym then
48
48
  @unpack = "n*"
49
49
  @to_unicode_required = true
50
- when :MacRomanEncoding then
50
+ when :MacRomanEncoding then
51
51
  load_mapping File.dirname(__FILE__) + "/encodings/mac_roman.txt"
52
52
  @unpack = "C*"
53
- when :MacExpertEncoding then
53
+ when :MacExpertEncoding then
54
54
  load_mapping File.dirname(__FILE__) + "/encodings/mac_expert.txt"
55
55
  @unpack = "C*"
56
- when :PDFDocEncoding then
56
+ when :PDFDocEncoding then
57
57
  load_mapping File.dirname(__FILE__) + "/encodings/pdf_doc.txt"
58
58
  @unpack = "C*"
59
- when :StandardEncoding then
59
+ when :StandardEncoding then
60
60
  load_mapping File.dirname(__FILE__) + "/encodings/standard.txt"
61
61
  @unpack = "C*"
62
- when :SymbolEncoding then
62
+ when :SymbolEncoding then
63
63
  load_mapping File.dirname(__FILE__) + "/encodings/symbol.txt"
64
64
  @unpack = "C*"
65
- when :UTF16Encoding then
65
+ when :UTF16Encoding then
66
66
  @unpack = "n*"
67
- when :WinAnsiEncoding then
67
+ when :WinAnsiEncoding then
68
68
  load_mapping File.dirname(__FILE__) + "/encodings/win_ansi.txt"
69
69
  @unpack = "C*"
70
- when :ZapfDingbatsEncoding then
70
+ when :ZapfDingbatsEncoding then
71
71
  load_mapping File.dirname(__FILE__) + "/encodings/zapf_dingbats.txt"
72
72
  @unpack = "C*"
73
73
  else raise UnsupportedFeatureError, "#{enc} is not currently a supported encoding"
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -46,7 +46,7 @@ class PDF::Reader
46
46
  def output_parent (obj)
47
47
  case obj
48
48
  when Hash
49
- obj.each do |k,v|
49
+ obj.each do |k,v|
50
50
  print "#{k}"; output_child(v); print "\n"
51
51
  Explore::const_set(k, k) if !Explore.const_defined?(k)
52
52
  end
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -34,17 +34,30 @@ class PDF::Reader
34
34
  # in the future.
35
35
  class Filter
36
36
  ################################################################################
37
- # creates a new filter for decoding content
38
- def initialize (name, options)
37
+ # creates a new filter for decoding content.
38
+ #
39
+ # Filters that are only used to encode image data are accepted, but the data is
40
+ # returned untouched. At this stage PDF::Reader has no need to decode images.
41
+ #
42
+ def initialize (name, options = nil)
39
43
  @options = options
40
44
 
41
45
  case name.to_sym
46
+ when :ASCII85Decode then @filter = :ascii85
47
+ when :ASCIIHexDecode then @filter = :asciihex
48
+ when :CCITTFaxDecode then @filter = nil
49
+ when :DCTDecode then @filter = nil
42
50
  when :FlateDecode then @filter = :flate
43
- #else raise UnsupportedFeatureError, "Unknown filter: #{name}"
51
+ when :JBIG2Decode then @filter = nil
52
+ else raise UnsupportedFeatureError, "Unknown filter: #{name}"
44
53
  end
45
54
  end
46
55
  ################################################################################
47
56
  # attempts to decode the specified data with the current filter
57
+ #
58
+ # Filters that are only used to encode image data are accepted, but the data is
59
+ # returned untouched. At this stage PDF::Reader has no need to decode images.
60
+ #
48
61
  def filter (data)
49
62
  # leave the data untouched if we don't support the required filter
50
63
  return data if @filter.nil?
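`Filter#initialize` now recognises the two ASCII filters, maps the image-only filters to `nil` so their data passes through untouched, and raises for any other name. In 0.7.5 the `else` branch was commented out, so an unrecognised filter left `@filter` unset and its data was returned undecoded. A standalone sketch of the new mapping, with hypothetical `decoder_for` and `UnsupportedFilter` names standing in for the library's class and `UnsupportedFeatureError`:

```ruby
# Hypothetical stand-in for PDF::Reader::UnsupportedFeatureError.
class UnsupportedFilter < StandardError; end

# Mirrors the case statement in Filter#initialize: a symbol names the decode
# method, nil means "leave the (image) data untouched".
def decoder_for(name)
  case name.to_sym
  when :ASCII85Decode  then :ascii85
  when :ASCIIHexDecode then :asciihex
  when :FlateDecode    then :flate
  when :CCITTFaxDecode, :DCTDecode, :JBIG2Decode then nil
  else raise UnsupportedFilter, "Unknown filter: #{name}"
  end
end

p decoder_for("FlateDecode")   # :flate
p decoder_for(:DCTDecode)      # nil
```

Under this mapping a stream using a filter that is not listed, such as `LZWDecode` or `RunLengthDecode`, now raises instead of silently passing through.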
@@ -53,6 +66,30 @@ class PDF::Reader
53
66
  self.send(@filter, data)
54
67
  end
55
68
  ################################################################################
69
+ # Decode the specified data using the Ascii85 algorithm. Relies on the Ascii85
70
+ # rubygem.
71
+ #
72
+ def ascii85(data)
73
+ data = "<~#{data}" unless data.to_s[0,2] == "<~"
74
+ Ascii85::decode(data)
75
+ rescue Exception => e
76
+ # Oops, there was a problem decoding the stream
77
+ raise MalformedPDFError, "Error occurred while decoding an ASCII85 stream (#{e.class.to_s}: #{e.to_s})"
78
+ end
79
+ ################################################################################
80
+ # Decode the specified data using the AsciiHex algorithm.
81
+ #
82
+ def asciihex(data)
83
+ data.chop! if data[-1,1] == ">"
84
+ data = data[1,data.size] if data[0,1] == "<"
85
+ data.gsub!(/[^A-Fa-f0-9]/,"")
86
+ data << "0" if data.size % 2 == 1
87
+ data.scan(/.{2}/).map { |s| s.hex.chr }.join("")
88
+ rescue Exception => e
89
+ # Oops, there was a problem decoding the stream
90
+ raise MalformedPDFError, "Error occurred while decoding an ASCIIHex stream (#{e.class.to_s}: #{e.to_s})"
91
+ end
92
+ ################################################################################
56
93
  # Decode the specified data with the Zlib compression algorithm
57
94
  def flate (data)
58
95
  begin
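The new `ascii85` method delegates to the Ascii85 gem and prepends `<~` when it is missing, presumably because the gem expects the delimited form. To show what that decoding involves, here is a self-contained pure-Ruby decoder written for illustration only; it is not the gem's code or API. It follows the filter's rules: base-85 digits `!` to `u`, `z` as shorthand for four zero bytes, whitespace ignored, `~>` as the end-of-data marker, and a short final group padded with `u` before truncating.

```ruby
# Illustrative pure-Ruby ASCII85 decoder (not the Ascii85 gem's code).
def ascii85_decode(data)
  body = data.to_s.dup
  body = body[2..-1] if body.start_with?("<~")   # optional "<~" prefix
  body = body.split("~>", 2).first.to_s          # "~>" marks end of data
  bytes = []
  group = []

  body.each_byte do |b|
    next if [0, 9, 10, 12, 13, 32].include?(b)   # PDF whitespace is ignored
    if b == 122 && group.empty?                  # "z" => four zero bytes
      bytes.concat([0, 0, 0, 0])
      next
    end
    digit = b - 33                               # "!" (33) .. "u" (117)
    raise ArgumentError, "invalid ASCII85 byte: #{b}" unless digit.between?(0, 84)
    group << digit
    next unless group.size == 5
    value = group.inject(0) { |acc, d| acc * 85 + d }
    bytes.concat([value].pack("N").unpack("C*")) # 32-bit big-endian word
    group.clear
  end

  # A final group of n digits (2..4) encodes n - 1 bytes: pad with "u" (84).
  unless group.empty?
    raise ArgumentError, "truncated ASCII85 group" if group.size == 1
    n = group.size
    value = (group + [84] * (5 - n)).inject(0) { |acc, d| acc * 85 + d }
    bytes.concat([value].pack("N").unpack("C*").first(n - 1))
  end

  bytes.pack("C*")
end

p ascii85_decode("<~9jqo^~>")   # "Man "
```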
@@ -63,7 +100,7 @@ class PDF::Reader
63
100
  # If that fails, then use an undocumented 'feature' to attempt to inflate
64
101
  # the data as a raw RFC1951 stream.
65
102
  #
66
- # See
103
+ # See
67
104
  # - http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/243545
68
105
  # - http://www.gzip.org/zlib/zlib_faq.html#faq38
69
106
  Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(data)
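The new `asciihex` method strips the `<` and `>` delimiters, discards non-hex characters, pads an odd final digit with `0`, and turns each pair of digits into a byte. Below is a non-mutating sketch of the same steps (the source edits its argument in place with `chop!` and `gsub!`). It uses `Array#pack("H*")` for the pair-to-byte conversion instead of `scan`/`hex`/`chr`, and it treats `>` as end of data wherever it appears; `ascii_hex_decode` is a hypothetical name.

```ruby
# Hypothetical standalone ASCIIHex decoder following the same steps as
# Filter#asciihex, without modifying its argument.
def ascii_hex_decode(data)
  hex = data.to_s.split(">", 2).first.to_s   # ">" is the end-of-data marker
  hex = hex.gsub(/[^A-Fa-f0-9]/, "")         # drops "<", whitespace and noise
  hex += "0" if hex.size.odd?                # odd digit count: pad with 0
  [hex].pack("H*")                           # each digit pair becomes one byte
end

p ascii_hex_decode("<48656C6C6F>")   # "Hello"
p ascii_hex_decode("7>")             # "p" (0x70)
```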
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -28,12 +28,12 @@ class PDF::Reader
28
28
  attr_accessor :label, :subtype, :encoding, :descendantfonts, :tounicode
29
29
  attr_reader :basefont
30
30
 
31
- # returns a hash that maps glyph names to unicode codepoints. The mapping is based on
31
+ # returns a hash that maps glyph names to unicode codepoints. The mapping is based on
32
32
  # a text file supplied by Adobe at:
33
33
  # http://www.adobe.com/devnet/opentype/archives/glyphlist.txt
34
34
  def self.glyphnames
35
35
  @@glyphs ||= {}
36
-
36
+
37
37
  if @@glyphs.empty?
38
38
  RUBY_VERSION >= "1.9" ? mode = "r:BINARY" : mode = "r"
39
39
  File.open(File.dirname(__FILE__) + "/glyphlist.txt",mode) do |f|
@@ -51,9 +51,9 @@ class PDF::Reader
51
51
  # setup a default encoding for the selected font. It can always be overridden
52
52
  # with encoding= if required
53
53
  case font
54
- when "Symbol" then
54
+ when "Symbol" then
55
55
  self.encoding = PDF::Reader::Encoding.new("SymbolEncoding")
56
- when "ZapfDingbats" then
56
+ when "ZapfDingbats" then
57
57
  self.encoding = PDF::Reader::Encoding.new("ZapfDingbatsEncoding")
58
58
  end
59
59
  @basefont = font
@@ -64,7 +64,7 @@ class PDF::Reader
64
64
 
65
65
  if params.class == String
66
66
  # translate the bytestream into a UTF-8 string.
67
- # If an encoding hasn't been specified, assume the text using this
67
+ # If an encoding hasn't been specified, assume the text using this
68
68
  # font is in Adobe Standard Encoding.
69
69
  (encoding || PDF::Reader::Encoding.new(:StandardEncoding)).to_utf8(params, tounicode)
70
70
  elsif params.class == Array
@@ -61,7 +61,8 @@ class PDF::Reader
61
61
  when ">>", "]", ">" then return Token.new(token)
62
62
  else
63
63
  if operators.has_key?(token) then return Token.new(token)
64
- else return token.to_f
64
+ elsif token =~ /\d*\.\d/ then return token.to_f
65
+ else return token.to_i
65
66
  end
66
67
  end
67
68
  end
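PDF distinguishes integer objects (such as `612`) from real objects (such as `-2.5`), but 0.7.5 converted every non-operator token with `to_f`. The new branch sends a token to `to_f` only when it contains a decimal point followed by a digit. A standalone copy of the rule, with a hypothetical helper name, makes one edge visible: the spec also allows reals written with a trailing point, like `4.`, and this regex classifies those as integers.

```ruby
# Hypothetical helper reproducing the numeric branch of the parser change.
def numeric_token(token)
  token =~ /\d*\.\d/ ? token.to_f : token.to_i
end

p numeric_token("612")    # 612 (Integer)
p numeric_token("-2.5")   # -2.5 (Float)
p numeric_token("5.")     # 5 (Integer): no digit after the point
```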
@@ -99,7 +100,7 @@ class PDF::Reader
99
100
  # Reads a PDF hex string from the buffer and converts it to a Ruby String
100
101
  def hex_string
101
102
  str = ""
102
-
103
+
103
104
  loop do
104
105
  token = @buffer.token
105
106
  break if token == ">"
@@ -122,14 +123,15 @@ class PDF::Reader
122
123
  # find the first occurrence of ( ) [ \ or ]
123
124
  #
124
125
  # I originally just used the regexp form of index(), but it seems to be
125
- # buggy on some OSX systems (returns nil when there is a match). The
126
- # block form of index() is more reliable, but only works on 1.8.7 or
127
- # greater.
126
+ # buggy on some OSX systems (returns nil when there is a match). This
127
+ # version is more reliable and was suggested by Andrès Koetsier.
128
128
  #
129
- if RUBY_VERSION >= "1.8.7"
130
- i = @buffer.raw.unpack("C*").index { |n| [40, 41, 91, 92, 93].include?(n) }
131
- else
132
- i = @buffer.raw.index(/[\\\(\)]/)
129
+ i = nil
130
+ @buffer.raw.unpack("C*").each_with_index do |charint, idx|
131
+ if [40, 41, 92].include?(charint)
132
+ i = idx
133
+ break
134
+ end
133
135
  end
134
136
 
135
137
  if i.nil?
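The replacement scan drops the `RUBY_VERSION` branch and the regexp `index()` call, and it no longer stops at `[` (91) or `]` (93), so brackets inside a literal string are kept as ordinary characters; this appears to be the "[ ]" fix listed in the CHANGELOG. A standalone sketch of the scan, with a hypothetical helper name:

```ruby
# Hypothetical standalone version of the new scan: walk the raw bytes and
# return the index of the first "(" (40), ")" (41) or "\" (92), or nil.
def first_string_delimiter(raw)
  raw.unpack("C*").each_with_index do |byte, idx|
    return idx if [40, 41, 92].include?(byte)
  end
  nil
end

p first_string_delimiter("abc) rest")     # 3
p first_string_delimiter("[no parens]")   # nil: brackets are not delimiters
```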
@@ -201,7 +203,7 @@ class PDF::Reader
201
203
  def stream (dict)
202
204
  raise MalformedPDFError, "PDF malformed, missing stream length" unless dict.has_key?(:Length)
203
205
  data = @buffer.read(@xref.object(dict[:Length]))
204
-
206
+
205
207
  Error.str_assert(parse_token, "endstream")
206
208
  Error.str_assert(parse_token, "endobj")
207
209
 
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -25,7 +25,7 @@
25
25
 
26
26
  class PDF::Reader
27
27
  ################################################################################
28
- # An internal PDF::Reader class that represents an indirect reference to a PDF Object
28
+ # An internal PDF::Reader class that represents an indirect reference to a PDF Object
29
29
  class Reference
30
30
  ################################################################################
31
31
  # check if the next token in the buffer is a reference, and return a PDF::Reader::Reference
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -9,10 +9,10 @@
9
9
  # distribute, sublicense, and/or sell copies of the Software, and to
10
10
  # permit persons to whom the Software is furnished to do so, subject to
11
11
  # the following conditions:
12
- #
12
+ #
13
13
  # The above copyright notice and this permission notice shall be
14
14
  # included in all copies or substantial portions of the Software.
15
- #
15
+ #
16
16
  # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
17
  # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
18
  # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
@@ -72,7 +72,7 @@ class PDF::Reader
72
72
  # If the object is a stream, that is returned as well
73
73
  def object (ref, save_pos = true)
74
74
  return ref unless ref.kind_of?(Reference)
75
- pos = @buffer.pos if save_pos
75
+ pos = @buffer.pos_without_buf if save_pos
76
76
  obj = Parser.new(@buffer.seek(offset_for(ref)), self).object(ref.id, ref.gen)
77
77
  @buffer.seek(pos) if save_pos
78
78
  return obj
@@ -132,7 +132,7 @@ class PDF::Reader
132
132
  # ref - a PDF::Reader::Reference object containing an object ID and revision number
133
133
  def offset_for (ref)
134
134
  @xref[ref.id][ref.gen]
135
- rescue
135
+ rescue
136
136
  raise InvalidObjectError, "Object #{ref.id}, Generation #{ref.gen} is invalid"
137
137
  end
138
138
  ################################################################################
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: pdf-reader
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.7.5
4
+ version: 0.7.6
5
5
  platform: ruby
6
6
  authors:
7
7
  - Peter Jones
@@ -9,10 +9,19 @@ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
11
 
12
- date: 2008-08-27 00:00:00 +10:00
12
+ date: 2009-08-28 00:00:00 +10:00
13
13
  default_executable:
14
- dependencies: []
15
-
14
+ dependencies:
15
+ - !ruby/object:Gem::Dependency
16
+ name: Ascii85
17
+ type: :runtime
18
+ version_requirement:
19
+ version_requirements: !ruby/object:Gem::Requirement
20
+ requirements:
21
+ - - ">="
22
+ - !ruby/object:Gem::Version
23
+ version: "0.9"
24
+ version:
16
25
  description: The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe
17
26
  email: pjones@pmade.com
18
27
  executables:
@@ -25,10 +34,9 @@ extra_rdoc_files:
25
34
  - README.rdoc
26
35
  - TODO
27
36
  - CHANGELOG
37
+ - MIT-LICENSE
28
38
  files:
29
- - lib/pdf
30
39
  - lib/pdf/reader.rb
31
- - lib/pdf/reader
32
40
  - lib/pdf/reader/buffer.rb
33
41
  - lib/pdf/reader/cmap.rb
34
42
  - lib/pdf/reader/content.rb
@@ -44,7 +52,6 @@ files:
44
52
  - lib/pdf/reader/register_receiver.rb
45
53
  - lib/pdf/reader/text_receiver.rb
46
54
  - lib/pdf/reader/token.rb
47
- - lib/pdf/reader/encodings
48
55
  - lib/pdf/reader/encodings/mac_expert.txt
49
56
  - lib/pdf/reader/encodings/mac_roman.txt
50
57
  - lib/pdf/reader/encodings/pdf_doc.txt
@@ -58,8 +65,11 @@ files:
58
65
  - README.rdoc
59
66
  - TODO
60
67
  - CHANGELOG
68
+ - MIT-LICENSE
61
69
  has_rdoc: true
62
70
  homepage: http://software.pmade.com/pdfreader
71
+ licenses: []
72
+
63
73
  post_install_message:
64
74
  rdoc_options:
65
75
  - --title
@@ -84,9 +94,9 @@ required_rubygems_version: !ruby/object:Gem::Requirement
84
94
  requirements: []
85
95
 
86
96
  rubyforge_project: pdf-reader
87
- rubygems_version: 1.2.0
97
+ rubygems_version: 1.3.4
88
98
  signing_key:
89
- specification_version: 2
99
+ specification_version: 3
90
100
  summary: A library for accessing the content of PDF files
91
101
  test_files: []
92
102