pdf-reader 0.7.5 → 0.7.6
- data/CHANGELOG +11 -0
- data/MIT-LICENSE +21 -0
- data/README.rdoc +29 -9
- data/Rakefile +4 -4
- data/bin/pdf_list_callbacks +2 -0
- data/bin/pdf_object +2 -0
- data/bin/pdf_text +1 -0
- data/lib/pdf/reader.rb +1 -0
- data/lib/pdf/reader/buffer.rb +22 -9
- data/lib/pdf/reader/cmap.rb +3 -5
- data/lib/pdf/reader/content.rb +1 -1
- data/lib/pdf/reader/encoding.rb +10 -10
- data/lib/pdf/reader/error.rb +2 -2
- data/lib/pdf/reader/explore.rb +3 -3
- data/lib/pdf/reader/filter.rb +43 -6
- data/lib/pdf/reader/font.rb +7 -7
- data/lib/pdf/reader/parser.rb +12 -10
- data/lib/pdf/reader/reference.rb +3 -3
- data/lib/pdf/reader/stream.rb +2 -2
- data/lib/pdf/reader/token.rb +2 -2
- data/lib/pdf/reader/xref.rb +2 -2
- metadata +19 -9
data/CHANGELOG
CHANGED
@@ -1,3 +1,14 @@
+v0.7.6 (28th August 2009)
+- Various bug fixes that increase the files we can successfully parse
+- Treat float and integer tokens differently (thanks Neil)
+- Correctly handle PDFs where the Kids element of a Pages dict is an indirect
+reference (thanks Rob Holland)
+- Fix conversion of PDF strings to Ruby strings on 1.8.6 (thanks Andrès Koetsier)
+- Fix decoding with ASCII85 and ASCIIHex filters (thanks Andrès Koetsier)
+- Fix extracting inline images from content streams (thanks Andrès Koetsier)
+- Fix extracting [ ] from content streams (thanks Christian Rishøj)
+- Fix conversion of text to UTF8 when the cmap uses bfrange (thanks Federico Gonzalez Lutteroth)
+
 v0.7.5 (27th August 2008)
 - Fix a 1.8.7ism
 
data/MIT-LICENSE
ADDED
@@ -0,0 +1,21 @@
+Copyright (c) 2009 Peter Jones
+Copyright (c) 2009 James Healy
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc
CHANGED
@@ -5,9 +5,22 @@ It provides programmatic access to the contents of a PDF file with a high
|
|
5
5
|
degree of flexibility.
|
6
6
|
|
7
7
|
The PDF 1.7 specification is a weighty document and not all aspects are
|
8
|
-
currently supported.
|
8
|
+
currently supported. I welcome submission of PDF files that exhibit
|
9
9
|
unsupported aspects of the spec to assist with improving out support.
|
10
10
|
|
11
|
+
= Development Status
|
12
|
+
|
13
|
+
I adopted this library in 2007 when I was learning the fundamentals of the PDF
|
14
|
+
spec. I do not currently use it in my day to day work and I just don't have the
|
15
|
+
spare time to dedicate to adding new features.
|
16
|
+
|
17
|
+
The code as it is works fairly well, and I offer it "as is". All patches, bug
|
18
|
+
reports and sample PDFs are welcome - I will work on them when I can. If anyone
|
19
|
+
is interested in adding features to PDF::Reader in their own effort to learn
|
20
|
+
the PDF file format, I'll happy offer help qand support.
|
21
|
+
|
22
|
+
I STRONGLY RECOMMEND NOT USING PDF::READER FOR YOUR PRODUCTION CODE.
|
23
|
+
|
11
24
|
= Installation
|
12
25
|
|
13
26
|
The recommended installation method is via Rubygems.
|
@@ -42,7 +55,8 @@ PDF file:
|
|
42
55
|
|
43
56
|
MalformedPDFError - The PDF appears to be corrupt in some way. If you believe the
|
44
57
|
file should be valid, or that a corrupt file didn't raise an exception, please
|
45
|
-
forward a copy of the file to the maintainers
|
58
|
+
forward a copy of the file to the maintainers (preferably via the google group)
|
59
|
+
and we can attempt to improve the code.
|
46
60
|
|
47
61
|
UnsupportedFeatureError - The PDF uses a feature that PDF::Reader doesn't currently
|
48
62
|
support. Again, we welcome submissions of PDF files that exhibit these features to help
|
@@ -56,12 +70,18 @@ report it!) or your receiver (please don't report it!).
|
|
56
70
|
|
57
71
|
= Maintainers
|
58
72
|
|
59
|
-
- Peter Jones <mailto:pjones@pmade.com>
|
60
73
|
- James Healy <mailto:jimmy@deefa.com>
|
61
74
|
|
75
|
+
= Licensing
|
76
|
+
|
77
|
+
This library is distributed under the terms of the MIT License. See the included file for
|
78
|
+
more detail.
|
79
|
+
|
62
80
|
= Mailing List
|
63
81
|
|
64
|
-
Any questions or feedback should be sent to the PDF::Reader google group.
|
82
|
+
Any questions or feedback should be sent to the PDF::Reader google group. It's
|
83
|
+
better that any answers be available for others instead of hiding in someone's
|
84
|
+
inbox.
|
65
85
|
|
66
86
|
http://groups.google.com/group/pdf-reader
|
67
87
|
|
@@ -77,21 +97,21 @@ A simple app to count the number of pages in a PDF File.
|
|
77
97
|
require 'pdf/reader'
|
78
98
|
|
79
99
|
class PageReceiver
|
80
|
-
attr_accessor :
|
100
|
+
attr_accessor :counter
|
81
101
|
|
82
102
|
def initialize
|
83
|
-
@
|
103
|
+
@counter = 0
|
84
104
|
end
|
85
105
|
|
86
106
|
# Called when page parsing ends
|
87
107
|
def end_page
|
88
|
-
@
|
108
|
+
@counter += 1
|
89
109
|
end
|
90
110
|
end
|
91
111
|
|
92
112
|
receiver = PageReceiver.new
|
93
113
|
pdf = PDF::Reader.file("somefile.pdf", receiver)
|
94
|
-
puts "#{receiver.
|
114
|
+
puts "#{receiver.counter} pages"
|
95
115
|
|
96
116
|
== List all callbacks generated by a single PDF
|
97
117
|
|
@@ -242,7 +262,7 @@ A simple app to display the number of pages in a PDF File.
|
|
242
262
|
|
243
263
|
= Known Limitations
|
244
264
|
|
245
|
-
The order of the callbacks is
|
265
|
+
The order of the callbacks is unpredictable, and is dependent on the internal
|
246
266
|
layout of the file, not the order objects are displayed to the user. As a
|
247
267
|
consequence of this it is highly unlikely that text will be completely in
|
248
268
|
order.
|
data/Rakefile
CHANGED
@@ -6,7 +6,7 @@ require 'rake/testtask'
|
|
6
6
|
require "rake/gempackagetask"
|
7
7
|
require 'spec/rake/spectask'
|
8
8
|
|
9
|
-
PKG_VERSION = "0.7.
|
9
|
+
PKG_VERSION = "0.7.6"
|
10
10
|
PKG_NAME = "pdf-reader"
|
11
11
|
PKG_FILE_NAME = "#{PKG_NAME}-#{PKG_VERSION}"
|
12
12
|
|
@@ -47,8 +47,7 @@ Rake::RDocTask.new("doc") do |rdoc|
|
|
47
47
|
rdoc.rdoc_files.include('README.rdoc')
|
48
48
|
rdoc.rdoc_files.include('TODO')
|
49
49
|
rdoc.rdoc_files.include('CHANGELOG')
|
50
|
-
|
51
|
-
#rdoc.rdoc_files.include('LICENSE')
|
50
|
+
rdoc.rdoc_files.include('MIT-LICENSE')
|
52
51
|
rdoc.rdoc_files.include('lib/**/*.rb')
|
53
52
|
rdoc.options << "--inline-source"
|
54
53
|
end
|
@@ -70,7 +69,7 @@ spec = Gem::Specification.new do |spec|
|
|
70
69
|
spec.executables << "pdf_text"
|
71
70
|
spec.executables << "pdf_list_callbacks"
|
72
71
|
spec.has_rdoc = true
|
73
|
-
spec.extra_rdoc_files = %w{README.rdoc TODO CHANGELOG}
|
72
|
+
spec.extra_rdoc_files = %w{README.rdoc TODO CHANGELOG MIT-LICENSE }
|
74
73
|
spec.rdoc_options << '--title' << 'PDF::Reader Documentation' <<
|
75
74
|
'--main' << 'README.rdoc' << '-q'
|
76
75
|
spec.author = "Peter Jones"
|
@@ -78,6 +77,7 @@ spec = Gem::Specification.new do |spec|
|
|
78
77
|
spec.rubyforge_project = "pdf-reader"
|
79
78
|
spec.homepage = "http://software.pmade.com/pdfreader"
|
80
79
|
spec.description = "The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe"
|
80
|
+
spec.add_dependency('Ascii85', '>=0.9')
|
81
81
|
end
|
82
82
|
|
83
83
|
# package the library into a gem
|
data/bin/pdf_list_callbacks
CHANGED
data/bin/pdf_object
CHANGED
data/bin/pdf_text
CHANGED
data/lib/pdf/reader.rb
CHANGED
data/lib/pdf/reader/buffer.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
@@ -41,7 +41,7 @@ class PDF::Reader
|
|
41
41
|
self
|
42
42
|
end
|
43
43
|
################################################################################
|
44
|
-
# reads the requested number of bytes from the underlying IO stream.
|
44
|
+
# reads the requested number of bytes from the underlying IO stream.
|
45
45
|
#
|
46
46
|
# length should be a positive integer.
|
47
47
|
def read (length)
|
@@ -56,13 +56,22 @@ class PDF::Reader
|
|
56
56
|
out
|
57
57
|
end
|
58
58
|
################################################################################
|
59
|
-
# Reads from the buffer until the specified token is found, or the end of the buffer
|
59
|
+
# Reads from the buffer until the specified token is found, or the end of the buffer
|
60
60
|
#
|
61
61
|
# bytes - the bytes to search for.
|
62
62
|
def read_until(bytes)
|
63
63
|
out = ""
|
64
64
|
size = bytes.size
|
65
|
-
|
65
|
+
|
66
|
+
if @buffer && !@buffer.empty?
|
67
|
+
if @buffer.include?(bytes)
|
68
|
+
offset = @buffer.index(bytes) + size
|
69
|
+
return head(offset)
|
70
|
+
else
|
71
|
+
out << head(@buffer.size)
|
72
|
+
end
|
73
|
+
end
|
74
|
+
|
66
75
|
loop do
|
67
76
|
out << @io.read(1)
|
68
77
|
if out[-1 * size,size].eql?(bytes)
|
@@ -74,7 +83,7 @@ class PDF::Reader
|
|
74
83
|
out
|
75
84
|
end
|
76
85
|
################################################################################
|
77
|
-
# returns true if the underlying IO object is at end and the internal buffer
|
86
|
+
# returns true if the underlying IO object is at end and the internal buffer
|
78
87
|
# is empty
|
79
88
|
def eof?
|
80
89
|
ready_token
|
@@ -89,6 +98,10 @@ class PDF::Reader
|
|
89
98
|
@io.pos
|
90
99
|
end
|
91
100
|
################################################################################
|
101
|
+
def pos_without_buf
|
102
|
+
@io.pos - @buffer.to_s.size
|
103
|
+
end
|
104
|
+
################################################################################
|
92
105
|
# PDF files are processed by tokenising the content into a series of objects and commands.
|
93
106
|
# This prepares the buffer for use by reading the next line of tokens into memory.
|
94
107
|
def ready_token (with_strip=true, skip_blanks=true)
|
@@ -105,10 +118,10 @@ class PDF::Reader
|
|
105
118
|
# return the next token from the underlying IO stream
|
106
119
|
def token
|
107
120
|
ready_token
|
108
|
-
|
121
|
+
|
109
122
|
i = @buffer.index(/[\[\]()<>{}\s\/]/) || @buffer.size
|
110
123
|
|
111
|
-
token_chars =
|
124
|
+
token_chars =
|
112
125
|
if i == 0 and @buffer[i,2] == "<<" then 2
|
113
126
|
elsif i == 0 and @buffer[i,2] == ">>" then 2
|
114
127
|
elsif i == 0 then 1
|
@@ -148,7 +161,7 @@ class PDF::Reader
|
|
148
161
|
data = @io.read(1024)
|
149
162
|
|
150
163
|
# the PDF 1.7 spec (section #3.4) says that EOL markers can be either \r, \n, or both.
|
151
|
-
# To ensure we find the xref offset correctly, change all possible options to a
|
164
|
+
# To ensure we find the xref offset correctly, change all possible options to a
|
152
165
|
# standard format
|
153
166
|
data = data.gsub("\r\n","\n").gsub("\n\r","\n").gsub("\r","\n")
|
154
167
|
lines = data.split(/\n/).reverse
|
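The `read_until` hunk in buffer.rb above adds a fast path: bytes already held in the internal buffer are searched first, and only then does the method fall back to reading the stream one byte at a time. A minimal standalone sketch of that idea follows. It assumes a hypothetical free function `read_until(io, pending, marker)`, where `pending` stands in for `@buffer` and `io` for `@io`; none of these names are PDF::Reader API, and the real method relies on `head` rather than `slice!`.

```ruby
require "stringio"

# Hypothetical helper, not PDF::Reader API: `pending` stands in for the
# internal buffer and `io` for the underlying stream.
def read_until(io, pending, marker)
  out = "".dup

  unless pending.empty?
    idx = pending.index(marker)
    # Marker is fully buffered: return everything up to and including it.
    return pending.slice!(0, idx + marker.size) if idx
    # Otherwise drain the buffer and keep reading from the stream.
    out << pending.slice!(0, pending.size)
  end

  # Fall back to byte-at-a-time reads until the marker (or EOF) is reached.
  until io.eof?
    out << io.read(1)
    break if out.end_with?(marker)
  end
  out
end

io = StringIO.new("YZ tail")
pending = "abX".dup
puts read_until(io, pending, "XYZ").inspect
```

Because the drained buffer is prepended to `out` before the stream loop starts, a marker that straddles the buffer and the stream is still detected, which is the case the usage lines exercise.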
data/lib/pdf/reader/cmap.rb
CHANGED
@@ -69,14 +69,12 @@ class PDF::Reader
|
|
69
69
|
start_code = "0x#{start_code}".hex
|
70
70
|
end_code = "0x#{end_code}".hex
|
71
71
|
dst = "0x#{dst}".hex
|
72
|
-
incr = 0
|
73
72
|
|
74
73
|
# add all values in the range to our mapping
|
75
|
-
(start_code..end_code).
|
76
|
-
@map[val] = dst +
|
77
|
-
incr += 1
|
74
|
+
(start_code..end_code).each_with_index do |val, idx|
|
75
|
+
@map[val] = dst + idx
|
78
76
|
# ensure a single range does not exceed 255 chars
|
79
|
-
raise PDF::Reader::MalformedPDFError, "a CMap bfrange cann't exceed 255 chars" if
|
77
|
+
raise PDF::Reader::MalformedPDFError, "a CMap bfrange cann't exceed 255 chars" if idx > 255
|
80
78
|
end
|
81
79
|
end
|
82
80
|
end
|
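The cmap.rb hunk above replaces a hand-maintained `incr` counter with `each_with_index` when expanding a `bfrange` entry. As a rough illustration of what that expansion produces, here is a standalone sketch; `expand_bfrange` and its arguments are hypothetical names chosen for this example, not part of PDF::Reader, and the 255-entry guard from the original is left out for brevity.

```ruby
# Hypothetical helper, not PDF::Reader API: expands one bfrange entry
# (start code, end code, first destination code point; all hex strings)
# into a Hash from character code to Unicode code point.
def expand_bfrange(start_hex, end_hex, dst_hex)
  start_code = start_hex.hex
  end_code   = end_hex.hex
  dst        = dst_hex.hex

  map = {}
  # idx counts 0, 1, 2, ... so each code maps to dst plus its offset
  # from the start of the range.
  (start_code..end_code).each_with_index do |code, idx|
    map[code] = dst + idx
  end
  map
end

p expand_bfrange("0003", "0005", "0041")
```

With these inputs, codes 3 through 5 map to the code points for "A", "B", and "C" (65, 66, 67).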
data/lib/pdf/reader/content.rb
CHANGED
@@ -293,7 +293,7 @@ class PDF::Reader
|
|
293
293
|
if page[:Type] == :Pages
|
294
294
|
callback(:begin_page_container, [page])
|
295
295
|
walk_resources(@xref.object(res)) if res
|
296
|
-
page[:Kids].each {|child| walk_pages(@xref.object(child))}
|
296
|
+
@xref.object(page[:Kids]).each {|child| walk_pages(@xref.object(child))}
|
297
297
|
callback(:end_page_container)
|
298
298
|
elsif page[:Type] == :Page
|
299
299
|
callback(:begin_page, [page])
|
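The content.rb change above passes `page[:Kids]` through `@xref.object` before iterating, so a `Kids` entry that is itself an indirect reference gets dereferenced first. The sketch below illustrates that pattern on a toy object table. `Reference`, `resolve`, and `count_pages` are hypothetical stand-ins invented for this example and only loosely mirror the library's classes.

```ruby
# Hypothetical stand-ins for illustration; not the PDF::Reader classes.
Reference = Struct.new(:id, :gen)

# Return `obj` unchanged unless it is a Reference, mirroring the
# `return ref unless ref.kind_of?(Reference)` guard in the xref.rb diff.
def resolve(objects, obj)
  obj.is_a?(Reference) ? objects.fetch([obj.id, obj.gen]) : obj
end

# Walk a page tree, resolving both the node and its Kids array.
def count_pages(objects, node)
  node = resolve(objects, node)
  return 1 if node[:Type] == :Page
  resolve(objects, node[:Kids]).sum { |kid| count_pages(objects, kid) }
end

objects = {
  [1, 0] => { Type: :Pages, Kids: Reference.new(2, 0) }, # Kids is indirect
  [2, 0] => [Reference.new(3, 0), Reference.new(4, 0)],
  [3, 0] => { Type: :Page },
  [4, 0] => { Type: :Page }
}
puts count_pages(objects, Reference.new(1, 0))
```

Since `resolve` is a no-op for anything that is not a reference, the same walk also handles the more common case where `Kids` is an inline array.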
data/lib/pdf/reader/encoding.rb
CHANGED
@@ -41,33 +41,33 @@ class PDF::Reader
|
|
41
41
|
end
|
42
42
|
|
43
43
|
case enc
|
44
|
-
when nil then
|
44
|
+
when nil then
|
45
45
|
load_mapping File.dirname(__FILE__) + "/encodings/standard.txt"
|
46
46
|
@unpack = "C*"
|
47
|
-
when "Identity-H".to_sym then
|
47
|
+
when "Identity-H".to_sym then
|
48
48
|
@unpack = "n*"
|
49
49
|
@to_unicode_required = true
|
50
|
-
when :MacRomanEncoding then
|
50
|
+
when :MacRomanEncoding then
|
51
51
|
load_mapping File.dirname(__FILE__) + "/encodings/mac_roman.txt"
|
52
52
|
@unpack = "C*"
|
53
|
-
when :MacExpertEncoding then
|
53
|
+
when :MacExpertEncoding then
|
54
54
|
load_mapping File.dirname(__FILE__) + "/encodings/mac_expert.txt"
|
55
55
|
@unpack = "C*"
|
56
|
-
when :PDFDocEncoding then
|
56
|
+
when :PDFDocEncoding then
|
57
57
|
load_mapping File.dirname(__FILE__) + "/encodings/pdf_doc.txt"
|
58
58
|
@unpack = "C*"
|
59
|
-
when :StandardEncoding then
|
59
|
+
when :StandardEncoding then
|
60
60
|
load_mapping File.dirname(__FILE__) + "/encodings/standard.txt"
|
61
61
|
@unpack = "C*"
|
62
|
-
when :SymbolEncoding then
|
62
|
+
when :SymbolEncoding then
|
63
63
|
load_mapping File.dirname(__FILE__) + "/encodings/symbol.txt"
|
64
64
|
@unpack = "C*"
|
65
|
-
when :UTF16Encoding then
|
65
|
+
when :UTF16Encoding then
|
66
66
|
@unpack = "n*"
|
67
|
-
when :WinAnsiEncoding then
|
67
|
+
when :WinAnsiEncoding then
|
68
68
|
load_mapping File.dirname(__FILE__) + "/encodings/win_ansi.txt"
|
69
69
|
@unpack = "C*"
|
70
|
-
when :ZapfDingbatsEncoding then
|
70
|
+
when :ZapfDingbatsEncoding then
|
71
71
|
load_mapping File.dirname(__FILE__) + "/encodings/zapf_dingbats.txt"
|
72
72
|
@unpack = "C*"
|
73
73
|
else raise UnsupportedFeatureError, "#{enc} is not currently a supported encoding"
|
data/lib/pdf/reader/error.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
data/lib/pdf/reader/explore.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
@@ -46,7 +46,7 @@ class PDF::Reader
|
|
46
46
|
def output_parent (obj)
|
47
47
|
case obj
|
48
48
|
when Hash
|
49
|
-
obj.each do |k,v|
|
49
|
+
obj.each do |k,v|
|
50
50
|
print "#{k}"; output_child(v); print "\n"
|
51
51
|
Explore::const_set(k, k) if !Explore.const_defined?(k)
|
52
52
|
end
|
data/lib/pdf/reader/filter.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
@@ -34,17 +34,30 @@ class PDF::Reader
|
|
34
34
|
# in the future.
|
35
35
|
class Filter
|
36
36
|
################################################################################
|
37
|
-
# creates a new filter for decoding content
|
38
|
-
|
37
|
+
# creates a new filter for decoding content.
|
38
|
+
#
|
39
|
+
# Filters that are only used to encode image data are accepted, but the data is
|
40
|
+
# returned untouched. At this stage PDF::Reader has no need to decode images.
|
41
|
+
#
|
42
|
+
def initialize (name, options = nil)
|
39
43
|
@options = options
|
40
44
|
|
41
45
|
case name.to_sym
|
46
|
+
when :ASCII85Decode then @filter = :ascii85
|
47
|
+
when :ASCIIHexDecode then @filter = :asciihex
|
48
|
+
when :CCITTFaxDecode then @filter = nil
|
49
|
+
when :DCTDecode then @filter = nil
|
42
50
|
when :FlateDecode then @filter = :flate
|
43
|
-
|
51
|
+
when :JBIG2Decode then @filter = nil
|
52
|
+
else raise UnsupportedFeatureError, "Unknown filter: #{name}"
|
44
53
|
end
|
45
54
|
end
|
46
55
|
################################################################################
|
47
56
|
# attempts to decode the specified data with the current filter
|
57
|
+
#
|
58
|
+
# Filters that are only used to encode image data are accepted, but the data is
|
59
|
+
# returned untouched. At this stage PDF::Reader has no need to decode images.
|
60
|
+
#
|
48
61
|
def filter (data)
|
49
62
|
# leave the data untouched if we don't support the required filter
|
50
63
|
return data if @filter.nil?
|
@@ -53,6 +66,30 @@ class PDF::Reader
|
|
53
66
|
self.send(@filter, data)
|
54
67
|
end
|
55
68
|
################################################################################
|
69
|
+
# Decode the specified data using the Ascii85 algorithm. Relies on the AScii85
|
70
|
+
# rubygem.
|
71
|
+
#
|
72
|
+
def ascii85(data)
|
73
|
+
data = "<~#{data}" unless data.to_s[0,2] == "<~"
|
74
|
+
Ascii85::decode(data)
|
75
|
+
rescue Exception => e
|
76
|
+
# Oops, there was a problem decoding the stream
|
77
|
+
raise MalformedPDFError, "Error occured while decoding an ASCII85 stream (#{e.class.to_s}: #{e.to_s})"
|
78
|
+
end
|
79
|
+
################################################################################
|
80
|
+
# Decode the specified data using the AsciiHex algorithm.
|
81
|
+
#
|
82
|
+
def asciihex(data)
|
83
|
+
data.chop! if data[-1,1] == ">"
|
84
|
+
data = data[1,data.size] if data[0,1] == "<"
|
85
|
+
data.gsub!(/[^A-Fa-f0-9]/,"")
|
86
|
+
data << "0" if data.size % 2 == 1
|
87
|
+
data.scan(/.{2}/).map { |s| s.hex.chr }.join("")
|
88
|
+
rescue Exception => e
|
89
|
+
# Oops, there was a problem decoding the stream
|
90
|
+
raise MalformedPDFError, "Error occured while decoding an ASCIIHex stream (#{e.class.to_s}: #{e.to_s})"
|
91
|
+
end
|
92
|
+
################################################################################
|
56
93
|
# Decode the specified data with the Zlib compression algorithm
|
57
94
|
def flate (data)
|
58
95
|
begin
|
@@ -63,7 +100,7 @@ class PDF::Reader
|
|
63
100
|
# If that fails, then use an undocumented 'feature' to attempt to inflate
|
64
101
|
# the data as a raw RFC1951 stream.
|
65
102
|
#
|
66
|
-
# See
|
103
|
+
# See
|
67
104
|
# - http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/243545
|
68
105
|
# - http://www.gzip.org/zlib/zlib_faq.html#faq38
|
69
106
|
Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(data)
|
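The new `asciihex` method in the filter.rb diff above strips the optional `<` and `>` delimiters, discards non-hex characters, pads an odd-length string with a trailing `0`, and converts each pair of digits to a byte. A non-mutating variant of the same steps might look like the sketch below. `decode_ascii_hex` is a hypothetical name, and `Array#pack("H*")` is swapped in for the `scan`/`map`/`join` chain used in the diff.

```ruby
# Hypothetical helper, not PDF::Reader API. Unlike the method in the diff,
# it never modifies its argument.
def decode_ascii_hex(data)
  hex = data.sub(/\A</, "").sub(/>\z/, "") # strip optional < > delimiters
  hex = hex.gsub(/[^A-Fa-f0-9]/, "")        # drop whitespace and other noise
  hex += "0" if hex.size.odd?               # a missing final digit counts as 0
  [hex].pack("H*")                          # two hex digits -> one byte
end

puts decode_ascii_hex("<48 65 6C 6C 6F>")
```

Avoiding `chop!` and `gsub!` means a caller that still holds the raw stream data does not see it change underneath them. Whether that matters for PDF::Reader itself is not something the diff shows.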
data/lib/pdf/reader/font.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
@@ -28,12 +28,12 @@ class PDF::Reader
|
|
28
28
|
attr_accessor :label, :subtype, :encoding, :descendantfonts, :tounicode
|
29
29
|
attr_reader :basefont
|
30
30
|
|
31
|
-
# returns a hash that maps glyph names to unicode codepoints. The mapping is based on
|
31
|
+
# returns a hash that maps glyph names to unicode codepoints. The mapping is based on
|
32
32
|
# a text file supplied by Adobe at:
|
33
33
|
# http://www.adobe.com/devnet/opentype/archives/glyphlist.txt
|
34
34
|
def self.glyphnames
|
35
35
|
@@glyphs ||= {}
|
36
|
-
|
36
|
+
|
37
37
|
if @@glyphs.empty?
|
38
38
|
RUBY_VERSION >= "1.9" ? mode = "r:BINARY" : mode = "r"
|
39
39
|
File.open(File.dirname(__FILE__) + "/glyphlist.txt",mode) do |f|
|
@@ -51,9 +51,9 @@ class PDF::Reader
|
|
51
51
|
# setup a default encoding for the selected font. It can always be overridden
|
52
52
|
# with encoding= if required
|
53
53
|
case font
|
54
|
-
when "Symbol" then
|
54
|
+
when "Symbol" then
|
55
55
|
self.encoding = PDF::Reader::Encoding.new("SymbolEncoding")
|
56
|
-
when "ZapfDingbats" then
|
56
|
+
when "ZapfDingbats" then
|
57
57
|
self.encoding = PDF::Reader::Encoding.new("ZapfDingbatsEncoding")
|
58
58
|
end
|
59
59
|
@basefont = font
|
@@ -64,7 +64,7 @@ class PDF::Reader
|
|
64
64
|
|
65
65
|
if params.class == String
|
66
66
|
# translate the bytestram into a UTF-8 string.
|
67
|
-
# If an encoding hasn't been specified, assume the text using this
|
67
|
+
# If an encoding hasn't been specified, assume the text using this
|
68
68
|
# font is in Adobe Standard Encoding.
|
69
69
|
(encoding || PDF::Reader::Encoding.new(:StandardEncoding)).to_utf8(params, tounicode)
|
70
70
|
elsif params.class == Array
|
data/lib/pdf/reader/parser.rb
CHANGED
@@ -61,7 +61,8 @@ class PDF::Reader
|
|
61
61
|
when ">>", "]", ">" then return Token.new(token)
|
62
62
|
else
|
63
63
|
if operators.has_key?(token) then return Token.new(token)
|
64
|
-
|
64
|
+
elsif token =~ /\d*\.\d/ then return token.to_f
|
65
|
+
else return token.to_i
|
65
66
|
end
|
66
67
|
end
|
67
68
|
end
|
@@ -99,7 +100,7 @@ class PDF::Reader
|
|
99
100
|
# Reads a PDF hex string from the buffer and converts it to a Ruby String
|
100
101
|
def hex_string
|
101
102
|
str = ""
|
102
|
-
|
103
|
+
|
103
104
|
loop do
|
104
105
|
token = @buffer.token
|
105
106
|
break if token == ">"
|
@@ -122,14 +123,15 @@ class PDF::Reader
|
|
122
123
|
# find the first occurance of ( ) [ \ or ]
|
123
124
|
#
|
124
125
|
# I originally just used the regexp form of index(), but it seems to be
|
125
|
-
# buggy on some OSX systems (returns nil when there is a match).
|
126
|
-
#
|
127
|
-
# greater.
|
126
|
+
# buggy on some OSX systems (returns nil when there is a match). This
|
127
|
+
# version is more reliable and was suggested by Andrès Koetsier.
|
128
128
|
#
|
129
|
-
|
130
|
-
|
131
|
-
|
132
|
-
|
129
|
+
i = nil
|
130
|
+
@buffer.raw.unpack("C*").each_with_index do |charint, idx|
|
131
|
+
if [40, 41, 92].include?(charint)
|
132
|
+
i = idx
|
133
|
+
break
|
134
|
+
end
|
133
135
|
end
|
134
136
|
|
135
137
|
if i.nil?
|
@@ -201,7 +203,7 @@ class PDF::Reader
|
|
201
203
|
def stream (dict)
|
202
204
|
raise MalformedPDFError, "PDF malformed, missing stream length" unless dict.has_key?(:Length)
|
203
205
|
data = @buffer.read(@xref.object(dict[:Length]))
|
204
|
-
|
206
|
+
|
205
207
|
Error.str_assert(parse_token, "endstream")
|
206
208
|
Error.str_assert(parse_token, "endobj")
|
207
209
|
|
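The first parser.rb hunk above is the "treat float and integer tokens differently" change from the CHANGELOG: a token containing a decimal point followed by a digit becomes a Float, and any other non-operator token becomes an Integer. A tiny sketch of that branch in isolation, using a hypothetical `numeric_token` helper that is not part of the library:

```ruby
# Hypothetical helper mirroring the branch added to the parser.
def numeric_token(token)
  # A "." followed by a digit marks a real number; everything else is
  # handed to to_i, exactly as in the diff.
  token =~ /\d*\.\d/ ? token.to_f : token.to_i
end

p numeric_token("42")
p numeric_token("0.75")
```

Note that `String#to_i` returns 0 for non-numeric input, so this branch, like the one in the diff, assumes operators and delimiters have already been filtered out.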
data/lib/pdf/reader/reference.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
@@ -25,7 +25,7 @@
|
|
25
25
|
|
26
26
|
class PDF::Reader
|
27
27
|
################################################################################
|
28
|
-
# An internal PDF::Reader class that represents an indirect reference to a PDF Object
|
28
|
+
# An internal PDF::Reader class that represents an indirect reference to a PDF Object
|
29
29
|
class Reference
|
30
30
|
################################################################################
|
31
31
|
# check if the next token in the buffer is a reference, and return a PDF::Reader::Reference
|
data/lib/pdf/reader/stream.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
data/lib/pdf/reader/token.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
data/lib/pdf/reader/xref.rb
CHANGED
@@ -72,7 +72,7 @@ class PDF::Reader
|
|
72
72
|
# If the object is a stream, that is returned as well
|
73
73
|
def object (ref, save_pos = true)
|
74
74
|
return ref unless ref.kind_of?(Reference)
|
75
|
-
pos = @buffer.
|
75
|
+
pos = @buffer.pos_without_buf if save_pos
|
76
76
|
obj = Parser.new(@buffer.seek(offset_for(ref)), self).object(ref.id, ref.gen)
|
77
77
|
@buffer.seek(pos) if save_pos
|
78
78
|
return obj
|
@@ -132,7 +132,7 @@ class PDF::Reader
|
|
132
132
|
# ref - a PDF::Reader::Reference object containing an object ID and revision number
|
133
133
|
def offset_for (ref)
|
134
134
|
@xref[ref.id][ref.gen]
|
135
|
-
rescue
|
135
|
+
rescue
|
136
136
|
raise InvalidObjectError, "Object #{ref.id}, Generation #{ref.gen} is invalid"
|
137
137
|
end
|
138
138
|
################################################################################
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: pdf-reader
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.7.
|
4
|
+
version: 0.7.6
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Peter Jones
|
@@ -9,10 +9,19 @@ autorequire:
|
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
11
|
|
12
|
-
date:
|
12
|
+
date: 2009-08-28 00:00:00 +10:00
|
13
13
|
default_executable:
|
14
|
-
dependencies:
|
15
|
-
|
14
|
+
dependencies:
|
15
|
+
- !ruby/object:Gem::Dependency
|
16
|
+
name: Ascii85
|
17
|
+
type: :runtime
|
18
|
+
version_requirement:
|
19
|
+
version_requirements: !ruby/object:Gem::Requirement
|
20
|
+
requirements:
|
21
|
+
- - ">="
|
22
|
+
- !ruby/object:Gem::Version
|
23
|
+
version: "0.9"
|
24
|
+
version:
|
16
25
|
description: The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe
|
17
26
|
email: pjones@pmade.com
|
18
27
|
executables:
|
@@ -25,10 +34,9 @@ extra_rdoc_files:
|
|
25
34
|
- README.rdoc
|
26
35
|
- TODO
|
27
36
|
- CHANGELOG
|
37
|
+
- MIT-LICENSE
|
28
38
|
files:
|
29
|
-
- lib/pdf
|
30
39
|
- lib/pdf/reader.rb
|
31
|
-
- lib/pdf/reader
|
32
40
|
- lib/pdf/reader/buffer.rb
|
33
41
|
- lib/pdf/reader/cmap.rb
|
34
42
|
- lib/pdf/reader/content.rb
|
@@ -44,7 +52,6 @@ files:
|
|
44
52
|
- lib/pdf/reader/register_receiver.rb
|
45
53
|
- lib/pdf/reader/text_receiver.rb
|
46
54
|
- lib/pdf/reader/token.rb
|
47
|
-
- lib/pdf/reader/encodings
|
48
55
|
- lib/pdf/reader/encodings/mac_expert.txt
|
49
56
|
- lib/pdf/reader/encodings/mac_roman.txt
|
50
57
|
- lib/pdf/reader/encodings/pdf_doc.txt
|
@@ -58,8 +65,11 @@ files:
|
|
58
65
|
- README.rdoc
|
59
66
|
- TODO
|
60
67
|
- CHANGELOG
|
68
|
+
- MIT-LICENSE
|
61
69
|
has_rdoc: true
|
62
70
|
homepage: http://software.pmade.com/pdfreader
|
71
|
+
licenses: []
|
72
|
+
|
63
73
|
post_install_message:
|
64
74
|
rdoc_options:
|
65
75
|
- --title
|
@@ -84,9 +94,9 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
84
94
|
requirements: []
|
85
95
|
|
86
96
|
rubyforge_project: pdf-reader
|
87
|
-
rubygems_version: 1.
|
97
|
+
rubygems_version: 1.3.4
|
88
98
|
signing_key:
|
89
|
-
specification_version:
|
99
|
+
specification_version: 3
|
90
100
|
summary: A library for accessing the content of PDF files
|
91
101
|
test_files: []
|
92
102
|
|