pdf-reader 0.7.5 → 0.7.6
- data/CHANGELOG +11 -0
- data/MIT-LICENSE +21 -0
- data/README.rdoc +29 -9
- data/Rakefile +4 -4
- data/bin/pdf_list_callbacks +2 -0
- data/bin/pdf_object +2 -0
- data/bin/pdf_text +1 -0
- data/lib/pdf/reader.rb +1 -0
- data/lib/pdf/reader/buffer.rb +22 -9
- data/lib/pdf/reader/cmap.rb +3 -5
- data/lib/pdf/reader/content.rb +1 -1
- data/lib/pdf/reader/encoding.rb +10 -10
- data/lib/pdf/reader/error.rb +2 -2
- data/lib/pdf/reader/explore.rb +3 -3
- data/lib/pdf/reader/filter.rb +43 -6
- data/lib/pdf/reader/font.rb +7 -7
- data/lib/pdf/reader/parser.rb +12 -10
- data/lib/pdf/reader/reference.rb +3 -3
- data/lib/pdf/reader/stream.rb +2 -2
- data/lib/pdf/reader/token.rb +2 -2
- data/lib/pdf/reader/xref.rb +2 -2
- metadata +19 -9
data/CHANGELOG
CHANGED
@@ -1,3 +1,14 @@
|
|
1
|
+
v0.7.6 (28th August 2009)
|
2
|
+
- Various bug fixes that increase the files we can successfully parse
|
3
|
+
- Treat float and integer tokens differently (thanks Neil)
|
4
|
+
- Correctly handle PDFs where the Kids element of a Pages dict is an indirect
|
5
|
+
reference (thanks Rob Holland)
|
6
|
+
- Fix conversion of PDF strings to Ruby strings on 1.8.6 (thanks Andrès Koetsier)
|
7
|
+
- Fix decoding with ASCII85 and ASCIIHex filters (thanks Andrès Koetsier)
|
8
|
+
- Fix extracting inline images from content streams (thanks Andrès Koetsier)
|
9
|
+
- Fix extracting [ ] from content streams (thanks Christian Rishøj)
|
10
|
+
- Fix conversion of text to UTF8 when the cmap uses bfrange (thanks Federico Gonzalez Lutteroth)
|
11
|
+
|
1
12
|
v0.7.5 (27th August 2008)
|
2
13
|
- Fix a 1.8.7ism
|
3
14
|
|
data/MIT-LICENSE
ADDED
@@ -0,0 +1,21 @@
|
|
1
|
+
Copyright (c) 2009 Peter Jones
|
2
|
+
Copyright (c) 2009 James Healy
|
3
|
+
|
4
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
5
|
+
a copy of this software and associated documentation files (the
|
6
|
+
"Software"), to deal in the Software without restriction, including
|
7
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
8
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
9
|
+
permit persons to whom the Software is furnished to do so, subject to
|
10
|
+
the following conditions:
|
11
|
+
|
12
|
+
The above copyright notice and this permission notice shall be
|
13
|
+
included in all copies or substantial portions of the Software.
|
14
|
+
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
16
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
17
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
18
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
19
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
20
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
21
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.rdoc
CHANGED
@@ -5,9 +5,22 @@ It provides programmatic access to the contents of a PDF file with a high
|
|
5
5
|
degree of flexibility.
|
6
6
|
|
7
7
|
The PDF 1.7 specification is a weighty document and not all aspects are
|
8
|
-
currently supported.
|
8
|
+
currently supported. I welcome submission of PDF files that exhibit
|
9
9
|
unsupported aspects of the spec to assist with improving out support.
|
10
10
|
|
11
|
+
= Development Status
|
12
|
+
|
13
|
+
I adopted this library in 2007 when I was learning the fundamentals of the PDF
|
14
|
+
spec. I do not currently use it in my day to day work and I just don't have the
|
15
|
+
spare time to dedicate to adding new features.
|
16
|
+
|
17
|
+
The code as it is works fairly well, and I offer it "as is". All patches, bug
|
18
|
+
reports and sample PDFs are welcome - I will work on them when I can. If anyone
|
19
|
+
is interested in adding features to PDF::Reader in their own effort to learn
|
20
|
+
the PDF file format, I'll happy offer help qand support.
|
21
|
+
|
22
|
+
I STRONGLY RECOMMEND NOT USING PDF::READER FOR YOUR PRODUCTION CODE.
|
23
|
+
|
11
24
|
= Installation
|
12
25
|
|
13
26
|
The recommended installation method is via Rubygems.
|
@@ -42,7 +55,8 @@ PDF file:
|
|
42
55
|
|
43
56
|
MalformedPDFError - The PDF appears to be corrupt in some way. If you believe the
|
44
57
|
file should be valid, or that a corrupt file didn't raise an exception, please
|
45
|
-
forward a copy of the file to the maintainers
|
58
|
+
forward a copy of the file to the maintainers (preferably via the google group)
|
59
|
+
and we can attempt to improve the code.
|
46
60
|
|
47
61
|
UnsupportedFeatureError - The PDF uses a feature that PDF::Reader doesn't currently
|
48
62
|
support. Again, we welcome submissions of PDF files that exhibit these features to help
|
@@ -56,12 +70,18 @@ report it!) or your receiver (please don't report it!).
|
|
56
70
|
|
57
71
|
= Maintainers
|
58
72
|
|
59
|
-
- Peter Jones <mailto:pjones@pmade.com>
|
60
73
|
- James Healy <mailto:jimmy@deefa.com>
|
61
74
|
|
75
|
+
= Licensing
|
76
|
+
|
77
|
+
This library is distributed under the terms of the MIT License. See the included file for
|
78
|
+
more detail.
|
79
|
+
|
62
80
|
= Mailing List
|
63
81
|
|
64
|
-
Any questions or feedback should be sent to the PDF::Reader google group.
|
82
|
+
Any questions or feedback should be sent to the PDF::Reader google group. It's
|
83
|
+
better that any answers be available for others instead of hiding in someone's
|
84
|
+
inbox.
|
65
85
|
|
66
86
|
http://groups.google.com/group/pdf-reader
|
67
87
|
|
@@ -77,21 +97,21 @@ A simple app to count the number of pages in a PDF File.
|
|
77
97
|
require 'pdf/reader'
|
78
98
|
|
79
99
|
class PageReceiver
|
80
|
-
attr_accessor :
|
100
|
+
attr_accessor :counter
|
81
101
|
|
82
102
|
def initialize
|
83
|
-
@
|
103
|
+
@counter = 0
|
84
104
|
end
|
85
105
|
|
86
106
|
# Called when page parsing ends
|
87
107
|
def end_page
|
88
|
-
@
|
108
|
+
@counter += 1
|
89
109
|
end
|
90
110
|
end
|
91
111
|
|
92
112
|
receiver = PageReceiver.new
|
93
113
|
pdf = PDF::Reader.file("somefile.pdf", receiver)
|
94
|
-
puts "#{receiver.
|
114
|
+
puts "#{receiver.counter} pages"
|
95
115
|
|
96
116
|
== List all callbacks generated by a single PDF
|
97
117
|
|
@@ -242,7 +262,7 @@ A simple app to display the number of pages in a PDF File.
|
|
242
262
|
|
243
263
|
= Known Limitations
|
244
264
|
|
245
|
-
The order of the callbacks is
|
265
|
+
The order of the callbacks is unpredictable, and is dependent on the internal
|
246
266
|
layout of the file, not the order objects are displayed to the user. As a
|
247
267
|
consequence of this it is highly unlikely that text will be completely in
|
248
268
|
order.
|
data/Rakefile
CHANGED
@@ -6,7 +6,7 @@ require 'rake/testtask'
|
|
6
6
|
require "rake/gempackagetask"
|
7
7
|
require 'spec/rake/spectask'
|
8
8
|
|
9
|
-
PKG_VERSION = "0.7.
|
9
|
+
PKG_VERSION = "0.7.6"
|
10
10
|
PKG_NAME = "pdf-reader"
|
11
11
|
PKG_FILE_NAME = "#{PKG_NAME}-#{PKG_VERSION}"
|
12
12
|
|
@@ -47,8 +47,7 @@ Rake::RDocTask.new("doc") do |rdoc|
|
|
47
47
|
rdoc.rdoc_files.include('README.rdoc')
|
48
48
|
rdoc.rdoc_files.include('TODO')
|
49
49
|
rdoc.rdoc_files.include('CHANGELOG')
|
50
|
-
|
51
|
-
#rdoc.rdoc_files.include('LICENSE')
|
50
|
+
rdoc.rdoc_files.include('MIT-LICENSE')
|
52
51
|
rdoc.rdoc_files.include('lib/**/*.rb')
|
53
52
|
rdoc.options << "--inline-source"
|
54
53
|
end
|
@@ -70,7 +69,7 @@ spec = Gem::Specification.new do |spec|
|
|
70
69
|
spec.executables << "pdf_text"
|
71
70
|
spec.executables << "pdf_list_callbacks"
|
72
71
|
spec.has_rdoc = true
|
73
|
-
spec.extra_rdoc_files = %w{README.rdoc TODO CHANGELOG}
|
72
|
+
spec.extra_rdoc_files = %w{README.rdoc TODO CHANGELOG MIT-LICENSE }
|
74
73
|
spec.rdoc_options << '--title' << 'PDF::Reader Documentation' <<
|
75
74
|
'--main' << 'README.rdoc' << '-q'
|
76
75
|
spec.author = "Peter Jones"
|
@@ -78,6 +77,7 @@ spec = Gem::Specification.new do |spec|
|
|
78
77
|
spec.rubyforge_project = "pdf-reader"
|
79
78
|
spec.homepage = "http://software.pmade.com/pdfreader"
|
80
79
|
spec.description = "The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe"
|
80
|
+
spec.add_dependency('Ascii85', '>=0.9')
|
81
81
|
end
|
82
82
|
|
83
83
|
# package the library into a gem
|
data/bin/pdf_list_callbacks
CHANGED
data/bin/pdf_object
CHANGED
data/bin/pdf_text
CHANGED
data/lib/pdf/reader.rb
CHANGED
data/lib/pdf/reader/buffer.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
@@ -41,7 +41,7 @@ class PDF::Reader
|
|
41
41
|
self
|
42
42
|
end
|
43
43
|
################################################################################
|
44
|
-
# reads the requested number of bytes from the underlying IO stream.
|
44
|
+
# reads the requested number of bytes from the underlying IO stream.
|
45
45
|
#
|
46
46
|
# length should be a positive integer.
|
47
47
|
def read (length)
|
@@ -56,13 +56,22 @@ class PDF::Reader
|
|
56
56
|
out
|
57
57
|
end
|
58
58
|
################################################################################
|
59
|
-
# Reads from the buffer until the specified token is found, or the end of the buffer
|
59
|
+
# Reads from the buffer until the specified token is found, or the end of the buffer
|
60
60
|
#
|
61
61
|
# bytes - the bytes to search for.
|
62
62
|
def read_until(bytes)
|
63
63
|
out = ""
|
64
64
|
size = bytes.size
|
65
|
-
|
65
|
+
|
66
|
+
if @buffer && !@buffer.empty?
|
67
|
+
if @buffer.include?(bytes)
|
68
|
+
offset = @buffer.index(bytes) + size
|
69
|
+
return head(offset)
|
70
|
+
else
|
71
|
+
out << head(@buffer.size)
|
72
|
+
end
|
73
|
+
end
|
74
|
+
|
66
75
|
loop do
|
67
76
|
out << @io.read(1)
|
68
77
|
if out[-1 * size,size].eql?(bytes)
|
@@ -74,7 +83,7 @@ class PDF::Reader
|
|
74
83
|
out
|
75
84
|
end
|
76
85
|
################################################################################
|
77
|
-
# returns true if the underlying IO object is at end and the internal buffer
|
86
|
+
# returns true if the underlying IO object is at end and the internal buffer
|
78
87
|
# is empty
|
79
88
|
def eof?
|
80
89
|
ready_token
|
@@ -89,6 +98,10 @@ class PDF::Reader
|
|
89
98
|
@io.pos
|
90
99
|
end
|
91
100
|
################################################################################
|
101
|
+
def pos_without_buf
|
102
|
+
@io.pos - @buffer.to_s.size
|
103
|
+
end
|
104
|
+
################################################################################
|
92
105
|
# PDF files are processed by tokenising the content into a series of objects and commands.
|
93
106
|
# This prepares the buffer for use by reading the next line of tokens into memory.
|
94
107
|
def ready_token (with_strip=true, skip_blanks=true)
|
@@ -105,10 +118,10 @@ class PDF::Reader
|
|
105
118
|
# return the next token from the underlying IO stream
|
106
119
|
def token
|
107
120
|
ready_token
|
108
|
-
|
121
|
+
|
109
122
|
i = @buffer.index(/[\[\]()<>{}\s\/]/) || @buffer.size
|
110
123
|
|
111
|
-
token_chars =
|
124
|
+
token_chars =
|
112
125
|
if i == 0 and @buffer[i,2] == "<<" then 2
|
113
126
|
elsif i == 0 and @buffer[i,2] == ">>" then 2
|
114
127
|
elsif i == 0 then 1
|
@@ -148,7 +161,7 @@ class PDF::Reader
|
|
148
161
|
data = @io.read(1024)
|
149
162
|
|
150
163
|
# the PDF 1.7 spec (section #3.4) says that EOL markers can be either \r, \n, or both.
|
151
|
-
# To ensure we find the xref offset correctly, change all possible options to a
|
164
|
+
# To ensure we find the xref offset correctly, change all possible options to a
|
152
165
|
# standard format
|
153
166
|
data = data.gsub("\r\n","\n").gsub("\n\r","\n").gsub("\r","\n")
|
154
167
|
lines = data.split(/\n/).reverse
|
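The read_until change above checks the in-memory buffer before reading from the IO one byte at a time. Below is a minimal sketch of that idea, not the library's code. `SketchBuffer` is a hypothetical stand-in class, and `head` is assumed to remove and return the leading bytes of the buffer. The hunk does not show what the original loop does once it finds a match, so this sketch simply includes the terminator in the returned string on both paths.

```ruby
require "stringio"

# Hypothetical stand-in for the buffer class, reduced to what read_until needs.
class SketchBuffer
  def initialize(io, buffered = "")
    @io = io
    @buffer = buffered.dup
  end

  # Assumed behaviour: remove and return the first +count+ bytes of the buffer.
  def head(count)
    @buffer.slice!(0, count)
  end

  # Return everything up to and including +bytes+, or up to end of input.
  def read_until(bytes)
    out = String.new
    size = bytes.size

    # Serve the request from already-buffered data when possible.
    unless @buffer.empty?
      idx = @buffer.index(bytes)
      return head(idx + size) if idx
      out << head(@buffer.size)
    end

    # Otherwise keep reading single bytes until the terminator appears.
    until @io.eof?
      out << @io.read(1)
      break if out[-size, size] == bytes
    end
    out
  end
end
```

Because the leftover buffer contents are copied into `out` before the loop starts, a terminator that straddles the buffer and the IO is still detected by the tail comparison.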
data/lib/pdf/reader/cmap.rb
CHANGED
@@ -69,14 +69,12 @@ class PDF::Reader
|
|
69
69
|
start_code = "0x#{start_code}".hex
|
70
70
|
end_code = "0x#{end_code}".hex
|
71
71
|
dst = "0x#{dst}".hex
|
72
|
-
incr = 0
|
73
72
|
|
74
73
|
# add all values in the range to our mapping
|
75
|
-
(start_code..end_code).
|
76
|
-
@map[val] = dst +
|
77
|
-
incr += 1
|
74
|
+
(start_code..end_code).each_with_index do |val, idx|
|
75
|
+
@map[val] = dst + idx
|
78
76
|
# ensure a single range does not exceed 255 chars
|
79
|
-
raise PDF::Reader::MalformedPDFError, "a CMap bfrange cann't exceed 255 chars" if
|
77
|
+
raise PDF::Reader::MalformedPDFError, "a CMap bfrange cann't exceed 255 chars" if idx > 255
|
80
78
|
end
|
81
79
|
end
|
82
80
|
end
|
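The bfrange hunk above replaces a hand-maintained counter with each_with_index. A minimal standalone sketch of the same expansion follows. The method name `expand_bfrange` and the use of ArgumentError are illustrative assumptions; the library raises its own MalformedPDFError.

```ruby
# Expand one bfrange entry (three hex strings) into a hash that maps each
# source character code to its destination code point.
def expand_bfrange(start_hex, end_hex, dst_hex)
  start_code = start_hex.hex
  end_code   = end_hex.hex
  dst        = dst_hex.hex

  map = {}
  (start_code..end_code).each_with_index do |code, idx|
    # Hypothetical error class; mirrors the "idx > 255" guard in the diff.
    raise ArgumentError, "a bfrange cannot exceed 255 chars" if idx > 255
    map[code] = dst + idx
  end
  map
end
```

For example, `expand_bfrange("0041", "0043", "0061")` maps codes 65, 66 and 67 to 97, 98 and 99.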
data/lib/pdf/reader/content.rb
CHANGED
@@ -293,7 +293,7 @@ class PDF::Reader
|
|
293
293
|
if page[:Type] == :Pages
|
294
294
|
callback(:begin_page_container, [page])
|
295
295
|
walk_resources(@xref.object(res)) if res
|
296
|
-
page[:Kids].each {|child| walk_pages(@xref.object(child))}
|
296
|
+
@xref.object(page[:Kids]).each {|child| walk_pages(@xref.object(child))}
|
297
297
|
callback(:end_page_container)
|
298
298
|
elsif page[:Type] == :Page
|
299
299
|
callback(:begin_page, [page])
|
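The one-line change above resolves page[:Kids] through the xref table before iterating, so a Kids entry that is an indirect reference no longer breaks the page walk. A small sketch of that pattern with toy types follows. `Ref`, `ToyXref` and `count_pages` are hypothetical names, and the only behaviour borrowed from the diff is that `object` returns non-references untouched.

```ruby
# Toy indirect reference: object id plus generation number.
Ref = Struct.new(:id, :gen)

# Toy cross-reference table backed by a hash keyed on [id, gen].
class ToyXref
  def initialize(objects)
    @objects = objects
  end

  # Pass plain values through; look up indirect references.
  def object(ref)
    return ref unless ref.is_a?(Ref)
    @objects.fetch([ref.id, ref.gen])
  end
end

# Count leaf pages, resolving Kids whether it is inline or indirect.
def count_pages(xref, node)
  node = xref.object(node)
  return 1 if node[:Type] == :Page
  xref.object(node[:Kids]).inject(0) { |sum, kid| sum + count_pages(xref, kid) }
end
```

Calling `xref.object` on a value that is already an array is harmless, which is why the same line works for both inline and indirect Kids.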
data/lib/pdf/reader/encoding.rb
CHANGED
@@ -41,33 +41,33 @@ class PDF::Reader
|
|
41
41
|
end
|
42
42
|
|
43
43
|
case enc
|
44
|
-
when nil then
|
44
|
+
when nil then
|
45
45
|
load_mapping File.dirname(__FILE__) + "/encodings/standard.txt"
|
46
46
|
@unpack = "C*"
|
47
|
-
when "Identity-H".to_sym then
|
47
|
+
when "Identity-H".to_sym then
|
48
48
|
@unpack = "n*"
|
49
49
|
@to_unicode_required = true
|
50
|
-
when :MacRomanEncoding then
|
50
|
+
when :MacRomanEncoding then
|
51
51
|
load_mapping File.dirname(__FILE__) + "/encodings/mac_roman.txt"
|
52
52
|
@unpack = "C*"
|
53
|
-
when :MacExpertEncoding then
|
53
|
+
when :MacExpertEncoding then
|
54
54
|
load_mapping File.dirname(__FILE__) + "/encodings/mac_expert.txt"
|
55
55
|
@unpack = "C*"
|
56
|
-
when :PDFDocEncoding then
|
56
|
+
when :PDFDocEncoding then
|
57
57
|
load_mapping File.dirname(__FILE__) + "/encodings/pdf_doc.txt"
|
58
58
|
@unpack = "C*"
|
59
|
-
when :StandardEncoding then
|
59
|
+
when :StandardEncoding then
|
60
60
|
load_mapping File.dirname(__FILE__) + "/encodings/standard.txt"
|
61
61
|
@unpack = "C*"
|
62
|
-
when :SymbolEncoding then
|
62
|
+
when :SymbolEncoding then
|
63
63
|
load_mapping File.dirname(__FILE__) + "/encodings/symbol.txt"
|
64
64
|
@unpack = "C*"
|
65
|
-
when :UTF16Encoding then
|
65
|
+
when :UTF16Encoding then
|
66
66
|
@unpack = "n*"
|
67
|
-
when :WinAnsiEncoding then
|
67
|
+
when :WinAnsiEncoding then
|
68
68
|
load_mapping File.dirname(__FILE__) + "/encodings/win_ansi.txt"
|
69
69
|
@unpack = "C*"
|
70
|
-
when :ZapfDingbatsEncoding then
|
70
|
+
when :ZapfDingbatsEncoding then
|
71
71
|
load_mapping File.dirname(__FILE__) + "/encodings/zapf_dingbats.txt"
|
72
72
|
@unpack = "C*"
|
73
73
|
else raise UnsupportedFeatureError, "#{enc} is not currently a supported encoding"
|
data/lib/pdf/reader/error.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
data/lib/pdf/reader/explore.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
@@ -46,7 +46,7 @@ class PDF::Reader
|
|
46
46
|
def output_parent (obj)
|
47
47
|
case obj
|
48
48
|
when Hash
|
49
|
-
obj.each do |k,v|
|
49
|
+
obj.each do |k,v|
|
50
50
|
print "#{k}"; output_child(v); print "\n"
|
51
51
|
Explore::const_set(k, k) if !Explore.const_defined?(k)
|
52
52
|
end
|
data/lib/pdf/reader/filter.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
@@ -34,17 +34,30 @@ class PDF::Reader
|
|
34
34
|
# in the future.
|
35
35
|
class Filter
|
36
36
|
################################################################################
|
37
|
-
# creates a new filter for decoding content
|
38
|
-
|
37
|
+
# creates a new filter for decoding content.
|
38
|
+
#
|
39
|
+
# Filters that are only used to encode image data are accepted, but the data is
|
40
|
+
# returned untouched. At this stage PDF::Reader has no need to decode images.
|
41
|
+
#
|
42
|
+
def initialize (name, options = nil)
|
39
43
|
@options = options
|
40
44
|
|
41
45
|
case name.to_sym
|
46
|
+
when :ASCII85Decode then @filter = :ascii85
|
47
|
+
when :ASCIIHexDecode then @filter = :asciihex
|
48
|
+
when :CCITTFaxDecode then @filter = nil
|
49
|
+
when :DCTDecode then @filter = nil
|
42
50
|
when :FlateDecode then @filter = :flate
|
43
|
-
|
51
|
+
when :JBIG2Decode then @filter = nil
|
52
|
+
else raise UnsupportedFeatureError, "Unknown filter: #{name}"
|
44
53
|
end
|
45
54
|
end
|
46
55
|
################################################################################
|
47
56
|
# attempts to decode the specified data with the current filter
|
57
|
+
#
|
58
|
+
# Filters that are only used to encode image data are accepted, but the data is
|
59
|
+
# returned untouched. At this stage PDF::Reader has no need to decode images.
|
60
|
+
#
|
48
61
|
def filter (data)
|
49
62
|
# leave the data untouched if we don't support the required filter
|
50
63
|
return data if @filter.nil?
|
@@ -53,6 +66,30 @@ class PDF::Reader
|
|
53
66
|
self.send(@filter, data)
|
54
67
|
end
|
55
68
|
################################################################################
|
69
|
+
# Decode the specified data using the Ascii85 algorithm. Relies on the AScii85
|
70
|
+
# rubygem.
|
71
|
+
#
|
72
|
+
def ascii85(data)
|
73
|
+
data = "<~#{data}" unless data.to_s[0,2] == "<~"
|
74
|
+
Ascii85::decode(data)
|
75
|
+
rescue Exception => e
|
76
|
+
# Oops, there was a problem decoding the stream
|
77
|
+
raise MalformedPDFError, "Error occured while decoding an ASCII85 stream (#{e.class.to_s}: #{e.to_s})"
|
78
|
+
end
|
79
|
+
################################################################################
|
80
|
+
# Decode the specified data using the AsciiHex algorithm.
|
81
|
+
#
|
82
|
+
def asciihex(data)
|
83
|
+
data.chop! if data[-1,1] == ">"
|
84
|
+
data = data[1,data.size] if data[0,1] == "<"
|
85
|
+
data.gsub!(/[^A-Fa-f0-9]/,"")
|
86
|
+
data << "0" if data.size % 2 == 1
|
87
|
+
data.scan(/.{2}/).map { |s| s.hex.chr }.join("")
|
88
|
+
rescue Exception => e
|
89
|
+
# Oops, there was a problem decoding the stream
|
90
|
+
raise MalformedPDFError, "Error occured while decoding an ASCIIHex stream (#{e.class.to_s}: #{e.to_s})"
|
91
|
+
end
|
92
|
+
################################################################################
|
56
93
|
# Decode the specified data with the Zlib compression algorithm
|
57
94
|
def flate (data)
|
58
95
|
begin
|
@@ -63,7 +100,7 @@ class PDF::Reader
|
|
63
100
|
# If that fails, then use an undocumented 'feature' to attempt to inflate
|
64
101
|
# the data as a raw RFC1951 stream.
|
65
102
|
#
|
66
|
-
# See
|
103
|
+
# See
|
67
104
|
# - http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/243545
|
68
105
|
# - http://www.gzip.org/zlib/zlib_faq.html#faq38
|
69
106
|
Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(data)
|
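The new asciihex method above strips the optional angle brackets, drops non-hex characters, pads an odd digit count with a zero and converts each pair to a byte. The following sketch does the same steps without mutating its argument. The name `decode_ascii_hex` is hypothetical, and `Array#pack("H*")` is swapped in for the scan/map/chr pipeline used in the diff.

```ruby
# Decode an ASCIIHex-encoded string into raw bytes without modifying +data+.
def decode_ascii_hex(data)
  hex = data.to_s.dup
  hex = hex[0...-1] if hex.end_with?(">")   # drop end-of-data marker
  hex = hex[1..-1]  if hex.start_with?("<") # drop optional leading bracket
  hex = hex.gsub(/[^A-Fa-f0-9]/, "")        # ignore whitespace and junk
  hex << "0" if hex.size.odd?               # a missing final digit counts as 0
  [hex].pack("H*")
end
```

Working on a copy means the caller's stream data is left intact, which may matter if the same string is decoded more than once.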
data/lib/pdf/reader/font.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
@@ -28,12 +28,12 @@ class PDF::Reader
|
|
28
28
|
attr_accessor :label, :subtype, :encoding, :descendantfonts, :tounicode
|
29
29
|
attr_reader :basefont
|
30
30
|
|
31
|
-
# returns a hash that maps glyph names to unicode codepoints. The mapping is based on
|
31
|
+
# returns a hash that maps glyph names to unicode codepoints. The mapping is based on
|
32
32
|
# a text file supplied by Adobe at:
|
33
33
|
# http://www.adobe.com/devnet/opentype/archives/glyphlist.txt
|
34
34
|
def self.glyphnames
|
35
35
|
@@glyphs ||= {}
|
36
|
-
|
36
|
+
|
37
37
|
if @@glyphs.empty?
|
38
38
|
RUBY_VERSION >= "1.9" ? mode = "r:BINARY" : mode = "r"
|
39
39
|
File.open(File.dirname(__FILE__) + "/glyphlist.txt",mode) do |f|
|
@@ -51,9 +51,9 @@ class PDF::Reader
|
|
51
51
|
# setup a default encoding for the selected font. It can always be overridden
|
52
52
|
# with encoding= if required
|
53
53
|
case font
|
54
|
-
when "Symbol" then
|
54
|
+
when "Symbol" then
|
55
55
|
self.encoding = PDF::Reader::Encoding.new("SymbolEncoding")
|
56
|
-
when "ZapfDingbats" then
|
56
|
+
when "ZapfDingbats" then
|
57
57
|
self.encoding = PDF::Reader::Encoding.new("ZapfDingbatsEncoding")
|
58
58
|
end
|
59
59
|
@basefont = font
|
@@ -64,7 +64,7 @@ class PDF::Reader
|
|
64
64
|
|
65
65
|
if params.class == String
|
66
66
|
# translate the bytestram into a UTF-8 string.
|
67
|
-
# If an encoding hasn't been specified, assume the text using this
|
67
|
+
# If an encoding hasn't been specified, assume the text using this
|
68
68
|
# font is in Adobe Standard Encoding.
|
69
69
|
(encoding || PDF::Reader::Encoding.new(:StandardEncoding)).to_utf8(params, tounicode)
|
70
70
|
elsif params.class == Array
|
data/lib/pdf/reader/parser.rb
CHANGED
@@ -61,7 +61,8 @@ class PDF::Reader
|
|
61
61
|
when ">>", "]", ">" then return Token.new(token)
|
62
62
|
else
|
63
63
|
if operators.has_key?(token) then return Token.new(token)
|
64
|
-
|
64
|
+
elsif token =~ /\d*\.\d/ then return token.to_f
|
65
|
+
else return token.to_i
|
65
66
|
end
|
66
67
|
end
|
67
68
|
end
|
@@ -99,7 +100,7 @@ class PDF::Reader
|
|
99
100
|
# Reads a PDF hex string from the buffer and converts it to a Ruby String
|
100
101
|
def hex_string
|
101
102
|
str = ""
|
102
|
-
|
103
|
+
|
103
104
|
loop do
|
104
105
|
token = @buffer.token
|
105
106
|
break if token == ">"
|
@@ -122,14 +123,15 @@ class PDF::Reader
|
|
122
123
|
# find the first occurance of ( ) [ \ or ]
|
123
124
|
#
|
124
125
|
# I originally just used the regexp form of index(), but it seems to be
|
125
|
-
# buggy on some OSX systems (returns nil when there is a match).
|
126
|
-
#
|
127
|
-
# greater.
|
126
|
+
# buggy on some OSX systems (returns nil when there is a match). This
|
127
|
+
# version is more reliable and was suggested by Andrès Koetsier.
|
128
128
|
#
|
129
|
-
|
130
|
-
|
131
|
-
|
132
|
-
|
129
|
+
i = nil
|
130
|
+
@buffer.raw.unpack("C*").each_with_index do |charint, idx|
|
131
|
+
if [40, 41, 92].include?(charint)
|
132
|
+
i = idx
|
133
|
+
break
|
134
|
+
end
|
133
135
|
end
|
134
136
|
|
135
137
|
if i.nil?
|
@@ -201,7 +203,7 @@ class PDF::Reader
|
|
201
203
|
def stream (dict)
|
202
204
|
raise MalformedPDFError, "PDF malformed, missing stream length" unless dict.has_key?(:Length)
|
203
205
|
data = @buffer.read(@xref.object(dict[:Length]))
|
204
|
-
|
206
|
+
|
205
207
|
Error.str_assert(parse_token, "endstream")
|
206
208
|
Error.str_assert(parse_token, "endobj")
|
207
209
|
|
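The first parser hunk above sends tokens that match /\d*\.\d/ to to_f and everything else to to_i. A tiny sketch of that rule in isolation, under the assumption that operator tokens have already been filtered out; `numeric_token` is a hypothetical name.

```ruby
# Convert a numeric PDF token to a Float when it has a digit after the
# decimal point, otherwise to an Integer.
def numeric_token(token)
  token =~ /\d*\.\d/ ? token.to_f : token.to_i
end
```

With this pattern a token such as "4." has no digit after the point, so it falls through to to_i and comes back as the Integer 4.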
data/lib/pdf/reader/reference.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
@@ -25,7 +25,7 @@
|
|
25
25
|
|
26
26
|
class PDF::Reader
|
27
27
|
################################################################################
|
28
|
-
# An internal PDF::Reader class that represents an indirect reference to a PDF Object
|
28
|
+
# An internal PDF::Reader class that represents an indirect reference to a PDF Object
|
29
29
|
class Reference
|
30
30
|
################################################################################
|
31
31
|
# check if the next token in the buffer is a reference, and return a PDF::Reader::Reference
|
data/lib/pdf/reader/stream.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
data/lib/pdf/reader/token.rb
CHANGED
@@ -9,10 +9,10 @@
|
|
9
9
|
# distribute, sublicense, and/or sell copies of the Software, and to
|
10
10
|
# permit persons to whom the Software is furnished to do so, subject to
|
11
11
|
# the following conditions:
|
12
|
-
#
|
12
|
+
#
|
13
13
|
# The above copyright notice and this permission notice shall be
|
14
14
|
# included in all copies or substantial portions of the Software.
|
15
|
-
#
|
15
|
+
#
|
16
16
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
17
|
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
18
|
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
data/lib/pdf/reader/xref.rb
CHANGED
@@ -72,7 +72,7 @@ class PDF::Reader
|
|
72
72
|
# If the object is a stream, that is returned as well
|
73
73
|
def object (ref, save_pos = true)
|
74
74
|
return ref unless ref.kind_of?(Reference)
|
75
|
-
pos = @buffer.
|
75
|
+
pos = @buffer.pos_without_buf if save_pos
|
76
76
|
obj = Parser.new(@buffer.seek(offset_for(ref)), self).object(ref.id, ref.gen)
|
77
77
|
@buffer.seek(pos) if save_pos
|
78
78
|
return obj
|
@@ -132,7 +132,7 @@ class PDF::Reader
|
|
132
132
|
# ref - a PDF::Reader::Reference object containing an object ID and revision number
|
133
133
|
def offset_for (ref)
|
134
134
|
@xref[ref.id][ref.gen]
|
135
|
-
rescue
|
135
|
+
rescue
|
136
136
|
raise InvalidObjectError, "Object #{ref.id}, Generation #{ref.gen} is invalid"
|
137
137
|
end
|
138
138
|
################################################################################
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: pdf-reader
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.7.
|
4
|
+
version: 0.7.6
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Peter Jones
|
@@ -9,10 +9,19 @@ autorequire:
|
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
11
|
|
12
|
-
date:
|
12
|
+
date: 2009-08-28 00:00:00 +10:00
|
13
13
|
default_executable:
|
14
|
-
dependencies:
|
15
|
-
|
14
|
+
dependencies:
|
15
|
+
- !ruby/object:Gem::Dependency
|
16
|
+
name: Ascii85
|
17
|
+
type: :runtime
|
18
|
+
version_requirement:
|
19
|
+
version_requirements: !ruby/object:Gem::Requirement
|
20
|
+
requirements:
|
21
|
+
- - ">="
|
22
|
+
- !ruby/object:Gem::Version
|
23
|
+
version: "0.9"
|
24
|
+
version:
|
16
25
|
description: The PDF::Reader library implements a PDF parser conforming as much as possible to the PDF specification from Adobe
|
17
26
|
email: pjones@pmade.com
|
18
27
|
executables:
|
@@ -25,10 +34,9 @@ extra_rdoc_files:
|
|
25
34
|
- README.rdoc
|
26
35
|
- TODO
|
27
36
|
- CHANGELOG
|
37
|
+
- MIT-LICENSE
|
28
38
|
files:
|
29
|
-
- lib/pdf
|
30
39
|
- lib/pdf/reader.rb
|
31
|
-
- lib/pdf/reader
|
32
40
|
- lib/pdf/reader/buffer.rb
|
33
41
|
- lib/pdf/reader/cmap.rb
|
34
42
|
- lib/pdf/reader/content.rb
|
@@ -44,7 +52,6 @@ files:
|
|
44
52
|
- lib/pdf/reader/register_receiver.rb
|
45
53
|
- lib/pdf/reader/text_receiver.rb
|
46
54
|
- lib/pdf/reader/token.rb
|
47
|
-
- lib/pdf/reader/encodings
|
48
55
|
- lib/pdf/reader/encodings/mac_expert.txt
|
49
56
|
- lib/pdf/reader/encodings/mac_roman.txt
|
50
57
|
- lib/pdf/reader/encodings/pdf_doc.txt
|
@@ -58,8 +65,11 @@ files:
|
|
58
65
|
- README.rdoc
|
59
66
|
- TODO
|
60
67
|
- CHANGELOG
|
68
|
+
- MIT-LICENSE
|
61
69
|
has_rdoc: true
|
62
70
|
homepage: http://software.pmade.com/pdfreader
|
71
|
+
licenses: []
|
72
|
+
|
63
73
|
post_install_message:
|
64
74
|
rdoc_options:
|
65
75
|
- --title
|
@@ -84,9 +94,9 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
84
94
|
requirements: []
|
85
95
|
|
86
96
|
rubyforge_project: pdf-reader
|
87
|
-
rubygems_version: 1.
|
97
|
+
rubygems_version: 1.3.4
|
88
98
|
signing_key:
|
89
|
-
specification_version:
|
99
|
+
specification_version: 3
|
90
100
|
summary: A library for accessing the content of PDF files
|
91
101
|
test_files: []
|
92
102
|
|