pdf-reader 1.0.0.rc1 → 1.0.0

data/CHANGELOG CHANGED
@@ -1,3 +1,7 @@
+ v1.0.0 (16th January 2012)
+ - support a new encryption variation
+ - bugfix in PageTextRender (thanks Paul Gallagher)
+
  v1.0.0.rc1 (19th December 2011)
  - performance optimisations (all by Bernerd Schaefer)
  - some improvements to text extraction from form xobjects
@@ -1,18 +1,3 @@
- = !PLEASE NOTE!
-
- All the examples below are for the latest (pre-release) version of the gem (0.11)
-
- If you have installed the gem via the rubygems with the command:
-
- $ gem install pdf-reader
-
- Then the examples below *will not work* for you. Please check the examples that
- come with previous version of the gem (0.10).
-
- If you want to install the latest version of this gem use the command:
-
- $ gem install pdf-reader --prerelease
-
  = Release Notes
 
  The PDF::Reader library implements a PDF parser conforming as much as possible
@@ -59,7 +44,8 @@ an IO stream:
  puts reader.info
 
  If you open a PDF with File#open or IO#open, I strongly recommend using "rb"
- mode to ensure the file isn't mangled by ruby being 'helpful'.
+ mode to ensure the file isn't mangled by ruby being 'helpful'. This is
+ particularly important on windows and MRI >= 1.9.2.
 
  File.open("somefile.pdf", "rb") do |io|
  reader = PDF::Reader.new(io)
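The README hunk above recommends "rb" mode. As a rough illustration using only the standard library, the sketch below writes some bytes to a temporary file and reads them back in "rb" mode, where they arrive unchanged and tagged as binary. The helper names `read_binary` and `round_trip` and the sample bytes are assumptions made for this example, not part of pdf-reader.

```ruby
require 'tempfile'

# Illustrative sample: a PDF-style header followed by CRLF and high bytes,
# the kind of data a text-mode read may translate or mis-tag.
SAMPLE = "%PDF-1.4\r\n\xE2\xE3\xCF\xD3\r\n".b

# Hypothetical helper: read a file exactly as stored on disk.
def read_binary(path)
  File.open(path, "rb") { |io| io.read }
end

# Hypothetical helper: write bytes to a temp file and read them back.
def round_trip(bytes)
  Tempfile.create("rb-demo") do |tmp|
    tmp.binmode
    tmp.write(bytes)
    tmp.flush
    read_binary(tmp.path)
  end
end

data = round_trip(SAMPLE)
puts data.encoding      # the binary (ASCII-8BIT) encoding
puts data == SAMPLE     # the bytes come back unchanged
```

Passing the resulting IO (or the file name) to `PDF::Reader.new`, as the README does, then hands the parser the same bytes that are on disk.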
@@ -111,6 +97,15 @@ to UTF-8 before it is passed back from PDF::Reader.
  Strings that contain binary data (like font blobs) will be marked as such on
  M17N aware VMs.
 
+ = Former API
+
+ Version 1.0.0 of PDF::Reader introduced a new page-based API that provides
+ efficient and easy access to any page.
+
+ The previous API is marked as deprecated but will continue to work for the
+ time being. Eventually calls to the old API will begin triggering deprecation
+ warnings before it is completely removed in version 2.0.0.
+
  = Exceptions
 
  There are two key exceptions that you will need to watch out for when processing a
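The "Former API" note added in the hunk above refers to the page-based API. A minimal sketch of how that API is typically used follows, assuming the `pages` and `text` methods described in the gem's README. The `page_texts` helper and the `FakePage`/`FakeReader` stand-ins are hypothetical and exist only so the example runs without the gem installed.

```ruby
# Collect the text of every page from a reader-like object.
# With the real gem this would be called as:
#   page_texts(PDF::Reader.new("somefile.pdf"))
def page_texts(reader)
  reader.pages.map { |page| page.text }
end

# Stand-ins so the sketch runs without the gem (not real pdf-reader classes).
FakePage   = Struct.new(:text)
FakeReader = Struct.new(:pages)

reader = FakeReader.new([FakePage.new("first page"), FakePage.new("second page")])
page_texts(reader).each_with_index do |text, i|
  puts "page #{i + 1}: #{text}"
end
```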
@@ -119,7 +114,7 @@ PDF file:
  MalformedPDFError - The PDF appears to be corrupt in some way. If you believe the
  file should be valid, or that a corrupt file didn't raise an exception, please
  forward a copy of the file to the maintainers (preferably via the google group)
- and we can attempt to improve the code.
+ and we will attempt to improve the code.
 
  UnsupportedFeatureError - The PDF uses a feature that PDF::Reader doesn't currently
  support. Again, we welcome submissions of PDF files that exhibit these features to help
data/TODO CHANGED
@@ -1,27 +1,19 @@
- v0.8
- - add extra callbacks
- - list implemented features
- - encrypted? tagged? bookmarks? annotated? optimised?
- - Allow more than just page content and metadata to be parsed (see spec section 3.6.1)
+ This stuff would be great
+ - improved access to document level objects and data
  - bookmarks?
  - outline?
  - articles?
  - viewer prefs?
- - Don't remove comment when tokenising in the middle of a string
+ - Improve the speed of Encoding#to_utf8
  - Tweak encoding mappings to differentiate between bytes that are invalid for an encoding, and bytes that are unchanged.
  poppler seems to do this in a quite reasonable way. Original Encoding -> Glyph Names -> Unicode. As of 0.6 we go straight
  from the Original encoding to Unicode.
  - detect when a font's encoding is a CMap (generally used for pre-Unicode, multibyte asian encodings), and display a user friendly error
  - Improve interpretation of non content stream data (ie metadata). recognise dates, etc
- - Fix inheritance of page attributes. Resources has been done, but plenty of other attributes
- are inheritable. See table 3.2.7 in the spec
 
- v0.9
- - Add a way to extract raster images
- - see XObjects section of spec (section 4.7)
- - Add a way to extract font data?
 
- Sometime
+
+ This might be useful, more research required
  - Support for CJK text (convert to UTF-8 like all other encodings. See Section 5.9 of the PDF spec)
  - Will require significantly improved handling of CMaps, including creating a bunch of predefined ones
 
@@ -30,10 +22,7 @@ Sometime
  - Ship some extra receivers in the standard package, particuarly ones that are useful for running
  rspec over generated PDF files
 
- - When we encounter Identity-H encoded text with no ToUnicode CMap, render the glyphs and treat them as images, as there's no
- sensible way to convert them to unicode
-
- - Add support for additional filters: ASCIIHexDecode, ASCII85Decode, LZWDecode, RunLengthDecode, CCITTFaxDecode, JBIG2Decode, DCTDecode, JPXDecode, Crypt?
+ - Add support for additional filters: CCITTFaxDecode, JBIG2Decode, DCTDecode, JPXDecode
 
  - Add support for additional encodings:
  - Identity-V(I *think* this relates to vertical text. Not sure how we'd support it sensibly)
@@ -159,7 +159,7 @@ module PDF
  yield PDF::Reader.new(input, opts)
  end
 
- # DEPRECATED: this method was deprecated in version 0.11.0 and will
+ # DEPRECATED: this method was deprecated in version 1.0.0 and will
  # eventually be removed
  #
  #
@@ -171,7 +171,7 @@ module PDF
  end
  end
 
- # DEPRECATED: this method was deprecated in version 0.11.0 and will
+ # DEPRECATED: this method was deprecated in version 1.0.0 and will
  # eventually be removed
  #
  # Parse the given string, sending events to the given receiver.
@@ -182,7 +182,7 @@ module PDF
  end
  end
 
- # DEPRECATED: this method was deprecated in version 0.11.0 and will
+ # DEPRECATED: this method was deprecated in version 1.0.0 and will
  # eventually be removed
  #
  # Parse the file with the given name, returning an unmarshalled ruby version of
@@ -194,7 +194,7 @@ module PDF
  }
  end
 
- # DEPRECATED: this method was deprecated in version 0.11.0 and will
+ # DEPRECATED: this method was deprecated in version 1.0.0 and will
  # eventually be removed
  #
  # Parse the given string, returning an unmarshalled ruby version of represents
@@ -245,7 +245,7 @@ module PDF
  end
 
 
- # DEPRECATED: this method was deprecated in version 0.11.0 and will
+ # DEPRECATED: this method was deprecated in version 1.0.0 and will
  # eventually be removed
  #
  # Given an IO object that contains PDF data, parse it.
@@ -263,7 +263,7 @@ module PDF
  self
  end
 
- # DEPRECATED: this method was deprecated in version 0.11.0 and will
+ # DEPRECATED: this method was deprecated in version 1.0.0 and will
  # eventually be removed
  #
  # Given an IO object that contains PDF data, return the contents of a single object
@@ -276,7 +276,7 @@ module PDF
 
  private
 
- # recursively convert strings from outside a content stream intop UTF-8
+ # recursively convert strings from outside a content stream into UTF-8
  #
  def doc_strings_to_utf8(obj)
  case obj
@@ -272,7 +272,7 @@ class PDF::Reader
  row += 1
  end
 
- pixels.map { |row| row.flatten.pack("C*") }.join("")
+ pixels.map { |bytes| bytes.flatten.pack("C*") }.join("")
  end
  end
  end
@@ -76,7 +76,7 @@ module PDF
  params << token
  end
  end
- rescue EOFError => e
+ rescue EOFError
  raise MalformedPDFError, "End Of File while processing a content stream"
  end
  end
@@ -133,7 +133,7 @@ module PDF
  params << token
  end
  end
- rescue EOFError => e
+ rescue EOFError
  raise MalformedPDFError, "End Of File while processing a content stream"
  end
 
@@ -1,12 +1,6 @@
  # coding: utf-8
 
  require 'matrix'
- require 'yaml'
-
- begin
- require 'psych'
- rescue LoadError
- end
 
  module PDF
  class Reader
@@ -32,7 +26,7 @@ module PDF
  @font_stack = [build_fonts(page.fonts)]
  @xobject_stack = [page.xobjects]
  @content = {}
- @stack = [DEFAULT_GRAPHICS_STATE]
+ @stack = [DEFAULT_GRAPHICS_STATE.dup]
  end
 
  def content
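The `.dup` added in the hunk above is presumably there so that each renderer starts from its own copy of the default state instead of sharing the constant's object. The sketch below illustrates the general hazard. `DEFAULT_STATE`, `SharedRenderer`, and `IsolatedRenderer` are hypothetical names, and the real contents of `DEFAULT_GRAPHICS_STATE` are not shown in this diff.

```ruby
# Hypothetical default state standing in for DEFAULT_GRAPHICS_STATE.
DEFAULT_STATE = { :char_spacing => 0, :h_scaling => 100 }

class SharedRenderer
  attr_reader :stack

  def initialize
    @stack = [DEFAULT_STATE]      # every instance shares the constant's hash
  end
end

class IsolatedRenderer
  attr_reader :stack

  def initialize
    @stack = [DEFAULT_STATE.dup]  # each instance gets its own shallow copy
  end
end

isolated = IsolatedRenderer.new
isolated.stack.last[:char_spacing] = 5
puts DEFAULT_STATE[:char_spacing]   # the constant is untouched

shared = SharedRenderer.new
shared.stack.last[:char_spacing] = 7
puts DEFAULT_STATE[:char_spacing]   # the constant has been mutated
```

Note that `dup` is a shallow copy: any mutable values nested inside the hash would still be shared between instances.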
@@ -235,8 +229,6 @@ module PDF
  # underlying device space.
  #
  def transform(point, z = 1)
- trm = text_rendering_matrix
-
  point.transform(text_rendering_matrix, z)
  end
 
@@ -286,7 +278,7 @@ module PDF
  end
 
  # private class for representing points on a cartesian plain. Used
- # to simplify maths in the MinPpi class.
+ # to simplify maths.
  #
  class Point < Struct.new(:x, :y)
  def transform(trm, z)
@@ -295,10 +287,6 @@ module PDF
  (trm[0,1] * x) + (trm[1,1] * y) + (trm[2,1] * z)
  )
  end
-
- def distance(point)
- Math.hypot(point.x - @x, point.y - @y)
- end
  end
  end
  end
@@ -79,7 +79,8 @@ class PDF::Reader
  objKey = @encrypt_key.dup
  (0..2).each { |e| objKey << (ref.id >> e*8 & 0xFF ) }
  (0..1).each { |e| objKey << (ref.gen >> e*8 & 0xFF ) }
- rc4 = RC4.new( Digest::MD5.digest(objKey) )
+ length = objKey.length < 16 ? objKey.length : 16
+ rc4 = RC4.new( Digest::MD5.digest(objKey)[(0...length)] )
  rc4.decrypt(buf)
  end
 
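The hunk above truncates the MD5 digest so that the RC4 key is at most 16 bytes, and shorter when the file key is short (a 5-byte file key plus 5 reference bytes gives a 10-byte key). A standalone sketch of that key derivation, using only `digest/md5`, might look like the following. `object_rc4_key` and its parameters are hypothetical names for this example, and the RC4 step itself is left out.

```ruby
require 'digest/md5'

# Hypothetical helper mirroring the hunk: derive the per-object RC4 key.
# file_key: binary String; obj_id / gen: Integers from the object reference.
def object_rc4_key(file_key, obj_id, gen)
  buf = file_key.b                           # binary copy of the file key
  buf << [obj_id].pack("V").byteslice(0, 3)  # low 3 bytes of the object id
  buf << [gen].pack("v")                     # low 2 bytes of the generation
  length = [buf.bytesize, 16].min            # key length is capped at 16 bytes
  Digest::MD5.digest(buf).byteslice(0, length)
end

puts object_rc4_key("\x01\x02\x03\x04\x05".b, 12, 0).bytesize  # 5-byte file key
puts object_rc4_key("k" * 16, 12, 0).bytesize                  # 16-byte file key
```

Before this change the full 16-byte digest was always used, which would produce the wrong key for documents encrypted with file keys shorter than 11 bytes.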
@@ -144,10 +145,11 @@ class PDF::Reader
  out = Digest::MD5.digest(PassPadBytes.pack("C*") + @file_id)
  #zero doesn't matter -> so from 0-19
  20.times{ |i| out=RC4.new(xor_each_byte(keyBegins, i)).decrypt(out) }
+ pass = @user_key[(0...16)] == out
  else
- out = RC4.new(keyBegins).encrypt(PassPadBytes.pack("C*"))
+ pass = RC4.new(keyBegins).encrypt(PassPadBytes.pack("C*")) == @user_key
  end
- @user_key[(0...16)] == out ? keyBegins : nil
+ pass ? keyBegins : nil
  end
 
  def make_file_key( user_pass )
metadata CHANGED
@@ -1,19 +1,19 @@
  --- !ruby/object:Gem::Specification
  name: pdf-reader
  version: !ruby/object:Gem::Version
- version: 1.0.0.rc1
- prerelease: 6
+ version: 1.0.0
+ prerelease:
  platform: ruby
  authors:
  - James Healy
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2011-12-19 00:00:00.000000000 Z
+ date: 2012-01-16 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rake
- requirement: &19650680 !ruby/object:Gem::Requirement
+ requirement: &24844240 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
@@ -21,10 +21,10 @@ dependencies:
  version: '0'
  type: :development
  prerelease: false
- version_requirements: *19650680
+ version_requirements: *24844240
  - !ruby/object:Gem::Dependency
  name: roodi
- requirement: &19650220 !ruby/object:Gem::Requirement
+ requirement: &24843780 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
@@ -32,10 +32,10 @@ dependencies:
  version: '0'
  type: :development
  prerelease: false
- version_requirements: *19650220
+ version_requirements: *24843780
  - !ruby/object:Gem::Dependency
  name: rspec
- requirement: &19649720 !ruby/object:Gem::Requirement
+ requirement: &24843280 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -43,10 +43,10 @@ dependencies:
  version: '2.3'
  type: :development
  prerelease: false
- version_requirements: *19649720
+ version_requirements: *24843280
  - !ruby/object:Gem::Dependency
  name: ZenTest
- requirement: &19649220 !ruby/object:Gem::Requirement
+ requirement: &24842780 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -54,10 +54,10 @@ dependencies:
  version: 4.4.2
  type: :development
  prerelease: false
- version_requirements: *19649220
+ version_requirements: *24842780
  - !ruby/object:Gem::Dependency
  name: Ascii85
- requirement: &19648740 !ruby/object:Gem::Requirement
+ requirement: &24842320 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ~>
@@ -65,10 +65,10 @@ dependencies:
  version: 1.0.0
  type: :runtime
  prerelease: false
- version_requirements: *19648740
+ version_requirements: *24842320
  - !ruby/object:Gem::Dependency
  name: ruby-rc4
- requirement: &19648280 !ruby/object:Gem::Requirement
+ requirement: &24841940 !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ! '>='
@@ -76,7 +76,7 @@ dependencies:
  version: '0'
  type: :runtime
  prerelease: false
- version_requirements: *19648280
+ version_requirements: *24841940
  description: The PDF::Reader library implements a PDF parser conforming as much as
  possible to the PDF specification from Adobe
  email:
@@ -152,13 +152,12 @@ files:
  - bin/pdf_callbacks
  homepage: http://github.com/yob/pdf-reader
  licenses: []
- post_install_message: ! "\n ********************************************\n\n This
- is a beta release of PDF::Reader to gather feedback on the proposed\n API changes.\n\n
- \ The old API is marked as deprecated but will continue to work with no\n visible
- warnings for now.\n\n The new API is documented in the README and in rdoc for the
- PDF::Reader,\n PDF::Reader::Page and PDF::Reader::ObjectHash classes.\n\n Do not
- use this in production, stick to stable releases for that. If you do\n take the
- new API for a spin, please send any feedback my way.\n\n ********************************************\n\n"
+ post_install_message: ! "\n ********************************************\n\n v1.0.0
+ of PDF::Reader introduced a new page-based API. There are extensive\n examples
+ showing how to use it in the README and examples directory.\n\n For detailed documentation,
+ check the rdocs for the PDF::Reader,\n PDF::Reader::Page and PDF::Reader::ObjectHash
+ classes.\n\n The old API is marked as deprecated but will continue to work with
+ no\n visible warnings for now.\n\n ********************************************\n\n"
  rdoc_options:
  - --title
  - PDF::Reader Documentation
@@ -176,9 +175,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
  required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
- - - ! '>'
+ - - ! '>='
  - !ruby/object:Gem::Version
- version: 1.3.1
+ version: '0'
  requirements: []
  rubyforge_project:
  rubygems_version: 1.8.11