pdf-reader 0.6.2 → 0.7

data/CHANGELOG CHANGED
@@ -1,3 +1,14 @@
1
+ v0.7 (6th May 2008)
2
+ - API INCOMPATIBLE CHANGE: any hashes that are passed to callbacks use symbols as keys instead of PDF::Reader::Name instances.
3
+ - Improved support for converting text in some PDF files to unicode
4
+ - Behave as expected if the Contents key in a Page Dict is a reference
5
+ - Include some basic metadata callbacks
6
+ - Don't interpret a comment token (%) inside a string as a comment
7
+ - Small fixes to improve 1.9 compatibility
8
+ - Improved our Zlib deflating to make it slightly more robust - still some more issues to work out though
9
+ - Throw an UnsupportedFeatureError if a pdf that uses XRef streams is opened
10
+ - Added an option to PDF::Reader#file and PDF::Reader#string to enable parsing of only parts of a PDF file (i.e. only metadata, etc.)
11
+
1
12
  v0.6.2 (22nd March 2008)
2
13
  - Catch low level errors when applying filters to a content stream and raise a MalformedPDFError instead.
3
14
  - Added support for processing inline images
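Note on the API-incompatible change listed for v0.7: receivers that looked up dictionary entries by string now need symbols. A minimal sketch, assuming a hand-built hash in place of a real page dictionary. The `PageSizeReceiver` class and the sample values are hypothetical; only the `begin_page` callback name and the use of symbol keys come from this diff.

```ruby
# Hypothetical receiver illustrating symbol-keyed dictionaries (v0.7 style).
class PageSizeReceiver
  attr_reader :sizes

  def initialize
    @sizes = []
  end

  # begin_page receives the page dictionary; keys are symbols as of v0.7.
  def begin_page(page)
    box = page[:MediaBox]
    @sizes << [box[2] - box[0], box[3] - box[1]] if box
  end
end

# Stand-in for a page dictionary; a real one would come from the parser.
page = { :Type => :Page, :MediaBox => [0, 0, 612, 792] }

receiver = PageSizeReceiver.new
receiver.begin_page(page)

puts receiver.sizes.inspect
puts page['Type'].inspect # a string lookup finds nothing in a symbol-keyed hash
```

Receivers written against 0.6.x would need each `hash['Key']` lookup changed to `hash[:Key]`, and comparisons such as `== "Page"` changed to `== :Page`.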
data/README CHANGED
@@ -101,6 +101,29 @@ it through less or to a text file.
101
101
  puts cb
102
102
  end
103
103
 
104
+ == Extract metadata only
105
+
106
+ require 'rubygems'
107
+ require 'pdf/reader'
108
+
109
+ class MetaDataReceiver
110
+ attr_accessor :regular
111
+ attr_accessor :xml
112
+
113
+ def metadata(data)
114
+ @regular = data
115
+ end
116
+
117
+ def xml_metadata(data)
118
+ @xml = data
119
+ end
120
+ end
121
+
122
+ receiver = MetaDataReceiver.new
123
+ pdf = PDF::Reader.file(ARGV.shift, receiver, :pages => false, :metadata => true)
124
+ puts receiver.regular.inspect
125
+ puts receiver.xml.inspect
126
+
104
127
  == Basic RSpec of a generated PDF
105
128
 
106
129
  require 'rubygems'
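A receiver like the one in the README hunk only collects data if its method names match the callbacks the library sends. Elsewhere in this diff the dispatcher reads `@receiver.send(name, *params) if @receiver.respond_to?(name)` and the XML callback is sent as `:xml_metadata`, so a method with any other name is skipped without an error. A minimal sketch of that behaviour, using a hypothetical `dispatch` helper and made-up payloads in place of the library:

```ruby
# Hypothetical stand-in for the library's dispatcher: call the receiver
# only if it implements the named callback.
def dispatch(receiver, name, params = [])
  receiver.send(name, *params) if receiver.respond_to?(name)
end

class MetaDataReceiver
  attr_accessor :regular, :xml

  def metadata(data)
    @regular = data
  end

  def xml_metadata(data)
    @xml = data
  end
end

receiver = MetaDataReceiver.new
dispatch(receiver, :metadata, [{ :Title => "Example" }])
dispatch(receiver, :xml_metadata, ["<x:xmpmeta/>"])
skipped = dispatch(receiver, :metadata_xml, ["ignored"]) # no such method: skipped

puts receiver.regular.inspect
puts receiver.xml.inspect
puts skipped.inspect
```

Because unknown callbacks are dropped silently, a misspelled method name shows up only as a `nil` result, which is worth checking first when a receiver appears to collect nothing.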
data/Rakefile CHANGED
@@ -6,7 +6,7 @@ require 'rake/testtask'
6
6
  require "rake/gempackagetask"
7
7
  require 'spec/rake/spectask'
8
8
 
9
- PKG_VERSION = "0.6.2"
9
+ PKG_VERSION = "0.7"
10
10
  PKG_NAME = "pdf-reader"
11
11
  PKG_FILE_NAME = "#{PKG_NAME}-#{PKG_VERSION}"
12
12
 
data/TODO CHANGED
@@ -1,24 +1,27 @@
1
- v0.7
2
- - Allow the user to only process certain aspects of the PDF file. For example, if they're only
3
- interested in meta data or bookmarks, there's no point in walking the pages tree.
4
- - maybe a third option to Reader.parse?
5
- parse(io, receiver, {:pages => true, :fonts => false, :metadata => true, :bookmarks => false})
6
- - detect when a font's encoding is a CMap (generally used for pre-Unicode, multibyte asian encodings), and display a user friendly error
7
- - Provide a way to get raw access to a particular object. Good for testing purposes
8
-
9
1
  v0.8
2
+ - Allow more than just page content and metadata to be parsed (see spec section 3.6.1)
3
+ - bookmarks?
4
+ - outline?
5
+ - articles?
6
+ - viewer prefs?
7
+ - Don't remove comment when tokenising in the middle of a string
10
8
  - Tweak encoding mappings to differentiate between bytes that are invalid for an encoding, and bytes that are unchanged.
11
9
  poppler seems to do this in a quite reasonable way. Original Encoding -> Glyph Names -> Unicode. As of 0.6 we go straight
12
10
  from the Original encoding to Unicode.
11
+ - detect when a font's encoding is a CMap (generally used for pre-Unicode, multibyte asian encodings), and display a user friendly error
12
+ - Provide a way to get raw access to a particular object. Good for testing purposes
13
+ - Improve interpretation of non content stream data (i.e. metadata). Use PDFDocEncoding, recognise UTF16 strings, recognise dates, etc
14
+ - Support Cross Reference Streams (spec 3.4.7)
13
15
 
14
16
  v0.9
15
- - Support for CJK text (convert to UTF-8 like all other encodings. See Section 5.9 of the PDF spec)
16
- - Will require significantly improved handling of CMaps, including creating a bunch of predefined ones
17
17
  - Add a way to extract raster images
18
18
  - see XObjects section of spec (section 4.7)
19
19
  - Add a way to extract font data?
20
20
 
21
21
  Sometime
22
+ - Support for CJK text (convert to UTF-8 like all other encodings. See Section 5.9 of the PDF spec)
23
+ - Will require significantly improved handling of CMaps, including creating a bunch of predefined ones
24
+
22
25
  - Work out why specs/data/zlib*.pdf isn't parsed correctly when all the major PDF viewers can display it correctly
23
26
 
24
27
  - Ship some extra receivers in the standard package, particularly ones that are useful for running
@@ -27,8 +30,6 @@ Sometime
27
30
  - When we encounter Identity-H encoded text with no ToUnicode CMap, render the glyphs and treat them as images, as there's no
28
31
  sensible way to convert them to unicode
29
32
 
30
- - Improve metadata support
31
-
32
33
  - Add support for additional filters: ASCIIHexDecode, ASCII85Decode, LZWDecode, RunLengthDecode, CCITTFaxDecode, JBIG2Decode, DCTDecode, JPXDecode, Crypt?
33
34
 
34
35
  - Add support for additional encodings:
data/lib/pdf/reader.rb CHANGED
@@ -51,19 +51,35 @@ module PDF
51
51
  #
52
52
  # pdf = PDF::Reader.new
53
53
  # pdf.parse(File.new("somefile.pdf"), receiver)
54
+ #
55
+ # = Parsing parts of a file
56
+ #
57
+ # Both PDF::Reader#file and PDF::Reader#string accept a third argument that specifies which
58
+ # parts of the file to process. By default, all options are enabled, so this can be useful
59
+ # to cut down processing time if you're only interested in, say, metadata.
60
+ #
61
+ # As an example, the following call will disable parsing the contents of pages in the file,
62
+ # but explicitly enables processing metadata.
63
+ #
64
+ # PDF::Reader.file("somefile.pdf", receiver, {:metadata => true, :pages => false})
65
+ #
66
+ # Available options are currently:
67
+ #
68
+ # :metadata
69
+ # :pages
54
70
  class Reader
55
71
  ################################################################################
56
72
  # Parse the file with the given name, sending events to the given receiver.
57
- def self.file (name, receiver)
73
+ def self.file (name, receiver, opts = {})
58
74
  File.open(name,"rb") do |f|
59
- new.parse(f, receiver)
75
+ new.parse(f, receiver, opts)
60
76
  end
61
77
  end
62
78
  ################################################################################
63
79
  # Parse the given string, sending events to the given receiver.
64
- def self.string (str, receiver)
80
+ def self.string (str, receiver, opts = {})
65
81
  StringIO.open(str) do |s|
66
- new.parse(s, receiver)
82
+ new.parse(s, receiver, opts)
67
83
  end
68
84
  end
69
85
  ################################################################################
@@ -79,7 +95,6 @@ require 'pdf/reader/encoding'
79
95
  require 'pdf/reader/error'
80
96
  require 'pdf/reader/filter'
81
97
  require 'pdf/reader/font'
82
- require 'pdf/reader/name'
83
98
  require 'pdf/reader/parser'
84
99
  require 'pdf/reader/reference'
85
100
  require 'pdf/reader/register_receiver'
@@ -94,14 +109,19 @@ class PDF::Reader
94
109
  end
95
110
  ################################################################################
96
111
  # Given an IO object that contains PDF data, parse it.
97
- def parse (io, receiver)
112
+ def parse (io, receiver, opts = {})
98
113
  @buffer = Buffer.new(io)
99
114
  @xref = XRef.new(@buffer)
100
115
  @parser = Parser.new(@buffer, @xref)
101
116
  @content = (receiver == Explore ? Explore : Content).new(receiver, @xref)
102
117
 
118
+ options = {:pages => true, :metadata => true}
119
+ options.merge!(opts)
120
+
103
121
  trailer = @xref.load
104
- @content.document(@xref.object(trailer['Root'])) || self
122
+ @content.metadata(@xref.object(trailer[:Info]).first) if options[:metadata]
123
+ @content.document(@xref.object(trailer[:Root]).first) if options[:pages]
124
+ self
105
125
  end
106
126
  ################################################################################
107
127
  end
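The option handling added to `parse` amounts to overlaying the caller's hash on a set of defaults. A minimal sketch of that step in isolation; `DEFAULT_OPTIONS` and `effective_options` are illustrative names that do not appear in the library.

```ruby
# Defaults mirror the hash built inside parse in this diff.
DEFAULT_OPTIONS = { :pages => true, :metadata => true }.freeze

# Return the defaults overlaid with whatever the caller supplied.
# Hash#merge builds a new hash, so neither input is modified.
def effective_options(opts = {})
  DEFAULT_OPTIONS.merge(opts)
end

all_enabled   = effective_options
metadata_only = effective_options(:pages => false)

puts all_enabled.inspect
puts metadata_only.inspect
```

Any key the caller leaves out keeps its default of `true`, which is why passing only `:pages => false` is enough to get a metadata-only run.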
@@ -93,7 +93,8 @@ class PDF::Reader
93
93
  def ready_token (with_strip=true, skip_blanks=true)
94
94
  while @buffer.nil? or @buffer.empty?
95
95
  @buffer = @io.readline
96
- @buffer.sub!(/%.*$/, '')
96
+ @buffer.force_encoding("BINARY") if @buffer.respond_to?(:force_encoding)
97
+ #@buffer.sub!(/%.*$/, '') if strip_comments
97
98
  @buffer.chomp!
98
99
  break unless skip_blanks
99
100
  end
@@ -114,7 +115,14 @@ class PDF::Reader
114
115
  end
115
116
 
116
117
  strip_space = !(i == 0 and @buffer[0,1] == '(')
117
- head(token_chars, strip_space)
118
+ tok = head(token_chars, strip_space)
119
+
120
+ if tok[0,1] == "%"
121
+ @buffer = ""
122
+ token
123
+ else
124
+ tok
125
+ end
118
126
  end
119
127
  ################################################################################
120
128
  def head (chars, with_strip=true)
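The tokeniser change above stops stripping comments line by line. A minimal sketch of why the old substitution was unsafe, using a made-up content stream line (the sample text is an assumption, the regular expression is the one removed in this diff):

```ruby
# A line of content stream data whose string operand contains a percent sign.
line = "(Save 50% today) Tj"

# Pre-0.7 behaviour: drop everything from the first % to the end of the line.
stripped = line.sub(/%.*$/, '')
puts stripped.inspect

# The 0.7 approach looks at tokens instead: only a token that itself
# starts with % is treated as a comment.
comment_token = "% just a comment"
string_token  = "(Save 50% today)"
is_comment = lambda { |tok| tok[0, 1] == "%" }

puts is_comment.call(comment_token).inspect
puts is_comment.call(string_token).inspect
```

The first output shows the string operand cut off mid-way, which is the failure the changelog entry about `%` inside strings refers to.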
@@ -52,19 +52,19 @@ class PDF::Reader
52
52
 
53
53
  def decode(c)
54
54
  # TODO: implement the conversion
55
- Error.assert_equal(c.class, Fixnum)
55
+ return c unless c.class == Fixnum
56
56
  @map[c]
57
57
  end
58
58
 
59
59
  private
60
60
 
61
61
  def process_bfchar_line(l)
62
- m, find, replace = *l.match(/<([0-9a-fA-F]+)> <([0-9a-fA-F]+)>/)
62
+ m, find, replace = *l.match(/<([0-9a-fA-F]+)>\s*<([0-9a-fA-F]+)>/)
63
63
  @map["0x#{find}".hex] = "0x#{replace}".hex if find && replace
64
64
  end
65
65
 
66
66
  def process_bfrange_line(l)
67
- m, start_code, end_code, dst = *l.match(/<([0-9a-fA-F]+)> <([0-9a-fA-F]+)> <([0-9a-fA-F]+)>/)
67
+ m, start_code, end_code, dst = *l.match(/<([0-9a-fA-F]+)>\s*<([0-9a-fA-F]+)>\s*<([0-9a-fA-F]+)>/)
68
68
  if start_code && end_code && dst
69
69
  start_code = "0x#{start_code}".hex
70
70
  end_code = "0x#{end_code}".hex
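The CMap regular expressions above were loosened from a single literal space to `\s*` between the hex groups. A minimal sketch of the difference, assuming two hand-written `bfchar` lines; `parse_bfchar` is a hypothetical helper, not a library method.

```ruby
OLD_BFCHAR = /<([0-9a-fA-F]+)> <([0-9a-fA-F]+)>/
NEW_BFCHAR = /<([0-9a-fA-F]+)>\s*<([0-9a-fA-F]+)>/

# Returns [source_code, unicode_codepoint], or nil if the line does not match.
def parse_bfchar(line, pattern)
  m = line.match(pattern)
  m && [m[1].hex, m[2].hex]
end

spaced  = "<0003> <0041>" # groups separated by one space
compact = "<0003><0041>"  # groups with no separator

puts parse_bfchar(spaced, OLD_BFCHAR).inspect
puts parse_bfchar(compact, OLD_BFCHAR).inspect
puts parse_bfchar(compact, NEW_BFCHAR).inspect
```

With the old pattern, a CMap written without separators would yield no mappings at all, so every character in that font would fall back to the unknown-character box.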
@@ -145,6 +145,8 @@ class PDF::Reader
145
145
  # - end_page_container
146
146
  # - begin_page
147
147
  # - end_page
148
+ # - metadata
149
+ # - xml_metadata
148
150
  #
149
151
  # == Resource Callbacks
150
152
  #
@@ -250,10 +252,20 @@ class PDF::Reader
250
252
  @fonts ||= {}
251
253
  end
252
254
  ################################################################################
255
+ # Begin processing the document metadata
256
+ def metadata (info)
257
+ info = utf16_to_utf8(info)
258
+ callback(:metadata, [info]) if info
259
+ end
260
+ ################################################################################
253
261
  # Begin processing the document
254
262
  def document (root)
263
+ if root[:Metadata]
264
+ obj, stream = @xref.object(root[:Metadata])
265
+ callback(:xml_metadata, [stream])
266
+ end
255
267
  callback(:begin_document, [root])
256
- walk_pages(@xref.object(root['Pages']))
268
+ walk_pages(@xref.object(root[:Pages]).first)
257
269
  callback(:end_document)
258
270
  end
259
271
  ################################################################################
@@ -261,27 +273,35 @@ class PDF::Reader
261
273
  # its content
262
274
  def walk_pages (page)
263
275
 
264
- if page['Resources']
265
- res = page['Resources']
266
- page.delete('Resources')
276
+ if page[:Resources]
277
+ res = page[:Resources]
278
+ page.delete(:Resources)
267
279
  end
268
280
 
269
281
  # extract page content
270
- if page['Type'] == "Pages"
282
+ if page[:Type] == :Pages
271
283
  callback(:begin_page_container, [page])
272
- walk_resources(@xref.object(res)) if res
273
- page['Kids'].each {|child| walk_pages(@xref.object(child))}
284
+ walk_resources(@xref.object(res).first) if res
285
+ page[:Kids].each {|child| walk_pages(@xref.object(child).first)}
274
286
  callback(:end_page_container)
275
- elsif page['Type'] == "Page"
287
+ elsif page[:Type] == :Page
276
288
  callback(:begin_page, [page])
277
- walk_resources(@xref.object(res)) if res
289
+ walk_resources(@xref.object(res).first) if res
278
290
  @page = page
279
291
  @params = []
280
292
 
281
- page['Contents'].to_a.each do |cstream|
282
- obj, stream = @xref.object(cstream)
293
+ if page[:Contents].kind_of?(Array)
294
+ contents = page[:Contents]
295
+ elsif @xref.obj_type(page[:Contents]) == :Array
296
+ contents, stream = @xref.object(page[:Contents])
297
+ else
298
+ contents = [page[:Contents]]
299
+ end
300
+
301
+ contents.each do |content|
302
+ obj, stream = @xref.object(content)
283
303
  content_stream(stream)
284
- end if page.has_key?('Contents') and page['Contents']
304
+ end if page.has_key?(:Contents) and page[:Contents]
285
305
 
286
306
  callback(:end_page)
287
307
  end
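The `Contents` handling above normalises three shapes (a direct array, a reference to an array, a reference to a single stream) into one list before iterating. A minimal sketch of that idea with stand-ins: `Ref` plays the role of an indirect reference and `OBJECTS` the role of the cross-reference table. Both are hypothetical, as are `content_list` and `streams_for`.

```ruby
Ref = Struct.new(:id)

OBJECTS = {
  10 => [Ref.new(11), Ref.new(12)], # an indirect array of content streams
  11 => "BT (one) Tj ET",
  12 => "BT (two) Tj ET"
}

# Always return an array whose items each resolve to one content stream.
def content_list(contents, objects)
  return [] if contents.nil?
  return contents if contents.is_a?(Array)
  target = contents.is_a?(Ref) ? objects[contents.id] : nil
  target.is_a?(Array) ? target : [contents]
end

# Resolve every entry to its stream data, in order.
def streams_for(page, objects)
  content_list(page[:Contents], objects).map { |ref| objects[ref.id] }
end

puts streams_for({ :Contents => Ref.new(11) }, OBJECTS).inspect
puts streams_for({ :Contents => Ref.new(10) }, OBJECTS).inspect
puts streams_for({ :Contents => [Ref.new(12), Ref.new(11)] }, OBJECTS).inspect
puts streams_for({}, OBJECTS).inspect
```

Normalising first keeps the loop that processes each stream identical for all three shapes, which is the same design the diff adopts.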
@@ -324,61 +344,60 @@ class PDF::Reader
324
344
  end
325
345
  end
326
346
  rescue EOFError => e
347
+ raise MalformedPDFError, "End Of File while processing a content stream"
327
348
  end
328
349
  ################################################################################
329
350
  def walk_resources(resources)
330
351
  resources = resolve_references(resources)
331
352
 
332
353
  # extract any procset information
333
- if resources['ProcSet']
334
- callback(:resource_procset, resources['ProcSet'])
354
+ if resources[:ProcSet]
355
+ callback(:resource_procset, resources[:ProcSet])
335
356
  end
336
357
 
337
358
  # extract any xobject information
338
- if resources['XObject']
339
- @xref.object(resources['XObject']).each do |name, val|
359
+ if resources[:XObject]
360
+ @xref.object(resources[:XObject]).first.each do |name, val|
340
361
  obj, stream = @xref.object(val)
341
362
  callback(:resource_xobject, [name, obj, stream])
342
363
  end
343
364
  end
344
365
 
345
366
  # extract any extgstate information
346
- if resources['ExtGState']
347
- @xref.object(resources['ExtGState']).each do |name, val|
348
- callback(:resource_extgstate, [name, @xref.object(val)])
367
+ if resources[:ExtGState]
368
+ @xref.object(resources[:ExtGState]).first.each do |name, val|
369
+ callback(:resource_extgstate, [name, @xref.object(val).first])
349
370
  end
350
371
  end
351
372
 
352
373
  # extract any colorspace information
353
- if resources['ColorSpace']
354
- @xref.object(resources['ColorSpace']).each do |name, val|
355
- callback(:resource_colorspace, [name, @xref.object(val)])
374
+ if resources[:ColorSpace]
375
+ @xref.object(resources[:ColorSpace]).first.each do |name, val|
376
+ callback(:resource_colorspace, [name, @xref.object(val).first])
356
377
  end
357
378
  end
358
379
 
359
380
  # extract any pattern information
360
- if resources['Pattern']
361
- @xref.object(resources['Pattern']).each do |name, val|
362
- callback(:resource_pattern, [name, @xref.object(val)])
381
+ if resources[:Pattern]
382
+ @xref.object(resources[:Pattern]).first.each do |name, val|
383
+ callback(:resource_pattern, [name, @xref.object(val).first])
363
384
  end
364
385
  end
365
386
 
366
387
  # extract any font information
367
- if resources['Font']
368
- @xref.object(resources['Font']).each do |label, desc|
369
- desc = @xref.object(desc)
388
+ if resources[:Font]
389
+ @xref.object(resources[:Font]).first.each do |label, desc|
390
+ desc = @xref.object(desc).first
370
391
  @fonts[label] = PDF::Reader::Font.new
371
392
  @fonts[label].label = label
372
- @fonts[label].subtype = desc['Subtype'] if desc['Subtype']
373
- @fonts[label].basefont = desc['BaseFont'] if desc['BaseFont']
374
- @fonts[label].encoding = PDF::Reader::Encoding.factory(@xref.object(desc['Encoding']))
375
- @fonts[label].descendantfonts = desc['DescendantFonts'] if desc['DescendantFonts']
376
- if desc['ToUnicode']
377
- obj, cmap = @xref.object(desc['ToUnicode'])
378
-
393
+ @fonts[label].subtype = desc[:Subtype] if desc[:Subtype]
394
+ @fonts[label].basefont = desc[:BaseFont] if desc[:BaseFont]
395
+ @fonts[label].encoding = PDF::Reader::Encoding.factory(@xref.object(desc[:Encoding]).first)
396
+ @fonts[label].descendantfonts = desc[:DescendantFonts] if desc[:DescendantFonts]
397
+ if desc[:ToUnicode]
379
398
  # this stream is a cmap
380
399
  begin
381
- @fonts[label].tounicode = PDF::Reader::CMap.new(cmap)
400
+ @fonts[label].tounicode = PDF::Reader::CMap.new(desc[:ToUnicode])
382
401
  rescue
383
402
  # if the CMap fails to parse, don't worry too much. Means we can't translate the text properly
384
403
  end
@@ -391,7 +410,13 @@ class PDF::Reader
391
410
  # Convert any PDF::Reader::Resource objects into a real object
392
411
  def resolve_references(obj)
393
412
  case obj
394
- when PDF::Reader::Reference then resolve_references(@xref.object(obj))
413
+ when PDF::Reader::Reference then
414
+ obj, stream = @xref.object(obj)
415
+ if stream
416
+ stream
417
+ else
418
+ resolve_references(obj)
419
+ end
395
420
  when Hash then obj.each { |key,val| obj[key] = resolve_references(val) }
396
421
  when Array then obj.collect { |item| resolve_references(item) }
397
422
  else
@@ -404,6 +429,21 @@ class PDF::Reader
404
429
  @receiver.send(name, *params) if @receiver.respond_to?(name)
405
430
  end
406
431
  ################################################################################
432
+ private
433
+ def utf16_to_utf8(obj)
434
+ case obj
435
+ when String then
436
+ if obj[0,2] == "\376\377"
437
+ obj[2, obj.size-2].unpack("n*").pack("U*")
438
+ else
439
+ obj
440
+ end
441
+ when Hash then obj.each { |key,val| obj[key] = utf16_to_utf8(val) }
442
+ when Array then obj.collect { |item| utf16_to_utf8(item) }
443
+ else
444
+ obj
445
+ end
446
+ end
407
447
  end
408
448
  ################################################################################
409
449
  end
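The `utf16_to_utf8` helper above detects the big-endian byte order mark (`\376\377`, i.e. bytes FE FF) and repacks the remaining 16-bit units as UTF-8. A minimal sketch of the string branch only, written for current Ruby versions (it uses `String#b`, which the 2008 code could not); `utf16be_to_utf8` and the sample title are hypothetical.

```ruby
# Byte order mark that introduces a UTF-16BE text string.
UTF16_BOM = "\xFE\xFF".b

# Convert a BOM-prefixed UTF-16BE string to UTF-8; return anything else as is.
def utf16be_to_utf8(str)
  bytes = str.b
  return str unless bytes.start_with?(UTF16_BOM)
  # "n*" reads big-endian 16-bit integers, "U*" writes them as UTF-8.
  bytes[2..-1].unpack("n*").pack("U*")
end

title = "\xFE\xFF\x00H\x00i".b

converted = utf16be_to_utf8(title)
untouched = utf16be_to_utf8("plain")

puts converted
puts untouched
```

This approach treats every 16-bit unit as a complete code point, so characters outside the Basic Multilingual Plane, which UTF-16 encodes as surrogate pairs, would not convert correctly.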
@@ -60,21 +60,21 @@ class PDF::Reader
60
60
  # Takes the "Encoding" value of a Font dictionary and builds a PDF::Reader::Encoding object
61
61
  def self.factory(enc)
62
62
  if enc.kind_of?(Hash)
63
- diff = enc['Differences']
64
- enc = enc['Encoding'] || enc['BaseEncoding']
63
+ diff = enc[:Differences]
64
+ enc = enc[:Encoding] || enc[:BaseEncoding]
65
65
  elsif enc != nil
66
- enc = enc.to_s
66
+ enc = enc.to_sym
67
67
  end
68
68
 
69
69
  case enc
70
- when nil then enc = PDF::Reader::Encoding::StandardEncoding.new
71
- when "Identity-H" then enc = PDF::Reader::Encoding::IdentityH.new
72
- when "MacRomanEncoding" then enc = PDF::Reader::Encoding::MacRomanEncoding.new
73
- when "MacExpertEncoding" then enc = PDF::Reader::Encoding::MacExpertEncoding.new
74
- when "StandardEncoding" then enc = PDF::Reader::Encoding::StandardEncoding.new
75
- when "SymbolEncoding" then enc = PDF::Reader::Encoding::SymbolEncoding.new
76
- when "WinAnsiEncoding" then enc = PDF::Reader::Encoding::WinAnsiEncoding.new
77
- when "ZapfDingbatsEncoding" then enc = PDF::Reader::Encoding::ZapfDingbatsEncoding.new
70
+ when nil then enc = PDF::Reader::Encoding::StandardEncoding.new
71
+ when "Identity-H".to_sym then enc = PDF::Reader::Encoding::IdentityH.new
72
+ when :MacRomanEncoding then enc = PDF::Reader::Encoding::MacRomanEncoding.new
73
+ when :MacExpertEncoding then enc = PDF::Reader::Encoding::MacExpertEncoding.new
74
+ when :StandardEncoding then enc = PDF::Reader::Encoding::StandardEncoding.new
75
+ when :SymbolEncoding then enc = PDF::Reader::Encoding::SymbolEncoding.new
76
+ when :WinAnsiEncoding then enc = PDF::Reader::Encoding::WinAnsiEncoding.new
77
+ when :ZapfDingbatsEncoding then enc = PDF::Reader::Encoding::ZapfDingbatsEncoding.new
78
78
  else raise UnsupportedFeatureError, "#{enc} is not currently a supported encoding"
79
79
  end
80
80
 
@@ -104,28 +104,28 @@ class PDF::Reader
104
104
  protected :process_glyphnames
105
105
 
106
106
  class IdentityH < Encoding
107
- def to_utf8(str, map = nil)
108
-
107
+ def to_utf8(str, tounicode = nil)
108
+
109
109
  array_enc = []
110
110
 
111
111
  # iterate over string, reading it in 2 byte chunks and interpreting those
112
112
  # chunks as ints
113
- str.unpack("n*").each do |c|
114
-
113
+ str.unpack("n*").each do |num|
114
+
115
115
  # convert the int to a unicode codepoint if possible.
116
116
  # without a ToUnicode CMap, it's impossible to reliably convert this text
117
117
  # to unicode, so just replace each character with a little box. Big smacks
118
118
  # to the PDF producing app.
119
- if map && (code = map.decode(c))
119
+ if tounicode && (code = tounicode.decode(num))
120
120
  array_enc << code
121
121
  else
122
122
  array_enc << PDF::Reader::Encoding::UNKNOWN_CHAR
123
123
  end
124
124
  end
125
-
125
+
126
126
  # replace characters that didn't convert to unicode nicely with something valid
127
127
  array_enc.collect! { |c| c ? c : PDF::Reader::Encoding::UNKNOWN_CHAR }
128
-
128
+
129
129
  # pack all our Unicode codepoints into a UTF-8 string
130
130
  ret = array_enc.pack("U*")
131
131
 
@@ -143,169 +143,175 @@ class PDF::Reader
143
143
  array_expert = self.process_differences(array_expert)
144
144
  array_enc = []
145
145
  array_expert.each do |num|
146
- case num
147
- # change necessary characters to equivalent Unicode codepoints
148
- when 0x21; array_enc << 0xF721
149
- when 0x22; array_enc << 0xF6F8 # Hungarumlautsmall
150
- when 0x23; array_enc << 0xF7A2
151
- when 0x24; array_enc << 0xF724
152
- when 0x25; array_enc << 0xF6E4
153
- when 0x26; array_enc << 0xF726
154
- when 0x27; array_enc << 0xF7B4
155
- when 0x28; array_enc << 0x207D
156
- when 0x29; array_enc << 0xF07E
157
- when 0x2A; array_enc << 0x2025
158
- when 0x2B; array_enc << 0x2024
159
- when 0x2F; array_enc << 0x2044
160
- when 0x30; array_enc << 0xF730
161
- when 0x31; array_enc << 0xF731
162
- when 0x32; array_enc << 0xF732
163
- when 0x33; array_enc << 0xF733
164
- when 0x34; array_enc << 0xF734
165
- when 0x35; array_enc << 0xF735
166
- when 0x36; array_enc << 0xF736
167
- when 0x37; array_enc << 0xF737
168
- when 0x38; array_enc << 0xF738
169
- when 0x39; array_enc << 0xF739
170
- when 0x3D; array_enc << 0xF6DE
171
- when 0x3F; array_enc << 0xF73F
172
- when 0x44; array_enc << 0xF7F0
173
- when 0x47; array_enc << 0x00BC
174
- when 0x48; array_enc << 0x00BD
175
- when 0x49; array_enc << 0x00BE
176
- when 0x4A; array_enc << 0x215B
177
- when 0x4B; array_enc << 0x215C
178
- when 0x4C; array_enc << 0x215D
179
- when 0x4D; array_enc << 0x215E
180
- when 0x4E; array_enc << 0x2153
181
- when 0x4F; array_enc << 0x2154
182
- when 0x56; array_enc << 0xFB00
183
- when 0x57; array_enc << 0xFB01
184
- when 0x58; array_enc << 0xFB02
185
- when 0x59; array_enc << 0xFB03
186
- when 0x5A; array_enc << 0xFB04
187
- when 0x5B; array_enc << 0x208D
188
- when 0x5D; array_enc << 0x208E
189
- when 0x5E; array_enc << 0xF6F6
190
- when 0x5F; array_enc << 0xF6E5
191
- when 0x60; array_enc << 0xF760
192
- when 0x61; array_enc << 0xF761
193
- when 0x62; array_enc << 0xF762
194
- when 0x63; array_enc << 0xF763
195
- when 0x64; array_enc << 0xF764
196
- when 0x65; array_enc << 0xF765
197
- when 0x66; array_enc << 0xF766
198
- when 0x67; array_enc << 0xF767
199
- when 0x68; array_enc << 0xF768
200
- when 0x69; array_enc << 0xF769
201
- when 0x6A; array_enc << 0xF76A
202
- when 0x6B; array_enc << 0xF76B
203
- when 0x6C; array_enc << 0xF76C
204
- when 0x6D; array_enc << 0xF76D
205
- when 0x6E; array_enc << 0xF76E
206
- when 0x6F; array_enc << 0xF76F
207
- when 0x70; array_enc << 0xF770
208
- when 0x71; array_enc << 0xF771
209
- when 0x72; array_enc << 0xF772
210
- when 0x73; array_enc << 0xF773
211
- when 0x74; array_enc << 0xF774
212
- when 0x75; array_enc << 0xF775
213
- when 0x76; array_enc << 0xF776
214
- when 0x77; array_enc << 0xF777
215
- when 0x78; array_enc << 0xF778
216
- when 0x79; array_enc << 0xF779
217
- when 0x7A; array_enc << 0xF77A
218
- when 0x7B; array_enc << 0x20A1
219
- when 0x7C; array_enc << 0xF6DC
220
- when 0x7D; array_enc << 0xF6DD
221
- when 0x7E; array_enc << 0xF6FE
222
- when 0x81; array_enc << 0xF6E9
223
- when 0x82; array_enc << 0xF6E0
224
- when 0x87; array_enc << 0xF7E1 # Acircumflexsmall
225
- when 0x88; array_enc << 0xF7E0
226
- when 0x89; array_enc << 0xF7E2 # Acutesmall
227
- when 0x8A; array_enc << 0xF7E4
228
- when 0x8B; array_enc << 0xF7E3
229
- when 0x8C; array_enc << 0xF7E5
230
- when 0x8D; array_enc << 0xF7E7
231
- when 0x8E; array_enc << 0xF7E9
232
- when 0x8F; array_enc << 0xF7E8
233
- when 0x90; array_enc << 0xF7E4
234
- when 0x91; array_enc << 0xF7EB
235
- when 0x92; array_enc << 0xF7ED
236
- when 0x93; array_enc << 0xF7EC
237
- when 0x94; array_enc << 0xF7EE
238
- when 0x95; array_enc << 0xF7EF
239
- when 0x96; array_enc << 0xF7F1
240
- when 0x97; array_enc << 0xF7F3
241
- when 0x98; array_enc << 0xF7F2
242
- when 0x99; array_enc << 0xF7F4
243
- when 0x9A; array_enc << 0xF7F6
244
- when 0x9B; array_enc << 0xF7F5
245
- when 0x9C; array_enc << 0xF7FA
246
- when 0x9D; array_enc << 0xF7F9
247
- when 0x9E; array_enc << 0xF7FB
248
- when 0x9F; array_enc << 0xF7FC
249
- when 0xA1; array_enc << 0x2078
250
- when 0xA2; array_enc << 0x2084
251
- when 0xA3; array_enc << 0x2083
252
- when 0xA4; array_enc << 0x2086
253
- when 0xA5; array_enc << 0x2088
254
- when 0xA6; array_enc << 0x2087
255
- when 0xA7; array_enc << 0xF6FD
256
- when 0xA9; array_enc << 0xF6DF
257
- when 0xAA; array_enc << 0x2082
258
- when 0xAC; array_enc << 0xF7A8
259
- when 0xAE; array_enc << 0xF6F5
260
- when 0xAF; array_enc << 0xF6F0
261
- when 0xB0; array_enc << 0x2085
262
- when 0xB2; array_enc << 0xF6E1
263
- when 0xB3; array_enc << 0xF6E7
264
- when 0xB4; array_enc << 0xF7FD
265
- when 0xB6; array_enc << 0xF6E3
266
- when 0xB9; array_enc << 0xF7FE
267
- when 0xBB; array_enc << 0x2089
268
- when 0xBC; array_enc << 0x2080
269
- when 0xBD; array_enc << 0xF6FF
270
- when 0xBE; array_enc << 0xF7E6 # AEsmall
271
- when 0xBF; array_enc << 0xF7F8
272
- when 0xC0; array_enc << 0xF7BF
273
- when 0xC1; array_enc << 0x2081
274
- when 0xC2; array_enc << 0xF6F9
275
- when 0xC9; array_enc << 0xF7B8
276
- when 0xCF; array_enc << 0xF6FA
277
- when 0xD0; array_enc << 0x2012
278
- when 0xD1; array_enc << 0xF6E6
279
- when 0xD6; array_enc << 0xF7A1
280
- when 0xD8; array_enc << 0xF7FF
281
- when 0xDA; array_enc << 0x00B9
282
- when 0xDB; array_enc << 0x00B2
283
- when 0xDC; array_enc << 0x00B3
284
- when 0xDD; array_enc << 0x2074
285
- when 0xDE; array_enc << 0x2075
286
- when 0xDF; array_enc << 0x2076
287
- when 0xE0; array_enc << 0x2077
288
- when 0xE1; array_enc << 0x2079
289
- when 0xE2; array_enc << 0x2070
290
- when 0xE4; array_enc << 0xF6EC
291
- when 0xE5; array_enc << 0xF6F1
292
- when 0xE6; array_enc << 0xF6F3
293
- when 0xE9; array_enc << 0xF6ED
294
- when 0xEA; array_enc << 0xF6F2
295
- when 0xEB; array_enc << 0xF6EB
296
- when 0xF1; array_enc << 0xF6EE
297
- when 0xF2; array_enc << 0xF6FB
298
- when 0xF3; array_enc << 0xF6F4
299
- when 0xF4; array_enc << 0xF7AF
300
- when 0xF5; array_enc << 0xF6EF
301
- when 0xF6; array_enc << 0x207F
302
- when 0xF7; array_enc << 0xF6EF
303
- when 0xF8; array_enc << 0xF6E2
304
- when 0xF9; array_enc << 0xF6E8
305
- when 0xFA; array_enc << 0xF6F7
306
- when 0xFB; array_enc << 0xF6FC
146
+ if tounicode && (code = tounicode.decode(num))
147
+ array_enc << code
148
+ elsif tounicode
149
+ array_enc << PDF::Reader::Encoding::UNKNOWN_CHAR
307
150
  else
308
- array_enc << num
151
+ case num
152
+ # change necessary characters to equivalent Unicode codepoints
153
+ when 0x21; array_enc << 0xF721
154
+ when 0x22; array_enc << 0xF6F8 # Hungarumlautsmall
155
+ when 0x23; array_enc << 0xF7A2
156
+ when 0x24; array_enc << 0xF724
157
+ when 0x25; array_enc << 0xF6E4
158
+ when 0x26; array_enc << 0xF726
159
+ when 0x27; array_enc << 0xF7B4
160
+ when 0x28; array_enc << 0x207D
161
+ when 0x29; array_enc << 0xF07E
162
+ when 0x2A; array_enc << 0x2025
163
+ when 0x2B; array_enc << 0x2024
164
+ when 0x2F; array_enc << 0x2044
165
+ when 0x30; array_enc << 0xF730
166
+ when 0x31; array_enc << 0xF731
167
+ when 0x32; array_enc << 0xF732
168
+ when 0x33; array_enc << 0xF733
169
+ when 0x34; array_enc << 0xF734
170
+ when 0x35; array_enc << 0xF735
171
+ when 0x36; array_enc << 0xF736
172
+ when 0x37; array_enc << 0xF737
173
+ when 0x38; array_enc << 0xF738
174
+ when 0x39; array_enc << 0xF739
175
+ when 0x3D; array_enc << 0xF6DE
176
+ when 0x3F; array_enc << 0xF73F
177
+ when 0x44; array_enc << 0xF7F0
178
+ when 0x47; array_enc << 0x00BC
179
+ when 0x48; array_enc << 0x00BD
180
+ when 0x49; array_enc << 0x00BE
181
+ when 0x4A; array_enc << 0x215B
182
+ when 0x4B; array_enc << 0x215C
183
+ when 0x4C; array_enc << 0x215D
184
+ when 0x4D; array_enc << 0x215E
185
+ when 0x4E; array_enc << 0x2153
186
+ when 0x4F; array_enc << 0x2154
187
+ when 0x56; array_enc << 0xFB00
188
+ when 0x57; array_enc << 0xFB01
189
+ when 0x58; array_enc << 0xFB02
190
+ when 0x59; array_enc << 0xFB03
191
+ when 0x5A; array_enc << 0xFB04
192
+ when 0x5B; array_enc << 0x208D
193
+ when 0x5D; array_enc << 0x208E
194
+ when 0x5E; array_enc << 0xF6F6
195
+ when 0x5F; array_enc << 0xF6E5
196
+ when 0x60; array_enc << 0xF760
197
+ when 0x61; array_enc << 0xF761
198
+ when 0x62; array_enc << 0xF762
199
+ when 0x63; array_enc << 0xF763
200
+ when 0x64; array_enc << 0xF764
201
+ when 0x65; array_enc << 0xF765
202
+ when 0x66; array_enc << 0xF766
203
+ when 0x67; array_enc << 0xF767
204
+ when 0x68; array_enc << 0xF768
205
+ when 0x69; array_enc << 0xF769
206
+ when 0x6A; array_enc << 0xF76A
207
+ when 0x6B; array_enc << 0xF76B
208
+ when 0x6C; array_enc << 0xF76C
209
+ when 0x6D; array_enc << 0xF76D
210
+ when 0x6E; array_enc << 0xF76E
211
+ when 0x6F; array_enc << 0xF76F
212
+ when 0x70; array_enc << 0xF770
213
+ when 0x71; array_enc << 0xF771
214
+ when 0x72; array_enc << 0xF772
215
+ when 0x73; array_enc << 0xF773
216
+ when 0x74; array_enc << 0xF774
217
+ when 0x75; array_enc << 0xF775
218
+ when 0x76; array_enc << 0xF776
219
+ when 0x77; array_enc << 0xF777
220
+ when 0x78; array_enc << 0xF778
221
+ when 0x79; array_enc << 0xF779
222
+ when 0x7A; array_enc << 0xF77A
223
+ when 0x7B; array_enc << 0x20A1
224
+ when 0x7C; array_enc << 0xF6DC
225
+ when 0x7D; array_enc << 0xF6DD
226
+ when 0x7E; array_enc << 0xF6FE
227
+ when 0x81; array_enc << 0xF6E9
228
+ when 0x82; array_enc << 0xF6E0
229
+ when 0x87; array_enc << 0xF7E1 # Acircumflexsmall
230
+ when 0x88; array_enc << 0xF7E0
231
+ when 0x89; array_enc << 0xF7E2 # Acutesmall
232
+ when 0x8A; array_enc << 0xF7E4
233
+ when 0x8B; array_enc << 0xF7E3
234
+ when 0x8C; array_enc << 0xF7E5
235
+ when 0x8D; array_enc << 0xF7E7
236
+ when 0x8E; array_enc << 0xF7E9
237
+ when 0x8F; array_enc << 0xF7E8
238
+ when 0x90; array_enc << 0xF7E4
239
+ when 0x91; array_enc << 0xF7EB
240
+ when 0x92; array_enc << 0xF7ED
241
+ when 0x93; array_enc << 0xF7EC
242
+ when 0x94; array_enc << 0xF7EE
243
+ when 0x95; array_enc << 0xF7EF
244
+ when 0x96; array_enc << 0xF7F1
245
+ when 0x97; array_enc << 0xF7F3
246
+ when 0x98; array_enc << 0xF7F2
247
+ when 0x99; array_enc << 0xF7F4
248
+ when 0x9A; array_enc << 0xF7F6
249
+ when 0x9B; array_enc << 0xF7F5
250
+ when 0x9C; array_enc << 0xF7FA
251
+ when 0x9D; array_enc << 0xF7F9
252
+ when 0x9E; array_enc << 0xF7FB
253
+ when 0x9F; array_enc << 0xF7FC
254
+ when 0xA1; array_enc << 0x2078
255
+ when 0xA2; array_enc << 0x2084
256
+ when 0xA3; array_enc << 0x2083
257
+ when 0xA4; array_enc << 0x2086
258
+ when 0xA5; array_enc << 0x2088
259
+ when 0xA6; array_enc << 0x2087
260
+ when 0xA7; array_enc << 0xF6FD
261
+ when 0xA9; array_enc << 0xF6DF
262
+ when 0xAA; array_enc << 0x2082
263
+ when 0xAC; array_enc << 0xF7A8
264
+ when 0xAE; array_enc << 0xF6F5
265
+ when 0xAF; array_enc << 0xF6F0
266
+ when 0xB0; array_enc << 0x2085
267
+ when 0xB2; array_enc << 0xF6E1
268
+ when 0xB3; array_enc << 0xF6E7
269
+ when 0xB4; array_enc << 0xF7FD
270
+ when 0xB6; array_enc << 0xF6E3
271
+ when 0xB9; array_enc << 0xF7FE
272
+ when 0xBB; array_enc << 0x2089
273
+ when 0xBC; array_enc << 0x2080
274
+ when 0xBD; array_enc << 0xF6FF
275
+ when 0xBE; array_enc << 0xF7E6 # AEsmall
276
+ when 0xBF; array_enc << 0xF7F8
277
+ when 0xC0; array_enc << 0xF7BF
278
+ when 0xC1; array_enc << 0x2081
279
+ when 0xC2; array_enc << 0xF6F9
280
+ when 0xC9; array_enc << 0xF7B8
281
+ when 0xCF; array_enc << 0xF6FA
282
+ when 0xD0; array_enc << 0x2012
283
+ when 0xD1; array_enc << 0xF6E6
284
+ when 0xD6; array_enc << 0xF7A1
285
+ when 0xD8; array_enc << 0xF7FF
286
+ when 0xDA; array_enc << 0x00B9
287
+ when 0xDB; array_enc << 0x00B2
288
+ when 0xDC; array_enc << 0x00B3
289
+ when 0xDD; array_enc << 0x2074
290
+ when 0xDE; array_enc << 0x2075
291
+ when 0xDF; array_enc << 0x2076
292
+ when 0xE0; array_enc << 0x2077
293
+ when 0xE1; array_enc << 0x2079
294
+ when 0xE2; array_enc << 0x2070
295
+ when 0xE4; array_enc << 0xF6EC
296
+ when 0xE5; array_enc << 0xF6F1
297
+ when 0xE6; array_enc << 0xF6F3
298
+ when 0xE9; array_enc << 0xF6ED
299
+ when 0xEA; array_enc << 0xF6F2
300
+ when 0xEB; array_enc << 0xF6EB
301
+ when 0xF1; array_enc << 0xF6EE
302
+ when 0xF2; array_enc << 0xF6FB
303
+ when 0xF3; array_enc << 0xF6F4
304
+ when 0xF4; array_enc << 0xF7AF
305
+ when 0xF5; array_enc << 0xF6EF
306
+ when 0xF6; array_enc << 0x207F
307
+ when 0xF7; array_enc << 0xF6EF
308
+ when 0xF8; array_enc << 0xF6E2
309
+ when 0xF9; array_enc << 0xF6E8
310
+ when 0xFA; array_enc << 0xF6F7
311
+ when 0xFB; array_enc << 0xF6FC
312
+ else
313
+ array_enc << num
314
+ end
309
315
  end
310
316
  end
311
317
 
@@ -314,7 +320,7 @@ class PDF::Reader
314
320
 
315
321
  # replace charcters that didn't convert to unicode nicely with something valid
316
322
  array_enc.collect! { |c| c ? c : PDF::Reader::Encoding::UNKNOWN_CHAR }
317
-
323
+
318
324
  # pack all our Unicode codepoints into a UTF-8 string
319
325
  ret = array_enc.pack("U*")
320
326
 
@@ -335,138 +341,144 @@ class PDF::Reader
335
341
  array_mac = self.process_differences(array_mac)
336
342
  array_enc = []
337
343
  array_mac.each do |num|
338
- case num
339
- # change necesary characters to equivilant Unicode codepoints
340
- when 0x80; array_enc << 0x00C4
341
- when 0x81; array_enc << 0x00C5
342
- when 0x82; array_enc << 0x00C7
343
- when 0x83; array_enc << 0x00C9
344
- when 0x84; array_enc << 0x00D1
345
- when 0x85; array_enc << 0x00D6
346
- when 0x86; array_enc << 0x00DC
347
- when 0x87; array_enc << 0x00E1
348
- when 0x88; array_enc << 0x00E0
349
- when 0x89; array_enc << 0x00E2
350
- when 0x8A; array_enc << 0x00E4
351
- when 0x8B; array_enc << 0x00E3
352
- when 0x8C; array_enc << 0x00E5
353
- when 0x8D; array_enc << 0x00E7
354
- when 0x8E; array_enc << 0x00E9
355
- when 0x8F; array_enc << 0x00E8
356
- when 0x90; array_enc << 0x00EA
357
- when 0x91; array_enc << 0x00EB
358
- when 0x92; array_enc << 0x00ED
359
- when 0x93; array_enc << 0x00EC
360
- when 0x94; array_enc << 0x00EE
361
- when 0x95; array_enc << 0x00EF
362
- when 0x96; array_enc << 0x00F1
363
- when 0x97; array_enc << 0x00F3
364
- when 0x98; array_enc << 0x00F2
365
- when 0x99; array_enc << 0x00F4
366
- when 0x9A; array_enc << 0x00F6
367
- when 0x9B; array_enc << 0x00F5
368
- when 0x9C; array_enc << 0x00FA
369
- when 0x9D; array_enc << 0x00F9
370
- when 0x9E; array_enc << 0x00FB
371
- when 0x9F; array_enc << 0x00FC
372
- when 0xA0; array_enc << 0x2020
373
- when 0xA1; array_enc << 0x00B0
374
- when 0xA2; array_enc << 0x00A2
375
- when 0xA3; array_enc << 0x00A3
376
- when 0xA4; array_enc << 0x00A7
377
- when 0xA5; array_enc << 0x2022
378
- when 0xA6; array_enc << 0x00B6
379
- when 0xA7; array_enc << 0x00DF
380
- when 0xA8; array_enc << 0x00AE
381
- when 0xA9; array_enc << 0x00A9
382
- when 0xAA; array_enc << 0x2122
383
- when 0xAB; array_enc << 0x00B4
384
- when 0xAC; array_enc << 0x00A8
385
- when 0xAD; array_enc << 0x2260
386
- when 0xAE; array_enc << 0x00C6
387
- when 0xAF; array_enc << 0x00D8
388
- when 0xB0; array_enc << 0x221E
389
- when 0xB1; array_enc << 0x00B1
390
- when 0xB2; array_enc << 0x2264
391
- when 0xB3; array_enc << 0x2265
392
- when 0xB4; array_enc << 0x00A5
393
- when 0xB5; array_enc << 0x00B5
394
- when 0xB6; array_enc << 0x2202
395
- when 0xB7; array_enc << 0x2211
396
- when 0xB8; array_enc << 0x220F
397
- when 0xB9; array_enc << 0x03C0
398
- when 0xBA; array_enc << 0x222B
399
- when 0xBB; array_enc << 0x00AA
400
- when 0xBC; array_enc << 0x00BA
401
- when 0xBD; array_enc << 0x03A9
402
- when 0xBE; array_enc << 0x00E6
403
- when 0xBF; array_enc << 0x00F8
404
- when 0xC0; array_enc << 0x00BF
405
- when 0xC1; array_enc << 0x00A1
406
- when 0xC2; array_enc << 0x00AC
407
- when 0xC3; array_enc << 0x221A
408
- when 0xC4; array_enc << 0x0192
409
- when 0xC5; array_enc << 0x2248
410
- when 0xC6; array_enc << 0x2206
411
- when 0xC7; array_enc << 0x00AB
412
- when 0xC8; array_enc << 0x00BB
413
- when 0xC9; array_enc << 0x2026
414
- when 0xCA; array_enc << 0x00A0
415
- when 0xCB; array_enc << 0x00C0
416
- when 0xCC; array_enc << 0x00C3
417
- when 0xCD; array_enc << 0x00D5
418
- when 0xCE; array_enc << 0x0152
419
- when 0xCF; array_enc << 0x0153
420
- when 0xD0; array_enc << 0x2013
421
- when 0xD1; array_enc << 0x2014
422
- when 0xD2; array_enc << 0x201C
423
- when 0xD3; array_enc << 0x201D
424
- when 0xD4; array_enc << 0x2018
425
- when 0xD5; array_enc << 0x2019
426
- when 0xD6; array_enc << 0x00F7
427
- when 0xD7; array_enc << 0x25CA
428
- when 0xD8; array_enc << 0x00FF
429
- when 0xD9; array_enc << 0x0178
430
- when 0xDA; array_enc << 0x2044
431
- when 0xDB; array_enc << 0x20AC
432
- when 0xDC; array_enc << 0x2039
433
- when 0xDD; array_enc << 0x203A
434
- when 0xDE; array_enc << 0xFB01
435
- when 0xDF; array_enc << 0xFB02
436
- when 0xE0; array_enc << 0x2021
437
- when 0xE1; array_enc << 0x00B7
438
- when 0xE2; array_enc << 0x201A
439
- when 0xE3; array_enc << 0x201E
440
- when 0xE4; array_enc << 0x2030
441
- when 0xE5; array_enc << 0x00C2
442
- when 0xE6; array_enc << 0x00CA
443
- when 0xE7; array_enc << 0x00C1
444
- when 0xE8; array_enc << 0x00CB
445
- when 0xE9; array_enc << 0x00C8
446
- when 0xEA; array_enc << 0x00CD
447
- when 0xEB; array_enc << 0x00CE
448
- when 0xEC; array_enc << 0x00CF
449
- when 0xED; array_enc << 0x00CC
450
- when 0xEE; array_enc << 0x00D3
451
- when 0xEF; array_enc << 0x00D4
452
- when 0xF0; array_enc << 0xF8FF
453
- when 0xF1; array_enc << 0x00D2
454
- when 0xF2; array_enc << 0x00DA
455
- when 0xF3; array_enc << 0x00D8
456
- when 0xF4; array_enc << 0x00D9
457
- when 0xF5; array_enc << 0x0131
458
- when 0xF6; array_enc << 0x02C6
459
- when 0xF7; array_enc << 0x02DC
460
- when 0xF8; array_enc << 0x00AF
461
- when 0xF9; array_enc << 0x02D8
462
- when 0xFA; array_enc << 0x02D9
463
- when 0xFB; array_enc << 0x02DA
464
- when 0xFC; array_enc << 0x00B8
465
- when 0xFD; array_enc << 0x02DD
466
- when 0xFE; array_enc << 0x02DB
467
- when 0xFF; array_enc << 0x02C7
344
+ if tounicode && (code = tounicode.decode(num))
345
+ array_enc << code
346
+ elsif tounicode
347
+ array_enc << PDF::Reader::Encoding::UNKNOWN_CHAR
468
348
  else
469
- array_enc << num
349
+ case num
350
+ # change necessary characters to equivalent Unicode codepoints
351
+ when 0x80; array_enc << 0x00C4
352
+ when 0x81; array_enc << 0x00C5
353
+ when 0x82; array_enc << 0x00C7
354
+ when 0x83; array_enc << 0x00C9
355
+ when 0x84; array_enc << 0x00D1
356
+ when 0x85; array_enc << 0x00D6
357
+ when 0x86; array_enc << 0x00DC
358
+ when 0x87; array_enc << 0x00E1
359
+ when 0x88; array_enc << 0x00E0
360
+ when 0x89; array_enc << 0x00E2
361
+ when 0x8A; array_enc << 0x00E4
362
+ when 0x8B; array_enc << 0x00E3
363
+ when 0x8C; array_enc << 0x00E5
364
+ when 0x8D; array_enc << 0x00E7
365
+ when 0x8E; array_enc << 0x00E9
366
+ when 0x8F; array_enc << 0x00E8
367
+ when 0x90; array_enc << 0x00EA
368
+ when 0x91; array_enc << 0x00EB
369
+ when 0x92; array_enc << 0x00ED
370
+ when 0x93; array_enc << 0x00EC
371
+ when 0x94; array_enc << 0x00EE
372
+ when 0x95; array_enc << 0x00EF
373
+ when 0x96; array_enc << 0x00F1
374
+ when 0x97; array_enc << 0x00F3
375
+ when 0x98; array_enc << 0x00F2
376
+ when 0x99; array_enc << 0x00F4
377
+ when 0x9A; array_enc << 0x00F6
378
+ when 0x9B; array_enc << 0x00F5
379
+ when 0x9C; array_enc << 0x00FA
380
+ when 0x9D; array_enc << 0x00F9
381
+ when 0x9E; array_enc << 0x00FB
382
+ when 0x9F; array_enc << 0x00FC
383
+ when 0xA0; array_enc << 0x2020
384
+ when 0xA1; array_enc << 0x00B0
385
+ when 0xA2; array_enc << 0x00A2
386
+ when 0xA3; array_enc << 0x00A3
387
+ when 0xA4; array_enc << 0x00A7
388
+ when 0xA5; array_enc << 0x2022
389
+ when 0xA6; array_enc << 0x00B6
390
+ when 0xA7; array_enc << 0x00DF
391
+ when 0xA8; array_enc << 0x00AE
392
+ when 0xA9; array_enc << 0x00A9
393
+ when 0xAA; array_enc << 0x2122
394
+ when 0xAB; array_enc << 0x00B4
395
+ when 0xAC; array_enc << 0x00A8
396
+ when 0xAD; array_enc << 0x2260
397
+ when 0xAE; array_enc << 0x00C6
398
+ when 0xAF; array_enc << 0x00D8
399
+ when 0xB0; array_enc << 0x221E
400
+ when 0xB1; array_enc << 0x00B1
401
+ when 0xB2; array_enc << 0x2264
402
+ when 0xB3; array_enc << 0x2265
403
+ when 0xB4; array_enc << 0x00A5
404
+ when 0xB5; array_enc << 0x00B5
405
+ when 0xB6; array_enc << 0x2202
406
+ when 0xB7; array_enc << 0x2211
407
+ when 0xB8; array_enc << 0x220F
408
+ when 0xB9; array_enc << 0x03C0
409
+ when 0xBA; array_enc << 0x222B
410
+ when 0xBB; array_enc << 0x00AA
411
+ when 0xBC; array_enc << 0x00BA
412
+ when 0xBD; array_enc << 0x03A9
413
+ when 0xBE; array_enc << 0x00E6
414
+ when 0xBF; array_enc << 0x00F8
415
+ when 0xC0; array_enc << 0x00BF
416
+ when 0xC1; array_enc << 0x00A1
417
+ when 0xC2; array_enc << 0x00AC
418
+ when 0xC3; array_enc << 0x221A
419
+ when 0xC4; array_enc << 0x0192
420
+ when 0xC5; array_enc << 0x2248
421
+ when 0xC6; array_enc << 0x2206
422
+ when 0xC7; array_enc << 0x00AB
423
+ when 0xC8; array_enc << 0x00BB
424
+ when 0xC9; array_enc << 0x2026
425
+ when 0xCA; array_enc << 0x00A0
426
+ when 0xCB; array_enc << 0x00C0
427
+ when 0xCC; array_enc << 0x00C3
428
+ when 0xCD; array_enc << 0x00D5
429
+ when 0xCE; array_enc << 0x0152
430
+ when 0xCF; array_enc << 0x0153
431
+ when 0xD0; array_enc << 0x2013
432
+ when 0xD1; array_enc << 0x2014
433
+ when 0xD2; array_enc << 0x201C
434
+ when 0xD3; array_enc << 0x201D
435
+ when 0xD4; array_enc << 0x2018
436
+ when 0xD5; array_enc << 0x2019
437
+ when 0xD6; array_enc << 0x00F7
438
+ when 0xD7; array_enc << 0x25CA
439
+ when 0xD8; array_enc << 0x00FF
440
+ when 0xD9; array_enc << 0x0178
441
+ when 0xDA; array_enc << 0x2044
442
+ when 0xDB; array_enc << 0x20AC
443
+ when 0xDC; array_enc << 0x2039
444
+ when 0xDD; array_enc << 0x203A
445
+ when 0xDE; array_enc << 0xFB01
446
+ when 0xDF; array_enc << 0xFB02
447
+ when 0xE0; array_enc << 0x2021
448
+ when 0xE1; array_enc << 0x00B7
449
+ when 0xE2; array_enc << 0x201A
450
+ when 0xE3; array_enc << 0x201E
451
+ when 0xE4; array_enc << 0x2030
452
+ when 0xE5; array_enc << 0x00C2
453
+ when 0xE6; array_enc << 0x00CA
454
+ when 0xE7; array_enc << 0x00C1
455
+ when 0xE8; array_enc << 0x00CB
456
+ when 0xE9; array_enc << 0x00C8
457
+ when 0xEA; array_enc << 0x00CD
458
+ when 0xEB; array_enc << 0x00CE
459
+ when 0xEC; array_enc << 0x00CF
460
+ when 0xED; array_enc << 0x00CC
461
+ when 0xEE; array_enc << 0x00D3
462
+ when 0xEF; array_enc << 0x00D4
463
+ when 0xF0; array_enc << 0xF8FF
464
+ when 0xF1; array_enc << 0x00D2
465
+ when 0xF2; array_enc << 0x00DA
466
+ when 0xF3; array_enc << 0x00D8
467
+ when 0xF4; array_enc << 0x00D9
468
+ when 0xF5; array_enc << 0x0131
469
+ when 0xF6; array_enc << 0x02C6
470
+ when 0xF7; array_enc << 0x02DC
471
+ when 0xF8; array_enc << 0x00AF
472
+ when 0xF9; array_enc << 0x02D8
473
+ when 0xFA; array_enc << 0x02D9
474
+ when 0xFB; array_enc << 0x02DA
475
+ when 0xFC; array_enc << 0x00B8
476
+ when 0xFD; array_enc << 0x02DD
477
+ when 0xFE; array_enc << 0x02DB
478
+ when 0xFF; array_enc << 0x02C7
479
+ else
480
+ array_enc << num
481
+ end
470
482
  end
471
483
  end
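The hunks in this file all apply the same pattern: when a ToUnicode map is present it takes priority (with UNKNOWN_CHAR for codes it cannot decode), otherwise the per-encoding `case` table is used and unlisted bytes pass through unchanged. A minimal table-driven sketch of that logic follows. `FakeCMap`, `MAC_ROMAN_SUBSET`, `to_codepoints` and `to_utf8` are hypothetical names for illustration, not part of pdf-reader, and the `UNKNOWN_CHAR` value here is an assumed placeholder.

```ruby
# Hypothetical stand-in for the object the diff calls `tounicode`.
# The only behaviour assumed is #decode(code) returning a codepoint or nil.
class FakeCMap
  def initialize(map)
    @map = map
  end

  def decode(code)
    @map[code]
  end
end

# Assumed placeholder codepoint for characters that cannot be converted.
UNKNOWN_CHAR = 0x25AF

# A few entries copied from the MacRoman table in the diff above.
MAC_ROMAN_SUBSET = {
  0x80 => 0x00C4,
  0x81 => 0x00C5,
  0xA5 => 0x2022
}.freeze

# Map raw byte values to Unicode codepoints.
# With a ToUnicode map: use it exclusively, falling back to UNKNOWN_CHAR.
# Without one: use the encoding table, passing unlisted bytes through.
def to_codepoints(bytes, table, tounicode = nil)
  bytes.map do |num|
    if tounicode
      tounicode.decode(num) || UNKNOWN_CHAR
    else
      table.fetch(num, num)
    end
  end
end

# Pack the codepoints into a UTF-8 string, as the diff does with pack("U*").
def to_utf8(bytes, table, tounicode = nil)
  to_codepoints(bytes, table, tounicode).pack("U*")
end
```

A hash lookup keeps the mapping data separate from the control flow, so the ToUnicode branch would only need to be written once rather than repeated in every encoding method.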
472
484
 
@@ -475,7 +487,7 @@ class PDF::Reader
475
487
 
476
488
  # replace charcters that didn't convert to unicode nicely with something valid
477
489
  array_enc.collect! { |c| c ? c : PDF::Reader::Encoding::UNKNOWN_CHAR }
478
-
490
+
479
491
  # pack all our Unicode codepoints into a UTF-8 string
480
492
  ret = array_enc.pack("U*")
481
493
 
@@ -495,62 +507,68 @@ class PDF::Reader
495
507
  array_std = self.process_differences(array_std)
496
508
  array_enc = []
497
509
  array_std.each do |num|
498
- case num
499
- when 0x27; array_enc << 0x2019
500
- when 0x60; array_enc << 0x2018
501
- when 0xA4; array_enc << 0x2044
502
- when 0xA6; array_enc << 0x0192
503
- when 0xA8; array_enc << 0x00A4
504
- when 0xA9; array_enc << 0x0027
505
- when 0xAA; array_enc << 0x201C
506
- when 0xAC; array_enc << 0x2039
507
- when 0xAD; array_enc << 0x203A
508
- when 0xAE; array_enc << 0xFB01
509
- when 0xAF; array_enc << 0xFB02
510
- when 0xB1; array_enc << 0x2013
511
- when 0xB2; array_enc << 0x2020
512
- when 0xB3; array_enc << 0x2021
513
- when 0xB4; array_enc << 0x00B7
514
- when 0xB7; array_enc << 0x2022
515
- when 0xB8; array_enc << 0x201A
516
- when 0xB9; array_enc << 0x201E
517
- when 0xBA; array_enc << 0x201D
518
- when 0xBC; array_enc << 0x2026
519
- when 0xBD; array_enc << 0x2030
520
- when 0xC1; array_enc << 0x0060
521
- when 0xC2; array_enc << 0x00B4
522
- when 0xC3; array_enc << 0x02C6
523
- when 0xC4; array_enc << 0x02DC
524
- when 0xC5; array_enc << 0x00AF
525
- when 0xC6; array_enc << 0x02D8
526
- when 0xC7; array_enc << 0x02D9
527
- when 0xC8; array_enc << 0x00A8
528
- when 0xCA; array_enc << 0x02DA
529
- when 0xCB; array_enc << 0x00B8
530
- when 0xCD; array_enc << 0x02DD
531
- when 0xCE; array_enc << 0x02DB
532
- when 0xCF; array_enc << 0x02C7
533
- when 0xD0; array_enc << 0x2014
534
- when 0xE1; array_enc << 0x00C6
535
- when 0xE3; array_enc << 0x00AA
536
- when 0xE8; array_enc << 0x0141
537
- when 0xE9; array_enc << 0x00D8
538
- when 0xEA; array_enc << 0x0152
539
- when 0xEB; array_enc << 0x00BA
540
- when 0xF1; array_enc << 0x00E6
541
- when 0xF5; array_enc << 0x0131
542
- when 0xF8; array_enc << 0x0142
543
- when 0xF9; array_enc << 0x00F8
544
- when 0xFA; array_enc << 0x0153
545
- when 0xFB; array_enc << 0x00DF
510
+ if tounicode && (code = tounicode.decode(num))
511
+ array_enc << code
512
+ elsif tounicode
513
+ array_enc << PDF::Reader::Encoding::UNKNOWN_CHAR
546
514
  else
547
- array_enc << num
515
+ case num
516
+ when 0x27; array_enc << 0x2019
517
+ when 0x60; array_enc << 0x2018
518
+ when 0xA4; array_enc << 0x2044
519
+ when 0xA6; array_enc << 0x0192
520
+ when 0xA8; array_enc << 0x00A4
521
+ when 0xA9; array_enc << 0x0027
522
+ when 0xAA; array_enc << 0x201C
523
+ when 0xAC; array_enc << 0x2039
524
+ when 0xAD; array_enc << 0x203A
525
+ when 0xAE; array_enc << 0xFB01
526
+ when 0xAF; array_enc << 0xFB02
527
+ when 0xB1; array_enc << 0x2013
528
+ when 0xB2; array_enc << 0x2020
529
+ when 0xB3; array_enc << 0x2021
530
+ when 0xB4; array_enc << 0x00B7
531
+ when 0xB7; array_enc << 0x2022
532
+ when 0xB8; array_enc << 0x201A
533
+ when 0xB9; array_enc << 0x201E
534
+ when 0xBA; array_enc << 0x201D
535
+ when 0xBC; array_enc << 0x2026
536
+ when 0xBD; array_enc << 0x2030
537
+ when 0xC1; array_enc << 0x0060
538
+ when 0xC2; array_enc << 0x00B4
539
+ when 0xC3; array_enc << 0x02C6
540
+ when 0xC4; array_enc << 0x02DC
541
+ when 0xC5; array_enc << 0x00AF
542
+ when 0xC6; array_enc << 0x02D8
543
+ when 0xC7; array_enc << 0x02D9
544
+ when 0xC8; array_enc << 0x00A8
545
+ when 0xCA; array_enc << 0x02DA
546
+ when 0xCB; array_enc << 0x00B8
547
+ when 0xCD; array_enc << 0x02DD
548
+ when 0xCE; array_enc << 0x02DB
549
+ when 0xCF; array_enc << 0x02C7
550
+ when 0xD0; array_enc << 0x2014
551
+ when 0xE1; array_enc << 0x00C6
552
+ when 0xE3; array_enc << 0x00AA
553
+ when 0xE8; array_enc << 0x0141
554
+ when 0xE9; array_enc << 0x00D8
555
+ when 0xEA; array_enc << 0x0152
556
+ when 0xEB; array_enc << 0x00BA
557
+ when 0xF1; array_enc << 0x00E6
558
+ when 0xF5; array_enc << 0x0131
559
+ when 0xF8; array_enc << 0x0142
560
+ when 0xF9; array_enc << 0x00F8
561
+ when 0xFA; array_enc << 0x0153
562
+ when 0xFB; array_enc << 0x00DF
563
+ else
564
+ array_enc << num
565
+ end
548
566
  end
549
567
  end
550
-
568
+
551
569
  # convert any glyph names to unicode codepoints
552
570
  array_enc = self.process_glyphnames(array_enc)
553
-
571
+
554
572
  # replace charcters that didn't convert to unicode nicely with something valid
555
573
  array_enc.collect! { |c| c ? c : PDF::Reader::Encoding::UNKNOWN_CHAR }
556
574
 
@@ -571,163 +589,169 @@ class PDF::Reader
571
589
  array_symbol = self.process_differences(array_symbol)
572
590
  array_enc = []
573
591
  array_symbol.each do |num|
574
- case num
575
- when 0x22; array_enc << 0x2200
576
- when 0x24; array_enc << 0x2203
577
- when 0x27; array_enc << 0x220B
578
- when 0x2A; array_enc << 0x2217
579
- when 0x2D; array_enc << 0x2212
580
- when 0x40; array_enc << 0x2245
581
- when 0x41; array_enc << 0x0391
582
- when 0x42; array_enc << 0x0392
583
- when 0x43; array_enc << 0x03A7
584
- when 0x44; array_enc << 0x0394
585
- when 0x45; array_enc << 0x0395
586
- when 0x46; array_enc << 0x03A6
587
- when 0x47; array_enc << 0x0393
588
- when 0x48; array_enc << 0x0397
589
- when 0x49; array_enc << 0x0399
590
- when 0x4A; array_enc << 0x03D1
591
- when 0x4B; array_enc << 0x039A
592
- when 0x4C; array_enc << 0x039B
593
- when 0x4D; array_enc << 0x039C
594
- when 0x4E; array_enc << 0x039D
595
- when 0x4F; array_enc << 0x039F
596
- when 0x50; array_enc << 0x03A0
597
- when 0x51; array_enc << 0x0398
598
- when 0x52; array_enc << 0x03A1
599
- when 0x53; array_enc << 0x03A3
600
- when 0x54; array_enc << 0x03A4
601
- when 0x55; array_enc << 0x03A5
602
- when 0x56; array_enc << 0x03C2
603
- when 0x57; array_enc << 0x03A9
604
- when 0x58; array_enc << 0x039E
605
- when 0x59; array_enc << 0x03A8
606
- when 0x5A; array_enc << 0x0396
607
- when 0x5C; array_enc << 0x2234
608
- when 0x5E; array_enc << 0x22A5
609
- when 0x60; array_enc << 0xF8E5
610
- when 0x61; array_enc << 0x03B1
611
- when 0x62; array_enc << 0x03B2
612
- when 0x63; array_enc << 0x03C7
613
- when 0x64; array_enc << 0x03B4
614
- when 0x65; array_enc << 0x03B5
615
- when 0x66; array_enc << 0x03C6
616
- when 0x67; array_enc << 0x03B3
617
- when 0x68; array_enc << 0x03B7
618
- when 0x69; array_enc << 0x03B9
619
- when 0x6A; array_enc << 0x03D5
620
- when 0x6B; array_enc << 0x03BA
621
- when 0x6C; array_enc << 0x03BB
622
- when 0x6D; array_enc << 0x03BC
623
- when 0x6E; array_enc << 0x03BD
624
- when 0x6F; array_enc << 0x03BF
625
- when 0x70; array_enc << 0x03C0
626
- when 0x71; array_enc << 0x03B8
627
- when 0x72; array_enc << 0x03C1
628
- when 0x73; array_enc << 0x03C3
629
- when 0x74; array_enc << 0x03C4
630
- when 0x75; array_enc << 0x03C5
631
- when 0x76; array_enc << 0x03D6
632
- when 0x77; array_enc << 0x03C9
633
- when 0x78; array_enc << 0x03BE
634
- when 0x79; array_enc << 0x03C8
635
- when 0x7A; array_enc << 0x03B6
636
- when 0x7E; array_enc << 0x223C
637
- when 0xA0; array_enc << 0x20AC
638
- when 0xA1; array_enc << 0x03D2
639
- when 0xA2; array_enc << 0x2032
640
- when 0xA3; array_enc << 0x2264
641
- when 0xA4; array_enc << 0x2215
642
- when 0xA5; array_enc << 0x221E
643
- when 0xA6; array_enc << 0x0192
644
- when 0xA7; array_enc << 0x2663
645
- when 0xA8; array_enc << 0x2666
646
- when 0xA9; array_enc << 0x2665
647
- when 0xAA; array_enc << 0x2660
648
- when 0xAB; array_enc << 0x2194
649
- when 0xAC; array_enc << 0x2190
650
- when 0xAD; array_enc << 0x2191
651
- when 0xAE; array_enc << 0x2192
652
- when 0xAF; array_enc << 0x2193
653
- when 0xB2; array_enc << 0x2033
654
- when 0xB3; array_enc << 0x2265
655
- when 0xB4; array_enc << 0x00D7
656
- when 0xB5; array_enc << 0x221D
657
- when 0xB6; array_enc << 0x2202
658
- when 0xB7; array_enc << 0x2022
659
- when 0xB8; array_enc << 0x00F7
660
- when 0xB9; array_enc << 0x2260
661
- when 0xBA; array_enc << 0x2261
662
- when 0xBB; array_enc << 0x2248
663
- when 0xBC; array_enc << 0x2026
664
- when 0xBD; array_enc << 0xF8E6
665
- when 0xBE; array_enc << 0xF8E7
666
- when 0xBF; array_enc << 0x21B5
667
- when 0xC0; array_enc << 0x2135
668
- when 0xC1; array_enc << 0x2111
669
- when 0xC2; array_enc << 0x211C
670
- when 0xC3; array_enc << 0x2118
671
- when 0xC4; array_enc << 0x2297
672
- when 0xC5; array_enc << 0x2295
673
- when 0xC6; array_enc << 0x2205
674
- when 0xC7; array_enc << 0x2229
675
- when 0xC8; array_enc << 0x222A
676
- when 0xC9; array_enc << 0x2283
677
- when 0xCA; array_enc << 0x2287
678
- when 0xCB; array_enc << 0x2284
679
- when 0xCC; array_enc << 0x2282
680
- when 0xCD; array_enc << 0x2286
681
- when 0xCE; array_enc << 0x2208
682
- when 0xCF; array_enc << 0x2209
683
- when 0xD0; array_enc << 0x2220
684
- when 0xD1; array_enc << 0x2207
685
- when 0xD2; array_enc << 0xF6DA
686
- when 0xD3; array_enc << 0xF6D9
687
- when 0xD4; array_enc << 0xF6DB
688
- when 0xD5; array_enc << 0x220F
689
- when 0xD6; array_enc << 0x221A
690
- when 0xD7; array_enc << 0x22C5
691
- when 0xD8; array_enc << 0x00AC
692
- when 0xD9; array_enc << 0x2227
693
- when 0xDA; array_enc << 0x2228
694
- when 0xDB; array_enc << 0x21D4
695
- when 0xDC; array_enc << 0x21D0
696
- when 0xDD; array_enc << 0x21D1
697
- when 0xDE; array_enc << 0x21D2
698
- when 0xDF; array_enc << 0x21D3
699
- when 0xE0; array_enc << 0x25CA
700
- when 0xE1; array_enc << 0x2329
701
- when 0xE2; array_enc << 0xF8E8
702
- when 0xE3; array_enc << 0xF8E9
703
- when 0xE4; array_enc << 0xF8EA
704
- when 0xE5; array_enc << 0x2211
705
- when 0xE6; array_enc << 0xF8EB
706
- when 0xE7; array_enc << 0xF8EC
707
- when 0xE8; array_enc << 0xF8ED
708
- when 0xE9; array_enc << 0xF8EE
709
- when 0xEA; array_enc << 0xF8EF
710
- when 0xEB; array_enc << 0xF8F0
711
- when 0xEC; array_enc << 0xF8F1
712
- when 0xED; array_enc << 0xF8F2
713
- when 0xEE; array_enc << 0xF8F3
714
- when 0xEF; array_enc << 0xF8F4
715
- when 0xF1; array_enc << 0x232A
716
- when 0xF2; array_enc << 0x222B
717
- when 0xF3; array_enc << 0x2320
718
- when 0xF4; array_enc << 0xF8F5
719
- when 0xF5; array_enc << 0x2321
720
- when 0xF6; array_enc << 0xF8F6
721
- when 0xF7; array_enc << 0xF8F7
722
- when 0xF8; array_enc << 0xF8F8
723
- when 0xF9; array_enc << 0xF8F9
724
- when 0xFA; array_enc << 0xF8FA
725
- when 0xFB; array_enc << 0xF8FB
726
- when 0xFC; array_enc << 0xF8FC
727
- when 0xFD; array_enc << 0xF8FD
728
- when 0xFE; array_enc << 0xF8FE
592
+ if tounicode && (code = tounicode.decode(num))
593
+ array_enc << code
594
+ elsif tounicode
595
+ array_enc << PDF::Reader::Encoding::UNKNOWN_CHAR
729
596
  else
730
- array_enc << num
597
+ case num
598
+ when 0x22; array_enc << 0x2200
599
+ when 0x24; array_enc << 0x2203
600
+ when 0x27; array_enc << 0x220B
601
+ when 0x2A; array_enc << 0x2217
602
+ when 0x2D; array_enc << 0x2212
603
+ when 0x40; array_enc << 0x2245
604
+ when 0x41; array_enc << 0x0391
605
+ when 0x42; array_enc << 0x0392
606
+ when 0x43; array_enc << 0x03A7
607
+ when 0x44; array_enc << 0x0394
608
+ when 0x45; array_enc << 0x0395
609
+ when 0x46; array_enc << 0x03A6
610
+ when 0x47; array_enc << 0x0393
611
+ when 0x48; array_enc << 0x0397
612
+ when 0x49; array_enc << 0x0399
613
+ when 0x4A; array_enc << 0x03D1
614
+ when 0x4B; array_enc << 0x039A
615
+ when 0x4C; array_enc << 0x039B
616
+ when 0x4D; array_enc << 0x039C
617
+ when 0x4E; array_enc << 0x039D
618
+ when 0x4F; array_enc << 0x039F
619
+ when 0x50; array_enc << 0x03A0
620
+ when 0x51; array_enc << 0x0398
621
+ when 0x52; array_enc << 0x03A1
622
+ when 0x53; array_enc << 0x03A3
623
+ when 0x54; array_enc << 0x03A4
624
+ when 0x55; array_enc << 0x03A5
625
+ when 0x56; array_enc << 0x03C2
626
+ when 0x57; array_enc << 0x03A9
627
+ when 0x58; array_enc << 0x039E
628
+ when 0x59; array_enc << 0x03A8
629
+ when 0x5A; array_enc << 0x0396
630
+ when 0x5C; array_enc << 0x2234
631
+ when 0x5E; array_enc << 0x22A5
632
+ when 0x60; array_enc << 0xF8E5
633
+ when 0x61; array_enc << 0x03B1
634
+ when 0x62; array_enc << 0x03B2
635
+ when 0x63; array_enc << 0x03C7
636
+ when 0x64; array_enc << 0x03B4
637
+ when 0x65; array_enc << 0x03B5
638
+ when 0x66; array_enc << 0x03C6
639
+ when 0x67; array_enc << 0x03B3
640
+ when 0x68; array_enc << 0x03B7
641
+ when 0x69; array_enc << 0x03B9
642
+ when 0x6A; array_enc << 0x03D5
643
+ when 0x6B; array_enc << 0x03BA
644
+ when 0x6C; array_enc << 0x03BB
645
+ when 0x6D; array_enc << 0x03BC
646
+ when 0x6E; array_enc << 0x03BD
647
+ when 0x6F; array_enc << 0x03BF
648
+ when 0x70; array_enc << 0x03C0
649
+ when 0x71; array_enc << 0x03B8
650
+ when 0x72; array_enc << 0x03C1
651
+ when 0x73; array_enc << 0x03C3
652
+ when 0x74; array_enc << 0x03C4
653
+ when 0x75; array_enc << 0x03C5
654
+ when 0x76; array_enc << 0x03D6
655
+ when 0x77; array_enc << 0x03C9
656
+ when 0x78; array_enc << 0x03BE
657
+ when 0x79; array_enc << 0x03C8
658
+ when 0x7A; array_enc << 0x03B6
659
+ when 0x7E; array_enc << 0x223C
660
+ when 0xA0; array_enc << 0x20AC
661
+ when 0xA1; array_enc << 0x03D2
662
+ when 0xA2; array_enc << 0x2032
663
+ when 0xA3; array_enc << 0x2264
664
+ when 0xA4; array_enc << 0x2215
665
+ when 0xA5; array_enc << 0x221E
666
+ when 0xA6; array_enc << 0x0192
667
+ when 0xA7; array_enc << 0x2663
668
+ when 0xA8; array_enc << 0x2666
669
+ when 0xA9; array_enc << 0x2665
670
+ when 0xAA; array_enc << 0x2660
671
+ when 0xAB; array_enc << 0x2194
672
+ when 0xAC; array_enc << 0x2190
673
+ when 0xAD; array_enc << 0x2191
674
+ when 0xAE; array_enc << 0x2192
675
+ when 0xAF; array_enc << 0x2193
676
+ when 0xB2; array_enc << 0x2033
677
+ when 0xB3; array_enc << 0x2265
678
+ when 0xB4; array_enc << 0x00D7
679
+ when 0xB5; array_enc << 0x221D
680
+ when 0xB6; array_enc << 0x2202
681
+ when 0xB7; array_enc << 0x2022
682
+ when 0xB8; array_enc << 0x00F7
683
+ when 0xB9; array_enc << 0x2260
684
+ when 0xBA; array_enc << 0x2261
685
+ when 0xBB; array_enc << 0x2248
686
+ when 0xBC; array_enc << 0x2026
687
+ when 0xBD; array_enc << 0xF8E6
688
+ when 0xBE; array_enc << 0xF8E7
689
+ when 0xBF; array_enc << 0x21B5
690
+ when 0xC0; array_enc << 0x2135
691
+ when 0xC1; array_enc << 0x2111
692
+ when 0xC2; array_enc << 0x211C
693
+ when 0xC3; array_enc << 0x2118
694
+ when 0xC4; array_enc << 0x2297
695
+ when 0xC5; array_enc << 0x2295
696
+ when 0xC6; array_enc << 0x2205
697
+ when 0xC7; array_enc << 0x2229
698
+ when 0xC8; array_enc << 0x222A
699
+ when 0xC9; array_enc << 0x2283
700
+ when 0xCA; array_enc << 0x2287
701
+ when 0xCB; array_enc << 0x2284
702
+ when 0xCC; array_enc << 0x2282
703
+ when 0xCD; array_enc << 0x2286
704
+ when 0xCE; array_enc << 0x2208
705
+ when 0xCF; array_enc << 0x2209
706
+ when 0xD0; array_enc << 0x2220
707
+ when 0xD1; array_enc << 0x2207
708
+ when 0xD2; array_enc << 0xF6DA
709
+ when 0xD3; array_enc << 0xF6D9
710
+ when 0xD4; array_enc << 0xF6DB
711
+ when 0xD5; array_enc << 0x220F
712
+ when 0xD6; array_enc << 0x221A
713
+ when 0xD7; array_enc << 0x22C5
714
+ when 0xD8; array_enc << 0x00AC
715
+ when 0xD9; array_enc << 0x2227
716
+ when 0xDA; array_enc << 0x2228
717
+ when 0xDB; array_enc << 0x21D4
718
+ when 0xDC; array_enc << 0x21D0
719
+ when 0xDD; array_enc << 0x21D1
720
+ when 0xDE; array_enc << 0x21D2
721
+ when 0xDF; array_enc << 0x21D3
722
+ when 0xE0; array_enc << 0x25CA
723
+ when 0xE1; array_enc << 0x2329
724
+ when 0xE2; array_enc << 0xF8E8
725
+ when 0xE3; array_enc << 0xF8E9
726
+ when 0xE4; array_enc << 0xF8EA
727
+ when 0xE5; array_enc << 0x2211
728
+ when 0xE6; array_enc << 0xF8EB
729
+ when 0xE7; array_enc << 0xF8EC
730
+ when 0xE8; array_enc << 0xF8ED
731
+ when 0xE9; array_enc << 0xF8EE
732
+ when 0xEA; array_enc << 0xF8EF
733
+ when 0xEB; array_enc << 0xF8F0
734
+ when 0xEC; array_enc << 0xF8F1
735
+ when 0xED; array_enc << 0xF8F2
736
+ when 0xEE; array_enc << 0xF8F3
737
+ when 0xEF; array_enc << 0xF8F4
738
+ when 0xF1; array_enc << 0x232A
739
+ when 0xF2; array_enc << 0x222B
740
+ when 0xF3; array_enc << 0x2320
741
+ when 0xF4; array_enc << 0xF8F5
742
+ when 0xF5; array_enc << 0x2321
743
+ when 0xF6; array_enc << 0xF8F6
744
+ when 0xF7; array_enc << 0xF8F7
745
+ when 0xF8; array_enc << 0xF8F8
746
+ when 0xF9; array_enc << 0xF8F9
747
+ when 0xFA; array_enc << 0xF8FA
748
+ when 0xFB; array_enc << 0xF8FB
749
+ when 0xFC; array_enc << 0xF8FC
750
+ when 0xFD; array_enc << 0xF8FD
751
+ when 0xFE; array_enc << 0xF8FE
752
+ else
753
+ array_enc << num
754
+ end
731
755
  end
732
756
  end
733
757
 
@@ -757,37 +781,43 @@ class PDF::Reader
757
781
  array_latin9 = self.process_differences(array_latin9)
758
782
  array_enc = []
759
783
  array_latin9.each do |num|
760
- case num
761
- # characters that added compared to iso-8859-1
762
- when 0x80; array_enc << 0x20AC # 0xe2 0x82 0xac
763
- when 0x82; array_enc << 0x201A # 0xe2 0x82 0x9a
764
- when 0x83; array_enc << 0x0192 # 0xc6 0x92
765
- when 0x84; array_enc << 0x201E # 0xe2 0x82 0x9e
766
- when 0x85; array_enc << 0x2026 # 0xe2 0x80 0xa6
767
- when 0x86; array_enc << 0x2020 # 0xe2 0x80 0xa0
768
- when 0x87; array_enc << 0x2021 # 0xe2 0x80 0xa1
769
- when 0x88; array_enc << 0x02C6 # 0xcb 0x86
770
- when 0x89; array_enc << 0x2030 # 0xe2 0x80 0xb0
771
- when 0x8A; array_enc << 0x0160 # 0xc5 0xa0
772
- when 0x8B; array_enc << 0x2039 # 0xe2 0x80 0xb9
773
- when 0x8C; array_enc << 0x0152 # 0xc5 0x92
774
- when 0x8E; array_enc << 0x017D # 0xc5 0xbd
775
- when 0x91; array_enc << 0x2018 # 0xe2 0x80 0x98
776
- when 0x92; array_enc << 0x2019 # 0xe2 0x80 0x99
777
- when 0x93; array_enc << 0x201C
778
- when 0x94; array_enc << 0x201D
779
- when 0x95; array_enc << 0x2022
780
- when 0x96; array_enc << 0x2013
781
- when 0x97; array_enc << 0x2014
782
- when 0x98; array_enc << 0x02DC
783
- when 0x99; array_enc << 0x2122
784
- when 0x9A; array_enc << 0x0161
785
- when 0x9B; array_enc << 0x203A
786
- when 0x9C; array_enc << 0x0152 # 0xc5 0x93
787
- when 0x9E; array_enc << 0x017E # 0xc5 0xbe
788
- when 0x9F; array_enc << 0x0178
784
+ if tounicode && (code = tounicode.decode(num))
785
+ array_enc << code
786
+ elsif tounicode
787
+ array_enc << PDF::Reader::Encoding::UNKNOWN_CHAR
789
788
  else
790
- array_enc << num
789
+ case num
790
+ # characters that were added compared to iso-8859-1
791
+ when 0x80; array_enc << 0x20AC # 0xe2 0x82 0xac
792
+ when 0x82; array_enc << 0x201A # 0xe2 0x80 0x9a
793
+ when 0x83; array_enc << 0x0192 # 0xc6 0x92
794
+ when 0x84; array_enc << 0x201E # 0xe2 0x80 0x9e
795
+ when 0x85; array_enc << 0x2026 # 0xe2 0x80 0xa6
796
+ when 0x86; array_enc << 0x2020 # 0xe2 0x80 0xa0
797
+ when 0x87; array_enc << 0x2021 # 0xe2 0x80 0xa1
798
+ when 0x88; array_enc << 0x02C6 # 0xcb 0x86
799
+ when 0x89; array_enc << 0x2030 # 0xe2 0x80 0xb0
800
+ when 0x8A; array_enc << 0x0160 # 0xc5 0xa0
801
+ when 0x8B; array_enc << 0x2039 # 0xe2 0x80 0xb9
802
+ when 0x8C; array_enc << 0x0152 # 0xc5 0x92
803
+ when 0x8E; array_enc << 0x017D # 0xc5 0xbd
804
+ when 0x91; array_enc << 0x2018 # 0xe2 0x80 0x98
805
+ when 0x92; array_enc << 0x2019 # 0xe2 0x80 0x99
806
+ when 0x93; array_enc << 0x201C
807
+ when 0x94; array_enc << 0x201D
808
+ when 0x95; array_enc << 0x2022
809
+ when 0x96; array_enc << 0x2013
810
+ when 0x97; array_enc << 0x2014
811
+ when 0x98; array_enc << 0x02DC
812
+ when 0x99; array_enc << 0x2122
813
+ when 0x9A; array_enc << 0x0161
814
+ when 0x9B; array_enc << 0x203A
815
+ when 0x9C; array_enc << 0x0153 # 0xc5 0x93
816
+ when 0x9E; array_enc << 0x017E # 0xc5 0xbe
817
+ when 0x9F; array_enc << 0x0178
818
+ else
819
+ array_enc << num
820
+ end
791
821
  end
792
822
  end
793
823
 
@@ -816,210 +846,216 @@ class PDF::Reader
  array_symbol = self.process_differences(array_symbol)
  array_enc = []
  array_symbol.each do |num|
- case num
- when 0x21; array_enc << 0x2701
- when 0x22; array_enc << 0x2702
- when 0x23; array_enc << 0x2703
- when 0x24; array_enc << 0x2704
- when 0x25; array_enc << 0x260E
- when 0x26; array_enc << 0x2706
- when 0x27; array_enc << 0x2707
- when 0x28; array_enc << 0x2708
- when 0x29; array_enc << 0x2709
- when 0x2A; array_enc << 0x261B
- when 0x2B; array_enc << 0x261E
- when 0x2C; array_enc << 0x270C
- when 0x2D; array_enc << 0x270D
- when 0x2E; array_enc << 0x270E
- when 0x2F; array_enc << 0x270F
- when 0x30; array_enc << 0x2710
- when 0x31; array_enc << 0x2711
- when 0x32; array_enc << 0x2712
- when 0x33; array_enc << 0x2713
- when 0x34; array_enc << 0x2714
- when 0x35; array_enc << 0x2715
- when 0x36; array_enc << 0x2716
- when 0x37; array_enc << 0x2717
- when 0x38; array_enc << 0x2718
- when 0x39; array_enc << 0x2719
- when 0x3A; array_enc << 0x271A
- when 0x3B; array_enc << 0x271B
- when 0x3C; array_enc << 0x271C
- when 0x3D; array_enc << 0x271D
- when 0x3E; array_enc << 0x271E
- when 0x3F; array_enc << 0x271E
- when 0x40; array_enc << 0x2720
- when 0x41; array_enc << 0x2721
- when 0x42; array_enc << 0x2722
- when 0x43; array_enc << 0x2723
- when 0x44; array_enc << 0x2724
- when 0x45; array_enc << 0x2725
- when 0x46; array_enc << 0x2726
- when 0x47; array_enc << 0x2727
- when 0x48; array_enc << 0x2605
- when 0x49; array_enc << 0x2729
- when 0x4A; array_enc << 0x272A
- when 0x4B; array_enc << 0x272B
- when 0x4C; array_enc << 0x272C
- when 0x4D; array_enc << 0x272D
- when 0x4E; array_enc << 0x272E
- when 0x4F; array_enc << 0x272F
- when 0x50; array_enc << 0x2730
- when 0x51; array_enc << 0x2731
- when 0x52; array_enc << 0x2732
- when 0x53; array_enc << 0x2733
- when 0x54; array_enc << 0x2734
- when 0x55; array_enc << 0x2735
- when 0x56; array_enc << 0x2736
- when 0x57; array_enc << 0x2737
- when 0x58; array_enc << 0x2738
- when 0x59; array_enc << 0x2739
- when 0x5A; array_enc << 0x273A
- when 0x5B; array_enc << 0x273B
- when 0x5C; array_enc << 0x273C
- when 0x5D; array_enc << 0x273D
- when 0x5E; array_enc << 0x273E
- when 0x5F; array_enc << 0x273F
- when 0x60; array_enc << 0x2740
- when 0x61; array_enc << 0x2741
- when 0x62; array_enc << 0x2742
- when 0x63; array_enc << 0x2743
- when 0x64; array_enc << 0x2744
- when 0x65; array_enc << 0x2745
- when 0x66; array_enc << 0x2746
- when 0x67; array_enc << 0x2747
- when 0x68; array_enc << 0x2748
- when 0x69; array_enc << 0x2749
- when 0x6A; array_enc << 0x274A
- when 0x6B; array_enc << 0x274B
- when 0x6C; array_enc << 0x25CF
- when 0x6D; array_enc << 0x274D
- when 0x6E; array_enc << 0x25A0
- when 0x6F; array_enc << 0x274F
- when 0x70; array_enc << 0x2750
- when 0x71; array_enc << 0x2751
- when 0x72; array_enc << 0x2752
- when 0x73; array_enc << 0x2753
- when 0x74; array_enc << 0x2754
- when 0x75; array_enc << 0x2755
- when 0x76; array_enc << 0x2756
- when 0x77; array_enc << 0x2757
- when 0x78; array_enc << 0x2758
- when 0x79; array_enc << 0x2759
- when 0x7A; array_enc << 0x275A
- when 0x7B; array_enc << 0x275B
- when 0x7C; array_enc << 0x275C
- when 0x7D; array_enc << 0x275D
- when 0x7E; array_enc << 0x275E
- when 0x80; array_enc << 0xF8D7
- when 0x81; array_enc << 0xF8D8
- when 0x82; array_enc << 0xF8D9
- when 0x83; array_enc << 0xF8DA
- when 0x84; array_enc << 0xF8DB
- when 0x85; array_enc << 0xF8DC
- when 0x86; array_enc << 0xF8DD
- when 0x87; array_enc << 0xF8DE
- when 0x88; array_enc << 0xF8DF
- when 0x89; array_enc << 0xF8E0
- when 0x8A; array_enc << 0xF8E1
- when 0x8B; array_enc << 0xF8E2
- when 0x8C; array_enc << 0xF8E3
- when 0x8D; array_enc << 0xF8E4
- when 0xA1; array_enc << 0x2761
- when 0xA2; array_enc << 0x2762
- when 0xA3; array_enc << 0x2763
- when 0xA4; array_enc << 0x2764
- when 0xA5; array_enc << 0x2765
- when 0xA6; array_enc << 0x2766
- when 0xA7; array_enc << 0x2767
- when 0xA8; array_enc << 0x2663
- when 0xA9; array_enc << 0x2666
- when 0xAA; array_enc << 0x2665
- when 0xAB; array_enc << 0x2660
- when 0xAC; array_enc << 0x2460
- when 0xAD; array_enc << 0x2461
- when 0xAE; array_enc << 0x2462
- when 0xAF; array_enc << 0x2463
- when 0xB0; array_enc << 0x2464
- when 0xB1; array_enc << 0x2465
- when 0xB2; array_enc << 0x2466
- when 0xB3; array_enc << 0x2467
- when 0xB4; array_enc << 0x2468
- when 0xB5; array_enc << 0x2469
- when 0xB6; array_enc << 0x2776
- when 0xB7; array_enc << 0x2777
- when 0xB8; array_enc << 0x2778
- when 0xB9; array_enc << 0x2779
- when 0xBA; array_enc << 0x277A
- when 0xBB; array_enc << 0x277B
- when 0xBC; array_enc << 0x277C
- when 0xBD; array_enc << 0x277D
- when 0xBE; array_enc << 0x277E
- when 0xBF; array_enc << 0x277F
- when 0xC0; array_enc << 0x2780
- when 0xC1; array_enc << 0x2781
- when 0xC2; array_enc << 0x2782
- when 0xC3; array_enc << 0x2783
- when 0xC4; array_enc << 0x2784
- when 0xC5; array_enc << 0x2785
- when 0xC6; array_enc << 0x2786
- when 0xC7; array_enc << 0x2787
- when 0xC8; array_enc << 0x2788
- when 0xC9; array_enc << 0x2789
- when 0xCA; array_enc << 0x278A
- when 0xCB; array_enc << 0x278B
- when 0xCC; array_enc << 0x278C
- when 0xCD; array_enc << 0x278D
- when 0xCE; array_enc << 0x278E
- when 0xCF; array_enc << 0x278F
- when 0xD0; array_enc << 0x2790
- when 0xD1; array_enc << 0x2791
- when 0xD2; array_enc << 0x2792
- when 0xD3; array_enc << 0x2793
- when 0xD4; array_enc << 0x2794
- when 0xD5; array_enc << 0x2795
- when 0xD6; array_enc << 0x2796
- when 0xD7; array_enc << 0x2797
- when 0xD8; array_enc << 0x2798
- when 0xD9; array_enc << 0x2799
- when 0xDA; array_enc << 0x279A
- when 0xDB; array_enc << 0x279B
- when 0xDC; array_enc << 0x279C
- when 0xDD; array_enc << 0x279D
- when 0xDE; array_enc << 0x279E
- when 0xDF; array_enc << 0x279F
- when 0xE0; array_enc << 0x27A0
- when 0xE1; array_enc << 0x27A1
- when 0xE2; array_enc << 0x27A2
- when 0xE3; array_enc << 0x27A3
- when 0xE4; array_enc << 0x27A4
- when 0xE5; array_enc << 0x27A5
- when 0xE6; array_enc << 0x27A6
- when 0xE7; array_enc << 0x27A7
- when 0xE8; array_enc << 0x27A8
- when 0xE9; array_enc << 0x27A9
- when 0xEA; array_enc << 0x27AA
- when 0xEB; array_enc << 0x27AB
- when 0xEC; array_enc << 0x27AC
- when 0xED; array_enc << 0x27AD
- when 0xEE; array_enc << 0x27AE
- when 0xEF; array_enc << 0x27AF
- when 0xF1; array_enc << 0x27B1
- when 0xF2; array_enc << 0x27B2
- when 0xF3; array_enc << 0x27B3
- when 0xF4; array_enc << 0x27B4
- when 0xF5; array_enc << 0x27B5
- when 0xF6; array_enc << 0x27B6
- when 0xF7; array_enc << 0x27B7
- when 0xF8; array_enc << 0x27B8
- when 0xF9; array_enc << 0x27B9
- when 0xFA; array_enc << 0x27BA
- when 0xFB; array_enc << 0x27BB
- when 0xFC; array_enc << 0x27BC
- when 0xFD; array_enc << 0x27BD
- when 0xFE; array_enc << 0x27BE
+ if tounicode && (code = tounicode.decode(num))
+ array_enc << code
+ elsif tounicode
+ array_enc << PDF::Reader::Encoding::UNKNOWN_CHAR
  else
- array_enc << num
+ case num
+ when 0x21; array_enc << 0x2701
+ when 0x22; array_enc << 0x2702
+ when 0x23; array_enc << 0x2703
+ when 0x24; array_enc << 0x2704
+ when 0x25; array_enc << 0x260E
+ when 0x26; array_enc << 0x2706
+ when 0x27; array_enc << 0x2707
+ when 0x28; array_enc << 0x2708
+ when 0x29; array_enc << 0x2709
+ when 0x2A; array_enc << 0x261B
+ when 0x2B; array_enc << 0x261E
+ when 0x2C; array_enc << 0x270C
+ when 0x2D; array_enc << 0x270D
+ when 0x2E; array_enc << 0x270E
+ when 0x2F; array_enc << 0x270F
+ when 0x30; array_enc << 0x2710
+ when 0x31; array_enc << 0x2711
+ when 0x32; array_enc << 0x2712
+ when 0x33; array_enc << 0x2713
+ when 0x34; array_enc << 0x2714
+ when 0x35; array_enc << 0x2715
+ when 0x36; array_enc << 0x2716
+ when 0x37; array_enc << 0x2717
+ when 0x38; array_enc << 0x2718
+ when 0x39; array_enc << 0x2719
+ when 0x3A; array_enc << 0x271A
+ when 0x3B; array_enc << 0x271B
+ when 0x3C; array_enc << 0x271C
+ when 0x3D; array_enc << 0x271D
+ when 0x3E; array_enc << 0x271E
+ when 0x3F; array_enc << 0x271F
+ when 0x40; array_enc << 0x2720
+ when 0x41; array_enc << 0x2721
+ when 0x42; array_enc << 0x2722
+ when 0x43; array_enc << 0x2723
+ when 0x44; array_enc << 0x2724
+ when 0x45; array_enc << 0x2725
+ when 0x46; array_enc << 0x2726
+ when 0x47; array_enc << 0x2727
+ when 0x48; array_enc << 0x2605
+ when 0x49; array_enc << 0x2729
+ when 0x4A; array_enc << 0x272A
+ when 0x4B; array_enc << 0x272B
+ when 0x4C; array_enc << 0x272C
+ when 0x4D; array_enc << 0x272D
+ when 0x4E; array_enc << 0x272E
+ when 0x4F; array_enc << 0x272F
+ when 0x50; array_enc << 0x2730
+ when 0x51; array_enc << 0x2731
+ when 0x52; array_enc << 0x2732
+ when 0x53; array_enc << 0x2733
+ when 0x54; array_enc << 0x2734
+ when 0x55; array_enc << 0x2735
+ when 0x56; array_enc << 0x2736
+ when 0x57; array_enc << 0x2737
+ when 0x58; array_enc << 0x2738
+ when 0x59; array_enc << 0x2739
+ when 0x5A; array_enc << 0x273A
+ when 0x5B; array_enc << 0x273B
+ when 0x5C; array_enc << 0x273C
+ when 0x5D; array_enc << 0x273D
+ when 0x5E; array_enc << 0x273E
+ when 0x5F; array_enc << 0x273F
+ when 0x60; array_enc << 0x2740
+ when 0x61; array_enc << 0x2741
+ when 0x62; array_enc << 0x2742
+ when 0x63; array_enc << 0x2743
+ when 0x64; array_enc << 0x2744
+ when 0x65; array_enc << 0x2745
+ when 0x66; array_enc << 0x2746
+ when 0x67; array_enc << 0x2747
+ when 0x68; array_enc << 0x2748
+ when 0x69; array_enc << 0x2749
+ when 0x6A; array_enc << 0x274A
+ when 0x6B; array_enc << 0x274B
+ when 0x6C; array_enc << 0x25CF
+ when 0x6D; array_enc << 0x274D
+ when 0x6E; array_enc << 0x25A0
+ when 0x6F; array_enc << 0x274F
+ when 0x70; array_enc << 0x2750
+ when 0x71; array_enc << 0x2751
+ when 0x72; array_enc << 0x2752
+ when 0x73; array_enc << 0x2753
+ when 0x74; array_enc << 0x2754
+ when 0x75; array_enc << 0x2755
+ when 0x76; array_enc << 0x2756
+ when 0x77; array_enc << 0x2757
+ when 0x78; array_enc << 0x2758
+ when 0x79; array_enc << 0x2759
+ when 0x7A; array_enc << 0x275A
+ when 0x7B; array_enc << 0x275B
+ when 0x7C; array_enc << 0x275C
+ when 0x7D; array_enc << 0x275D
+ when 0x7E; array_enc << 0x275E
+ when 0x80; array_enc << 0xF8D7
+ when 0x81; array_enc << 0xF8D8
+ when 0x82; array_enc << 0xF8D9
+ when 0x83; array_enc << 0xF8DA
+ when 0x84; array_enc << 0xF8DB
+ when 0x85; array_enc << 0xF8DC
+ when 0x86; array_enc << 0xF8DD
+ when 0x87; array_enc << 0xF8DE
+ when 0x88; array_enc << 0xF8DF
+ when 0x89; array_enc << 0xF8E0
+ when 0x8A; array_enc << 0xF8E1
+ when 0x8B; array_enc << 0xF8E2
+ when 0x8C; array_enc << 0xF8E3
+ when 0x8D; array_enc << 0xF8E4
+ when 0xA1; array_enc << 0x2761
+ when 0xA2; array_enc << 0x2762
+ when 0xA3; array_enc << 0x2763
+ when 0xA4; array_enc << 0x2764
+ when 0xA5; array_enc << 0x2765
+ when 0xA6; array_enc << 0x2766
+ when 0xA7; array_enc << 0x2767
+ when 0xA8; array_enc << 0x2663
+ when 0xA9; array_enc << 0x2666
+ when 0xAA; array_enc << 0x2665
+ when 0xAB; array_enc << 0x2660
+ when 0xAC; array_enc << 0x2460
+ when 0xAD; array_enc << 0x2461
+ when 0xAE; array_enc << 0x2462
+ when 0xAF; array_enc << 0x2463
+ when 0xB0; array_enc << 0x2464
+ when 0xB1; array_enc << 0x2465
+ when 0xB2; array_enc << 0x2466
+ when 0xB3; array_enc << 0x2467
+ when 0xB4; array_enc << 0x2468
+ when 0xB5; array_enc << 0x2469
+ when 0xB6; array_enc << 0x2776
+ when 0xB7; array_enc << 0x2777
+ when 0xB8; array_enc << 0x2778
+ when 0xB9; array_enc << 0x2779
+ when 0xBA; array_enc << 0x277A
+ when 0xBB; array_enc << 0x277B
+ when 0xBC; array_enc << 0x277C
+ when 0xBD; array_enc << 0x277D
+ when 0xBE; array_enc << 0x277E
+ when 0xBF; array_enc << 0x277F
+ when 0xC0; array_enc << 0x2780
+ when 0xC1; array_enc << 0x2781
+ when 0xC2; array_enc << 0x2782
+ when 0xC3; array_enc << 0x2783
+ when 0xC4; array_enc << 0x2784
+ when 0xC5; array_enc << 0x2785
+ when 0xC6; array_enc << 0x2786
+ when 0xC7; array_enc << 0x2787
+ when 0xC8; array_enc << 0x2788
+ when 0xC9; array_enc << 0x2789
+ when 0xCA; array_enc << 0x278A
+ when 0xCB; array_enc << 0x278B
+ when 0xCC; array_enc << 0x278C
+ when 0xCD; array_enc << 0x278D
+ when 0xCE; array_enc << 0x278E
+ when 0xCF; array_enc << 0x278F
+ when 0xD0; array_enc << 0x2790
+ when 0xD1; array_enc << 0x2791
+ when 0xD2; array_enc << 0x2792
+ when 0xD3; array_enc << 0x2793
+ when 0xD4; array_enc << 0x2794
+ when 0xD5; array_enc << 0x2795
+ when 0xD6; array_enc << 0x2796
+ when 0xD7; array_enc << 0x2797
+ when 0xD8; array_enc << 0x2798
+ when 0xD9; array_enc << 0x2799
+ when 0xDA; array_enc << 0x279A
+ when 0xDB; array_enc << 0x279B
+ when 0xDC; array_enc << 0x279C
+ when 0xDD; array_enc << 0x279D
+ when 0xDE; array_enc << 0x279E
+ when 0xDF; array_enc << 0x279F
+ when 0xE0; array_enc << 0x27A0
+ when 0xE1; array_enc << 0x27A1
+ when 0xE2; array_enc << 0x27A2
+ when 0xE3; array_enc << 0x27A3
+ when 0xE4; array_enc << 0x27A4
+ when 0xE5; array_enc << 0x27A5
+ when 0xE6; array_enc << 0x27A6
+ when 0xE7; array_enc << 0x27A7
+ when 0xE8; array_enc << 0x27A8
+ when 0xE9; array_enc << 0x27A9
+ when 0xEA; array_enc << 0x27AA
+ when 0xEB; array_enc << 0x27AB
+ when 0xEC; array_enc << 0x27AC
+ when 0xED; array_enc << 0x27AD
+ when 0xEE; array_enc << 0x27AE
+ when 0xEF; array_enc << 0x27AF
+ when 0xF1; array_enc << 0x27B1
+ when 0xF2; array_enc << 0x27B2
+ when 0xF3; array_enc << 0x27B3
+ when 0xF4; array_enc << 0x27B4
+ when 0xF5; array_enc << 0x27B5
+ when 0xF6; array_enc << 0x27B6
+ when 0xF7; array_enc << 0x27B7
+ when 0xF8; array_enc << 0x27B8
+ when 0xF9; array_enc << 0x27B9
+ when 0xFA; array_enc << 0x27BA
+ when 0xFB; array_enc << 0x27BB
+ when 0xFC; array_enc << 0x27BC
+ when 0xFD; array_enc << 0x27BD
+ when 0xFE; array_enc << 0x27BE
+ else
+ array_enc << num
+ end
  end
  end