pdf-reader 0.5.1 → 0.6

data/CHANGELOG CHANGED
@@ -1,3 +1,12 @@
1
+ v0.6.0 (xxx)
2
+ - all text is now transparently converted to UTF-8 before being passed to the callbacks.
3
+ before this version, text was just passed as a byte level copy of what was in the PDF file, which
4
+ was mildly annoying with some encodings, and resulted in garbled text for Unicode encoded text.
5
+ - Fonts that use a difference table are now handled correctly
6
+ - fixed some 1.9 incompatible syntax
7
+ - expanded RegisterReceiver class to record extra info
8
+ - tweaked a README example
9
+
1
10
  v0.5.1 (1st January 2008)
2
11
  - Several documentation tweaks
3
12
  - Improve support for parsing PDFs under windows (thanks to Jari Williamsson)
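The v0.6.0 changelog entry describes converting text to UTF-8 before it reaches the callbacks. A minimal sketch of that idea, assuming a single-byte source encoding: each byte is mapped to a Unicode codepoint and the codepoints are packed into a UTF-8 string. The table `MAC_ROMAN_SAMPLE` and the helper `sample_to_utf8` are hypothetical names used for illustration and cover only four MacRoman bytes; they are not part of the gem.

```ruby
# Hypothetical lookup table: a few MacRoman bytes and their Unicode codepoints.
# Bytes that are not listed (plain ASCII here) are passed through unchanged.
MAC_ROMAN_SAMPLE = {
  0x8E => 0x00E9, # e with acute accent
  0xA5 => 0x2022, # bullet
  0xD2 => 0x201C, # left double quotation mark
  0xD3 => 0x201D  # right double quotation mark
}.freeze

# Convert a byte string to UTF-8 by mapping each byte to a codepoint.
def sample_to_utf8(bytes, table = MAC_ROMAN_SAMPLE)
  codepoints = bytes.unpack("C*").map { |b| table.fetch(b, b) }
  codepoints.pack("U*")
end

raw = "caf\x8E".b            # the bytes as they might sit inside a PDF
puts sample_to_utf8(raw)
```

Before this release the receiver would have been handed the four raw bytes; with the conversion it gets a string it can compare against ordinary UTF-8 text.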
data/README CHANGED
@@ -29,6 +29,12 @@ For a full list of the supported callback methods and a description of when they
29
29
  will be called, refer to PDF::Reader::Content. See the code examples below for a
30
30
  way to print a list of all the callbacks generated by a file to STDOUT.
31
31
 
32
+ = Text Encoding
33
+
34
+ Internally, text can be stored inside a PDF in various encodings, including
35
+ dingbats, win-1252, mac roman and a form of Unicode. To avoid confusion, all
36
+ text will be converted to UTF-8 before it is passed back from PDF::Reader.
37
+
32
38
  = Exceptions
33
39
 
34
40
  There are two key exceptions that you will need to watch out for when processing a
@@ -47,6 +53,12 @@ us with future code improvements.
47
53
  - Peter Jones <mailto:pjones@pmade.com>
48
54
  - James Healy <mailto:jimmy@deefa.com>
49
55
 
56
+ = Mailing List
57
+
58
+ Any questions or feedback should be sent to the PDF::Reader google group.
59
+
60
+ http://groups.google.com/group/pdf-reader
61
+
50
62
  = Examples
51
63
 
52
64
  The easiest way to explain how this works in practice is to show some examples.
@@ -117,6 +129,10 @@ it through less or to a text file.
117
129
  alias :move_to_next_line_and_show_text :show_text
118
130
  alias :set_spacing_next_line_show_text :show_text
119
131
 
132
+ def show_text_with_positioning(*params)
133
+ params = params.first
134
+ params.each { |str| show_text(str) if str.kind_of?(String)}
135
+ end
120
136
  end
121
137
 
122
138
  context "My generated PDF" do
@@ -183,6 +199,12 @@ Requires the rbook-isbn gem.
183
199
  receiver = ISBNReceiver.new
184
200
  PDF::Reader.file("somefile.pdf", receiver)
185
201
 
202
+ = Known Limitations
203
+
204
+ The order of the callbacks is unpredictable, and is dependent on the internal
205
+ layout of the file, not the order objects are displayed to the user. As a
206
+ consequence of this it is highly unlikely that text will be completely in
207
+ order.
186
208
 
187
209
  = Resources
188
210
 
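The README changes above add a `show_text_with_positioning` handler and warn that callbacks arrive in file order rather than reading order. A rough sketch of a receiver that collects text, assuming the callback shapes shown in the README hunk. `TextCollector` is a hypothetical class, and the callbacks are invoked by hand here instead of through `PDF::Reader.file`, so the example runs without the gem or a PDF.

```ruby
# Hypothetical receiver that collects every string it is handed.
class TextCollector
  attr_reader :fragments

  def initialize
    @fragments = []
  end

  # Called with a single string of text.
  def show_text(string, *_rest)
    @fragments << string
  end

  # Called with one array that mixes strings and numeric kerning offsets;
  # only the strings are kept.
  def show_text_with_positioning(*params)
    params.first.each { |item| show_text(item) if item.is_a?(String) }
  end
end

collector = TextCollector.new
# Simulate the callbacks rather than parsing a real file.
collector.show_text("Hello")
collector.show_text_with_positioning(["Wor", -120, "ld"])
puts collector.fragments.join
```

The fragments are stored in the order the callbacks fire, so per the Known Limitations section the joined result is not guaranteed to match the order a reader sees on the page.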
data/Rakefile CHANGED
@@ -6,7 +6,7 @@ require 'rake/testtask'
6
6
  require "rake/gempackagetask"
7
7
  require 'spec/rake/spectask'
8
8
 
9
- PKG_VERSION = "0.5.1"
9
+ PKG_VERSION = "0.6"
10
10
  PKG_NAME = "pdf-reader"
11
11
  PKG_FILE_NAME = "#{PKG_NAME}-#{PKG_VERSION}"
12
12
 
data/TODO CHANGED
@@ -1,10 +1,28 @@
1
- Some ideas for future work
2
- - Allows the user to only process certain aspects of the PDF file. For example, if they're only
3
- interested in meta data, there's no point in walking the pages tree.
1
+ v0.7
2
+ - Allow the user to only process certain aspects of the PDF file. For example, if they're only
3
+ interested in meta data or bookmarks, there's no point in walking the pages tree.
4
+ - maybe a third option to Reader.parse?
5
+ parse(io, receiver, {:pages => true, :fonts => false, :metadata => true, :bookmarks => false})
4
6
 
7
+ - Tweak encoding mappings to differentiate between bytes that are invalid for an encoding, and bytes that are unchanged.
8
+ poppler seems to do this in a quite reasonable way. Original Encoding -> Glyph Names -> Unicode. As of 0.6 we go straight
9
+ from the Original encoding to Unicode.
10
+
11
+ v0.9
12
+ - Support for CJK text (convert to UTF-8 like all other encodings)
13
+ - Add a way to extract raster images
14
+
15
+
16
+ Sometime
5
17
  - Ship some extra receivers in the standard package, particularly ones that are useful for running
6
18
  rspec over generated PDF files
7
19
 
8
20
  - Improve metadata support
9
21
 
10
22
  - Add support for additional filters: ASCIIHexDecode, ASCII85Decode, LZWDecode, RunLengthDecode, CCITTFaxDecode, JBIG2Decode, DCTDecode, JPXDecode, Crypt?
23
+
24
+ - Add support for additional encodings:
25
+ - PDFDocEncoding
26
+ - Identity-V (I *think* this relates to vertical text. Not sure how we'd support it sensibly)
27
+
28
+ - Investigate how R->L text is handled
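The TODO above proposes going from the original encoding to glyph names and then to Unicode, so that bytes that are invalid for an encoding can be told apart from bytes that are simply unchanged. A sketch of that two-step lookup, under the assumption that each step is a plain hash. The tables are hypothetical and truncated to two entries, and `expand_differences` and `byte_to_codepoint` are illustrative helpers, not gem API.

```ruby
# Hypothetical, heavily truncated tables for illustration only.
BYTE_TO_GLYPH    = { 0x41 => "A", 0xAE => "fi" }.freeze        # encoding step
GLYPH_TO_UNICODE = { "A" => 0x0041, "fi" => 0xFB01 }.freeze    # glyph list step

# Expand a differences array such as [25, "A", "B"] into { 25 => "A", 26 => "B" }.
def expand_differences(diff)
  table = {}
  byte = 0
  diff.each do |val|
    if val.is_a?(Numeric)
      byte = val.to_i      # a number resets the current byte position
    else
      table[byte] = val    # a name is assigned to the current byte
      byte += 1
    end
  end
  table
end

# Returns a codepoint, or nil when the byte has no glyph in this encoding.
def byte_to_codepoint(byte, differences = {})
  glyph = differences[byte] || BYTE_TO_GLYPH[byte]
  glyph && GLYPH_TO_UNICODE[glyph]
end

diffs = expand_differences([25, "A", "B"])
p byte_to_codepoint(0xAE)        # codepoint found through the base table
p byte_to_codepoint(25, diffs)   # codepoint found through the differences table
p byte_to_codepoint(0x00)        # nil marks a byte with no glyph
```

Returning `nil` for an unmapped byte is one way to keep "invalid" distinct from "unchanged"; the direct byte-to-Unicode approach in 0.6 passes unmapped bytes through as-is.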
@@ -43,7 +43,7 @@ module PDF
43
43
  #
44
44
  # This is useful for processing a PDF that is already in memory
45
45
  #
46
- # PDF::Reader.string("somefile.pdf", receiver)
46
+ # PDF::Reader.string(pdf_string, receiver)
47
47
  #
48
48
  # = Parsing an IO object
49
49
  #
@@ -73,9 +73,12 @@ end
73
73
  ################################################################################
74
74
  require 'pdf/reader/explore'
75
75
  require 'pdf/reader/buffer'
76
+ require 'pdf/reader/cmap'
76
77
  require 'pdf/reader/content'
78
+ require 'pdf/reader/encoding'
77
79
  require 'pdf/reader/error'
78
80
  require 'pdf/reader/filter'
81
+ require 'pdf/reader/font'
79
82
  require 'pdf/reader/name'
80
83
  require 'pdf/reader/parser'
81
84
  require 'pdf/reader/reference'
@@ -89,9 +89,9 @@ class PDF::Reader
89
89
  i = @buffer.index(/[\[\]()<>{}\s\/]/) || @buffer.size
90
90
 
91
91
  token_chars =
92
- if i == 0 and @buffer[i,2] == "<<" : 2
93
- elsif i == 0 and @buffer[i,2] == ">>" : 2
94
- elsif i == 0 : 1
92
+ if i == 0 and @buffer[i,2] == "<<" then 2
93
+ elsif i == 0 and @buffer[i,2] == ">>" then 2
94
+ elsif i == 0 then 1
95
95
  else i
96
96
  end
97
97
 
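The hunk above swaps the `if cond : value` form, which Ruby 1.9 no longer accepts, for `if cond then value`. A standalone sketch of the same token-length logic using the `then` form; `token_length` and `DELIMITERS` are hypothetical names, since the original code works on an instance variable inside the buffer class.

```ruby
# Characters that end a token, copied from the regexp in the hunk above.
DELIMITERS = /[\[\]()<>{}\s\/]/

# How many characters the next token occupies at the start of buffer.
def token_length(buffer)
  i = buffer.index(DELIMITERS) || buffer.size
  if i == 0 && buffer[i, 2] == "<<" then 2      # dictionary open
  elsif i == 0 && buffer[i, 2] == ">>" then 2   # dictionary close
  elsif i == 0 then 1                           # any other single delimiter
  else i                                        # a regular token up to the next delimiter
  end
end

puts token_length("<< /Type /Page >>")
puts token_length("Type /Page")
```

The `then` keyword parses the same way on 1.8 and 1.9, which is why it is a safe replacement for the colon.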
@@ -0,0 +1,48 @@
1
+ ################################################################################
2
+ #
3
+ # Copyright (C) 2008 James Healy (jimmy@deefa.com)
4
+ #
5
+ # Permission is hereby granted, free of charge, to any person obtaining
6
+ # a copy of this software and associated documentation files (the
7
+ # "Software"), to deal in the Software without restriction, including
8
+ # without limitation the rights to use, copy, modify, merge, publish,
9
+ # distribute, sublicense, and/or sell copies of the Software, and to
10
+ # permit persons to whom the Software is furnished to do so, subject to
11
+ # the following conditions:
12
+ #
13
+ # The above copyright notice and this permission notice shall be
14
+ # included in all copies or substantial portions of the Software.
15
+ #
16
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ # LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ # WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
23
+ #
24
+ ################################################################################
25
+
26
+ class PDF::Reader
27
+ class CMap
28
+
29
+ def initialize(data)
30
+ @map = {}
31
+ inmap = false
32
+ data.each_line do |l|
33
+ inmap = true if l.include?("beginbfchar")
34
+ if inmap
35
+ m, find, replace = *l.match(/<([0-9a-fA-F]+)> <([0-9a-fA-F]+)>/)
36
+ @map["0x#{find}".hex] = "0x#{replace}".hex if find && replace
37
+ end
38
+ end
39
+ end
40
+
41
+ def decode(c)
42
+ # TODO: implement the conversion
43
+ Error.assert_equal(c.class, Fixnum)
44
+ @map[c]
45
+ end
46
+
47
+ end
48
+ end
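The new CMap class above reads `beginbfchar` entries of the form `<src> <dst>` into a hash. A sketch of the same parsing idea with one deliberate difference that is an assumption on my part, not something this diff does: it stops collecting at `endbfchar`, so pairs outside a bfchar section are ignored. `SketchCMap` is a hypothetical name.

```ruby
# Hypothetical re-implementation of the bfchar parsing idea.
class SketchCMap
  def initialize(data)
    @map = {}
    in_section = false
    data.each_line do |line|
      in_section = true  if line.include?("beginbfchar")
      in_section = false if line.include?("endbfchar")
      next unless in_section

      # Each entry is a pair of hex strings: character code, then codepoint.
      match = line.match(/<([0-9a-fA-F]+)>\s*<([0-9a-fA-F]+)>/)
      @map[match[1].hex] = match[2].hex if match
    end
  end

  # Returns the Unicode codepoint for a character code, or nil if unmapped.
  def decode(code)
    @map[code]
  end
end

data = <<~CMAP
  2 beginbfchar
  <0001> <0048>
  <0002> <0069>
  endbfchar
  <0003> <0041>
CMAP

cmap = SketchCMap.new(data)
puts [1, 2].map { |c| cmap.decode(c) }.pack("U*")
```

Only single-character `bfchar` mappings are handled here, as in the diff; range mappings are out of scope for this sketch.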
@@ -55,6 +55,12 @@ class PDF::Reader
55
55
  # puts params.inspect
56
56
  #
57
57
  # == Text Callbacks
58
+ #
59
+ # All text passed into these callbacks will be encoded as UTF-8. Depending on where (and when) the
60
+ # PDF was generated, there's a good chance the text is NOT stored as UTF-8 internally so be careful
61
+ # when doing a comparison on strings returned from PDF::Reader (when doing unit tests for example). The
62
+ # string may not be byte-by-byte identical with the string that was originally written to the PDF.
63
+ #
58
64
  # - end_text_object
59
65
  # - move_to_start_of_next_line
60
66
  # - set_character_spacing
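The comment added above warns that strings returned by the callbacks may not be byte-for-byte identical to what was written into the PDF. A small illustration, assuming the text was stored as Windows-1252 and using Ruby 1.9+ `String#encode` as a stand-in for the library's own conversion:

```ruby
# Curly quotes around "Hi" as they would be stored in Windows-1252.
original_bytes = "\x93Hi\x94".b

# Stand-in for the conversion a reader performs before invoking callbacks.
utf8_text = original_bytes.dup.force_encoding("Windows-1252").encode("UTF-8")

puts utf8_text == "\u201CHi\u201D"             # same characters
puts utf8_text.bytes == original_bytes.bytes   # different bytes
```

In a unit test this means comparing against a UTF-8 expectation rather than against the raw bytes that the generating library wrote.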
@@ -221,6 +227,7 @@ class PDF::Reader
221
227
  def initialize (receiver, xref)
222
228
  @receiver = receiver
223
229
  @xref = xref
230
+ @fonts ||= {}
224
231
  end
225
232
  ################################################################################
226
233
  # Begin processing the document
@@ -233,6 +240,9 @@ class PDF::Reader
233
240
  # Walk over all pages in the PDF file, calling the appropriate callbacks for each page and all
234
241
  # its content
235
242
  def walk_pages (page)
243
+ resolve_resources(@xref.object(page['Resources'])) if page['Resources']
244
+
245
+ # extract page content
236
246
  if page['Type'] == "Pages"
237
247
  callback(:begin_page_container, [page])
238
248
  page['Kids'].each {|child| walk_pages(@xref.object(child))}
@@ -262,7 +272,12 @@ class PDF::Reader
262
272
  token = @parser.parse_token(OPERATORS)
263
273
 
264
274
  if token.kind_of?(Token) and OPERATORS.has_key?(token)
265
- resolve_resources
275
+ @current_font = @params.first if OPERATORS[token] == :set_text_font_and_size
276
+
277
+ # convert any text to utf-8
278
+ if OPERATORS[token].to_s.include?("show_text") && @fonts[@current_font]
279
+ @params = @fonts[@current_font].to_utf8(@params)
280
+ end
266
281
  callback(OPERATORS[token], @params)
267
282
  @params.clear
268
283
  break
@@ -274,8 +289,27 @@ class PDF::Reader
274
289
  rescue EOFError => e
275
290
  end
276
291
  ################################################################################
277
- def resolve_resources
278
- # FIXME TODO
292
+ def resolve_resources(resources)
293
+ # extract any font information
294
+ if resources['Font']
295
+ @xref.object(resources['Font']).each do |label, desc|
296
+ desc = @xref.object(desc)
297
+ @fonts[label] = PDF::Reader::Font.new
298
+ @fonts[label].label = label
299
+ @fonts[label].subtype = desc['Subtype'] if desc['Subtype']
300
+ @fonts[label].basefont = desc['BaseFont'] if desc['BaseFont']
301
+ @fonts[label].encoding = PDF::Reader::Encoding.factory(@xref.object(desc['Encoding']))
302
+ @fonts[label].descendantfonts = desc['DescendantFonts'] if desc['DescendantFonts']
303
+ if desc['ToUnicode']
304
+ @fonts[label].tounicode = desc['ToUnicode']
305
+ @fonts[label].tounicode = @xref.object(@fonts[label].tounicode)
306
+ end
307
+ end
308
+ end
309
+ #@fonts.each do |key,val|
310
+ # puts "#{key}: #{val.inspect}"
311
+ # puts
312
+ #end
279
313
  end
280
314
  ################################################################################
281
315
  # calls the name callback method on the receiver class with params as the arguments
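The content hunks above remember the font named by `set_text_font_and_size` and run the parameters of any `show_text` operator through that font's `to_utf8`. A self-contained sketch of that flow. `SketchFont` and `SketchDispatcher` are hypothetical stand-ins, and the font decodes 2-byte big-endian codes straight to codepoints, which is a simplification of what an Identity-H font with a ToUnicode map would do.

```ruby
# Hypothetical font: treats each string as 2-byte big-endian character codes.
class SketchFont
  def to_utf8(params)
    params.map { |p| p.is_a?(String) ? p.unpack("n*").pack("U*") : p }
  end
end

# Hypothetical dispatcher that records the callbacks it would make.
class SketchDispatcher
  attr_reader :calls

  def initialize(fonts)
    @fonts = fonts
    @current_font = nil
    @calls = []
  end

  def dispatch(operator, params)
    # Remember which font label is active.
    @current_font = params.first if operator == :set_text_font_and_size

    # Convert text only when the active font is one we know about.
    if operator.to_s.include?("show_text") && @fonts[@current_font]
      params = @fonts[@current_font].to_utf8(params)
    end
    @calls << [operator, params]
  end
end

dispatcher = SketchDispatcher.new("F1" => SketchFont.new)
dispatcher.dispatch(:set_text_font_and_size, ["F1", 12])
dispatcher.dispatch(:show_text, ["\x00H\x00i".b])
p dispatcher.calls.last
```

When the font label is unknown the parameters are passed through untouched, which matches the guard on `@fonts[@current_font]` in the hunk.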
@@ -0,0 +1,1012 @@
1
+ ################################################################################
2
+ #
3
+ # Copyright (C) 2008 James Healy (jimmy@deefa.com)
4
+ #
5
+ # Permission is hereby granted, free of charge, to any person obtaining
6
+ # a copy of this software and associated documentation files (the
7
+ # "Software"), to deal in the Software without restriction, including
8
+ # without limitation the rights to use, copy, modify, merge, publish,
9
+ # distribute, sublicense, and/or sell copies of the Software, and to
10
+ # permit persons to whom the Software is furnished to do so, subject to
11
+ # the following conditions:
12
+ #
13
+ # The above copyright notice and this permission notice shall be
14
+ # included in all copies or substantial portions of the Software.
15
+ #
16
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ # EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ # MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ # LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ # WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
23
+ #
24
+ ################################################################################
25
+
26
+ require 'enumerator'
27
+
28
+ class PDF::Reader
29
+ class Encoding
30
+
31
+ attr_reader :differences
32
+
33
+ # set the differences table for this encoding. should be an array in the following format:
34
+ #
35
+ # [25, "A", 26, "B"]
36
+ #
37
+ # The array alternates between a decimal byte number and a glyph name to map to that byte
38
+ #
39
+ # To save space the following array is also valid and equivalent to the previous one
40
+ #
41
+ # [25, "A", "B"]
42
+ def differences=(diff)
43
+ raise ArgumentError, "diff must be an array" unless diff.kind_of?(Array)
44
+
45
+ @differences = {}
46
+ byte = 0
47
+ diff.each do |val|
48
+ if val.kind_of?(Numeric)
49
+ byte = val.to_i
50
+ else
51
+ @differences[byte] = val
52
+ byte += 1
53
+ end
54
+ end
55
+ @differences
56
+ end
57
+
58
+ # Takes the "Encoding" value of a Font dictionary and builds a PDF::Reader::Encoding object
59
+ def self.factory(enc)
60
+ if enc.kind_of?(Hash)
61
+ diff = enc['Differences']
62
+ enc = enc['Encoding'] || enc['BaseEncoding']
63
+ elsif enc != nil
64
+ enc = enc.to_s
65
+ end
66
+
67
+ case enc
68
+ when nil then enc = PDF::Reader::Encoding::StandardEncoding.new
69
+ when "Identity-H" then enc = PDF::Reader::Encoding::IdentityH.new
70
+ when "MacRomanEncoding" then enc = PDF::Reader::Encoding::MacRomanEncoding.new
71
+ when "MacExpertEncoding" then enc = PDF::Reader::Encoding::MacExpertEncoding.new
72
+ when "StandardEncoding" then enc = PDF::Reader::Encoding::StandardEncoding.new
73
+ when "SymbolEncoding" then enc = PDF::Reader::Encoding::SymbolEncoding.new
74
+ when "WinAnsiEncoding" then enc = PDF::Reader::Encoding::WinAnsiEncoding.new
75
+ when "ZapfDingbatsEncoding" then enc = PDF::Reader::Encoding::ZapfDingbatsEncoding.new
76
+ else raise UnsupportedFeatureError, "#{enc} is not currently a supported encoding"
77
+ end
78
+
79
+ enc.differences = diff if enc && diff
80
+
81
+ return enc
82
+ end
83
+
84
+ def to_utf8(str, tounicode = nil)
85
+ # abstract method, of sorts
86
+ raise RuntimeError, "Called abstract method"
87
+ end
88
+
89
+ # accepts an array of byte numbers, and replaces any that have entries in the differences table
90
+ # with a glyph name
91
+ def process_differences(arr)
92
+ @differences ||= {}
93
+ arr.collect! { |n| @differences[n].nil? ? n : @differences[n]}
94
+ end
95
+ protected :process_differences
96
+
97
+ # accepts an array of unicode code points and glyphnames, and converts any glyph names to codepoints
98
+ def process_glyphnames(arr)
99
+ @differences ||= {}
100
+ arr.collect! { |n| n.kind_of?(Numeric) ? n : PDF::Reader::Font.glyphnames[n]}
101
+ end
102
+ protected :process_glyphnames
103
+
104
+ class IdentityH < Encoding
105
+ def to_utf8(str, map = nil)
106
+ raise ArgumentError, "a ToUnicode cmap is required to decode an IdentityH string" if map.nil?
107
+
108
+ array_enc = []
109
+
110
+ # iterate over string, reading it in 2 byte chunks and interpreting those
111
+ # chunks as ints
112
+ str.unpack("n*").each do |c|
113
+ # convert the int to a unicode codepoint
114
+ array_enc << map.decode(c)
115
+ end
116
+
117
+ # pack all our Unicode codepoints into a UTF-8 string
118
+ ret = array_enc.pack("U*")
119
+
120
+ # set the string's encoding correctly under ruby 1.9+
121
+ ret.force_encoding("UTF-8") if ret.respond_to?(:force_encoding)
122
+
123
+ return ret
124
+ end
125
+ end
126
+
127
+ class MacExpertEncoding < Encoding
128
+ # convert a MacExpertEncoding string into UTF-8
129
+ def to_utf8(str, tounicode = nil)
130
+ array_expert = str.unpack('C*')
131
+ array_expert = self.process_differences(array_expert)
132
+ array_enc = []
133
+ array_expert.each do |num|
134
+ case num
135
+ # change necessary characters to equivalent Unicode codepoints
136
+ when 0x21; array_enc << 0xF721
137
+ when 0x22; array_enc << 0xF6F8 # Hungarumlautsmall
138
+ when 0x23; array_enc << 0xF7A2
139
+ when 0x24; array_enc << 0xF724
140
+ when 0x25; array_enc << 0xF6E4
141
+ when 0x26; array_enc << 0xF726
142
+ when 0x27; array_enc << 0xF7B4
143
+ when 0x28; array_enc << 0x207D
144
+ when 0x29; array_enc << 0x207E
145
+ when 0x2A; array_enc << 0x2025
146
+ when 0x2B; array_enc << 0x2024
147
+ when 0x2F; array_enc << 0x2044
148
+ when 0x30; array_enc << 0xF730
149
+ when 0x31; array_enc << 0xF731
150
+ when 0x32; array_enc << 0xF732
151
+ when 0x33; array_enc << 0xF733
152
+ when 0x34; array_enc << 0xF734
153
+ when 0x35; array_enc << 0xF735
154
+ when 0x36; array_enc << 0xF736
155
+ when 0x37; array_enc << 0xF737
156
+ when 0x38; array_enc << 0xF738
157
+ when 0x39; array_enc << 0xF739
158
+ when 0x3D; array_enc << 0xF6DE
159
+ when 0x3F; array_enc << 0xF73F
160
+ when 0x44; array_enc << 0xF7F0
161
+ when 0x47; array_enc << 0x00BC
162
+ when 0x48; array_enc << 0x00BD
163
+ when 0x49; array_enc << 0x00BE
164
+ when 0x4A; array_enc << 0x215B
165
+ when 0x4B; array_enc << 0x215C
166
+ when 0x4C; array_enc << 0x215D
167
+ when 0x4D; array_enc << 0x215E
168
+ when 0x4E; array_enc << 0x2153
169
+ when 0x4F; array_enc << 0x2154
170
+ when 0x56; array_enc << 0xFB00
171
+ when 0x57; array_enc << 0xFB01
172
+ when 0x58; array_enc << 0xFB02
173
+ when 0x59; array_enc << 0xFB03
174
+ when 0x5A; array_enc << 0xFB04
175
+ when 0x5B; array_enc << 0x208D
176
+ when 0x5D; array_enc << 0x208E
177
+ when 0x5E; array_enc << 0xF6F6
178
+ when 0x5F; array_enc << 0xF6E5
179
+ when 0x60; array_enc << 0xF760
180
+ when 0x61; array_enc << 0xF761
181
+ when 0x62; array_enc << 0xF762
182
+ when 0x63; array_enc << 0xF763
183
+ when 0x64; array_enc << 0xF764
184
+ when 0x65; array_enc << 0xF765
185
+ when 0x66; array_enc << 0xF766
186
+ when 0x67; array_enc << 0xF767
187
+ when 0x68; array_enc << 0xF768
188
+ when 0x69; array_enc << 0xF769
189
+ when 0x6A; array_enc << 0xF76A
190
+ when 0x6B; array_enc << 0xF76B
191
+ when 0x6C; array_enc << 0xF76C
192
+ when 0x6D; array_enc << 0xF76D
193
+ when 0x6E; array_enc << 0xF76E
194
+ when 0x6F; array_enc << 0xF76F
195
+ when 0x70; array_enc << 0xF770
196
+ when 0x71; array_enc << 0xF771
197
+ when 0x72; array_enc << 0xF772
198
+ when 0x73; array_enc << 0xF773
199
+ when 0x74; array_enc << 0xF774
200
+ when 0x75; array_enc << 0xF775
201
+ when 0x76; array_enc << 0xF776
202
+ when 0x77; array_enc << 0xF777
203
+ when 0x78; array_enc << 0xF778
204
+ when 0x79; array_enc << 0xF779
205
+ when 0x7A; array_enc << 0xF77A
206
+ when 0x7B; array_enc << 0x20A1
207
+ when 0x7C; array_enc << 0xF6DC
208
+ when 0x7D; array_enc << 0xF6DD
209
+ when 0x7E; array_enc << 0xF6FE
210
+ when 0x81; array_enc << 0xF6E9
211
+ when 0x82; array_enc << 0xF6E0
212
+ when 0x87; array_enc << 0xF7E1 # Aacutesmall
213
+ when 0x88; array_enc << 0xF7E0
214
+ when 0x89; array_enc << 0xF7E2 # Acircumflexsmall
215
+ when 0x8A; array_enc << 0xF7E4
216
+ when 0x8B; array_enc << 0xF7E3
217
+ when 0x8C; array_enc << 0xF7E5
218
+ when 0x8D; array_enc << 0xF7E7
219
+ when 0x8E; array_enc << 0xF7E9
220
+ when 0x8F; array_enc << 0xF7E8
221
+ when 0x90; array_enc << 0xF7EA
222
+ when 0x91; array_enc << 0xF7EB
223
+ when 0x92; array_enc << 0xF7ED
224
+ when 0x93; array_enc << 0xF7EC
225
+ when 0x94; array_enc << 0xF7EE
226
+ when 0x95; array_enc << 0xF7EF
227
+ when 0x96; array_enc << 0xF7F1
228
+ when 0x97; array_enc << 0xF7F3
229
+ when 0x98; array_enc << 0xF7F2
230
+ when 0x99; array_enc << 0xF7F4
231
+ when 0x9A; array_enc << 0xF7F6
232
+ when 0x9B; array_enc << 0xF7F5
233
+ when 0x9C; array_enc << 0xF7FA
234
+ when 0x9D; array_enc << 0xF7F9
235
+ when 0x9E; array_enc << 0xF7FB
236
+ when 0x9F; array_enc << 0xF7FC
237
+ when 0xA1; array_enc << 0x2078
238
+ when 0xA2; array_enc << 0x2084
239
+ when 0xA3; array_enc << 0x2083
240
+ when 0xA4; array_enc << 0x2086
241
+ when 0xA5; array_enc << 0x2088
242
+ when 0xA6; array_enc << 0x2087
243
+ when 0xA7; array_enc << 0xF6FD
244
+ when 0xA9; array_enc << 0xF6DF
245
+ when 0xAA; array_enc << 0x2082
246
+ when 0xAC; array_enc << 0xF7A8
247
+ when 0xAE; array_enc << 0xF6F5
248
+ when 0xAF; array_enc << 0xF6F0
249
+ when 0xB0; array_enc << 0x2085
250
+ when 0xB2; array_enc << 0xF6E1
251
+ when 0xB3; array_enc << 0xF6E7
252
+ when 0xB4; array_enc << 0xF7FD
253
+ when 0xB6; array_enc << 0xF6E3
254
+ when 0xB9; array_enc << 0xF7FE
255
+ when 0xBB; array_enc << 0x2089
256
+ when 0xBC; array_enc << 0x2080
257
+ when 0xBD; array_enc << 0xF6FF
258
+ when 0xBE; array_enc << 0xF7E6 # AEsmall
259
+ when 0xBF; array_enc << 0xF7F8
260
+ when 0xC0; array_enc << 0xF7BF
261
+ when 0xC1; array_enc << 0x2081
262
+ when 0xC2; array_enc << 0xF6F9
263
+ when 0xC9; array_enc << 0xF7B8
264
+ when 0xCF; array_enc << 0xF6FA
265
+ when 0xD0; array_enc << 0x2012
266
+ when 0xD1; array_enc << 0xF6E6
267
+ when 0xD6; array_enc << 0xF7A1
268
+ when 0xD8; array_enc << 0xF7FF
269
+ when 0xDA; array_enc << 0x00B9
270
+ when 0xDB; array_enc << 0x00B2
271
+ when 0xDC; array_enc << 0x00B3
272
+ when 0xDD; array_enc << 0x2074
273
+ when 0xDE; array_enc << 0x2075
274
+ when 0xDF; array_enc << 0x2076
275
+ when 0xE0; array_enc << 0x2077
276
+ when 0xE1; array_enc << 0x2079
277
+ when 0xE2; array_enc << 0x2070
278
+ when 0xE4; array_enc << 0xF6EC
279
+ when 0xE5; array_enc << 0xF6F1
280
+ when 0xE6; array_enc << 0xF6F3
281
+ when 0xE9; array_enc << 0xF6ED
282
+ when 0xEA; array_enc << 0xF6F2
283
+ when 0xEB; array_enc << 0xF6EB
284
+ when 0xF1; array_enc << 0xF6EE
285
+ when 0xF2; array_enc << 0xF6FB
286
+ when 0xF3; array_enc << 0xF6F4
287
+ when 0xF4; array_enc << 0xF7AF
288
+ when 0xF5; array_enc << 0xF6EA
289
+ when 0xF6; array_enc << 0x207F
290
+ when 0xF7; array_enc << 0xF6EF
291
+ when 0xF8; array_enc << 0xF6E2
292
+ when 0xF9; array_enc << 0xF6E8
293
+ when 0xFA; array_enc << 0xF6F7
294
+ when 0xFB; array_enc << 0xF6FC
295
+ else
296
+ array_enc << num
297
+ end
298
+ end
299
+
300
+ # convert any glyph names to unicode codepoints
301
+ array_enc = self.process_glyphnames(array_enc)
302
+
303
+ # pack all our Unicode codepoints into a UTF-8 string
304
+ ret = array_enc.pack("U*")
305
+
306
+ # set the string's encoding correctly under ruby 1.9+
307
+ ret.force_encoding("UTF-8") if ret.respond_to?(:force_encoding)
308
+
309
+ return ret
310
+ end
311
+ end
312
+
313
+ # The default encoding for Mac OS <= v9
314
+ # see: http://en.wikipedia.org/wiki/Mac_OS_Roman
315
+ class MacRomanEncoding < Encoding
316
+ # convert a MacRomanEncoding string into UTF-8
317
+ def to_utf8(str, tounicode = nil)
318
+ # content of this method borrowed from REXML::Encoding.decode_cp1252
319
+ array_mac = str.unpack('C*')
320
+ array_mac = self.process_differences(array_mac)
321
+ array_enc = []
322
+ array_mac.each do |num|
323
+ case num
324
+ # change necessary characters to equivalent Unicode codepoints
325
+ when 0x80; array_enc << 0x00C4
326
+ when 0x81; array_enc << 0x00C5
327
+ when 0x82; array_enc << 0x00C7
328
+ when 0x83; array_enc << 0x00C9
329
+ when 0x84; array_enc << 0x00D1
330
+ when 0x85; array_enc << 0x00D6
331
+ when 0x86; array_enc << 0x00DC
332
+ when 0x87; array_enc << 0x00E1
333
+ when 0x88; array_enc << 0x00E0
334
+ when 0x89; array_enc << 0x00E2
335
+ when 0x8A; array_enc << 0x00E4
336
+ when 0x8B; array_enc << 0x00E3
337
+ when 0x8C; array_enc << 0x00E5
338
+ when 0x8D; array_enc << 0x00E7
339
+ when 0x8E; array_enc << 0x00E9
340
+ when 0x8F; array_enc << 0x00E8
341
+ when 0x90; array_enc << 0x00EA
342
+ when 0x91; array_enc << 0x00EB
343
+ when 0x92; array_enc << 0x00ED
344
+ when 0x93; array_enc << 0x00EC
345
+ when 0x94; array_enc << 0x00EE
346
+ when 0x95; array_enc << 0x00EF
347
+ when 0x96; array_enc << 0x00F1
348
+ when 0x97; array_enc << 0x00F3
349
+ when 0x98; array_enc << 0x00F2
350
+ when 0x99; array_enc << 0x00F4
351
+ when 0x9A; array_enc << 0x00F6
352
+ when 0x9B; array_enc << 0x00F5
353
+ when 0x9C; array_enc << 0x00FA
354
+ when 0x9D; array_enc << 0x00F9
355
+ when 0x9E; array_enc << 0x00FB
356
+ when 0x9F; array_enc << 0x00FC
357
+ when 0xA0; array_enc << 0x2020
358
+ when 0xA1; array_enc << 0x00B0
359
+ when 0xA2; array_enc << 0x00A2
360
+ when 0xA3; array_enc << 0x00A3
361
+ when 0xA4; array_enc << 0x00A7
362
+ when 0xA5; array_enc << 0x2022
363
+ when 0xA6; array_enc << 0x00B6
364
+ when 0xA7; array_enc << 0x00DF
365
+ when 0xA8; array_enc << 0x00AE
366
+ when 0xA9; array_enc << 0x00A9
367
+ when 0xAA; array_enc << 0x2122
368
+ when 0xAB; array_enc << 0x00B4
369
+ when 0xAC; array_enc << 0x00A8
370
+ when 0xAD; array_enc << 0x2260
371
+ when 0xAE; array_enc << 0x00C6
372
+ when 0xAF; array_enc << 0x00D8
373
+ when 0xB0; array_enc << 0x221E
374
+ when 0xB1; array_enc << 0x00B1
375
+ when 0xB2; array_enc << 0x2264
376
+ when 0xB3; array_enc << 0x2265
377
+ when 0xB4; array_enc << 0x00A5
378
+ when 0xB5; array_enc << 0x00B5
379
+ when 0xB6; array_enc << 0x2202
380
+ when 0xB7; array_enc << 0x2211
381
+ when 0xB8; array_enc << 0x220F
382
+ when 0xB9; array_enc << 0x03C0
383
+ when 0xBA; array_enc << 0x222B
384
+ when 0xBB; array_enc << 0x00AA
385
+ when 0xBC; array_enc << 0x00BA
386
+ when 0xBD; array_enc << 0x03A9
387
+ when 0xBE; array_enc << 0x00E6
388
+ when 0xBF; array_enc << 0x00F8
389
+ when 0xC0; array_enc << 0x00BF
390
+ when 0xC1; array_enc << 0x00A1
391
+ when 0xC2; array_enc << 0x00AC
392
+ when 0xC3; array_enc << 0x221A
393
+ when 0xC4; array_enc << 0x0192
394
+ when 0xC5; array_enc << 0x2248
395
+ when 0xC6; array_enc << 0x2206
396
+ when 0xC7; array_enc << 0x00AB
397
+ when 0xC8; array_enc << 0x00BB
398
+ when 0xC9; array_enc << 0x2026
399
+ when 0xCA; array_enc << 0x00A0
400
+ when 0xCB; array_enc << 0x00C0
401
+ when 0xCC; array_enc << 0x00C3
402
+ when 0xCD; array_enc << 0x00D5
403
+ when 0xCE; array_enc << 0x0152
404
+ when 0xCF; array_enc << 0x0153
405
+ when 0xD0; array_enc << 0x2013
406
+ when 0xD1; array_enc << 0x2014
407
+ when 0xD2; array_enc << 0x201C
408
+ when 0xD3; array_enc << 0x201D
409
+ when 0xD4; array_enc << 0x2018
410
+ when 0xD5; array_enc << 0x2019
411
+ when 0xD6; array_enc << 0x00F7
412
+ when 0xD7; array_enc << 0x25CA
413
+ when 0xD8; array_enc << 0x00FF
414
+ when 0xD9; array_enc << 0x0178
415
+ when 0xDA; array_enc << 0x2044
416
+ when 0xDB; array_enc << 0x20AC
417
+ when 0xDC; array_enc << 0x2039
418
+ when 0xDD; array_enc << 0x203A
419
+ when 0xDE; array_enc << 0xFB01
420
+ when 0xDF; array_enc << 0xFB02
421
+ when 0xE0; array_enc << 0x2021
422
+ when 0xE1; array_enc << 0x00B7
423
+ when 0xE2; array_enc << 0x201A
424
+ when 0xE3; array_enc << 0x201E
425
+ when 0xE4; array_enc << 0x2030
426
+ when 0xE5; array_enc << 0x00C2
427
+ when 0xE6; array_enc << 0x00CA
428
+ when 0xE7; array_enc << 0x00C1
429
+ when 0xE8; array_enc << 0x00CB
430
+ when 0xE9; array_enc << 0x00C8
431
+ when 0xEA; array_enc << 0x00CD
432
+ when 0xEB; array_enc << 0x00CE
433
+ when 0xEC; array_enc << 0x00CF
434
+ when 0xED; array_enc << 0x00CC
435
+ when 0xEE; array_enc << 0x00D3
436
+ when 0xEF; array_enc << 0x00D4
437
+ when 0xF0; array_enc << 0xF8FF
438
+ when 0xF1; array_enc << 0x00D2
439
+ when 0xF2; array_enc << 0x00DA
440
+ when 0xF3; array_enc << 0x00DB
441
+ when 0xF4; array_enc << 0x00D9
442
+ when 0xF5; array_enc << 0x0131
443
+ when 0xF6; array_enc << 0x02C6
444
+ when 0xF7; array_enc << 0x02DC
445
+ when 0xF8; array_enc << 0x00AF
446
+ when 0xF9; array_enc << 0x02D8
447
+ when 0xFA; array_enc << 0x02D9
448
+ when 0xFB; array_enc << 0x02DA
449
+ when 0xFC; array_enc << 0x00B8
450
+ when 0xFD; array_enc << 0x02DD
451
+ when 0xFE; array_enc << 0x02DB
452
+ when 0xFF; array_enc << 0x02C7
453
+ else
454
+ array_enc << num
455
+ end
456
+ end
457
+
458
+ # convert any glyph names to unicode codepoints
459
+ array_enc = self.process_glyphnames(array_enc)
460
+
461
+ # pack all our Unicode codepoints into a UTF-8 string
462
+ ret = array_enc.pack("U*")
463
+
464
+ # set the string's encoding correctly under ruby 1.9+
465
+ ret.force_encoding("UTF-8") if ret.respond_to?(:force_encoding)
466
+
467
+ return ret
468
+ end
469
+ end
470
+
471
+ class StandardEncoding < Encoding
472
+ # convert an Adobe Standard Encoding string into UTF-8
473
+ def to_utf8(str, tounicode = nil)
474
+ # based on mapping described at:
475
+ # http://unicode.org/Public/MAPPINGS/VENDORS/ADOBE/stdenc.txt
476
+ array_std = str.unpack('C*')
477
+ array_std = self.process_differences(array_std)
478
+ array_enc = []
479
+ array_std.each do |num|
480
+ case num
481
+ when 0x27; array_enc << 0x2019
482
+ when 0x60; array_enc << 0x2018
483
+ when 0xA4; array_enc << 0x2044
484
+ when 0xA6; array_enc << 0x0192
485
+ when 0xA8; array_enc << 0x00A4
486
+ when 0xA9; array_enc << 0x0027
487
+ when 0xAA; array_enc << 0x201C
488
+ when 0xAC; array_enc << 0x2039
489
+ when 0xAD; array_enc << 0x203A
490
+ when 0xAE; array_enc << 0xFB01
491
+ when 0xAF; array_enc << 0xFB02
492
+ when 0xB1; array_enc << 0x2013
493
+ when 0xB2; array_enc << 0x2020
494
+ when 0xB3; array_enc << 0x2021
495
+ when 0xB4; array_enc << 0x00B7
496
+ when 0xB7; array_enc << 0x2022
497
+ when 0xB8; array_enc << 0x201A
498
+ when 0xB9; array_enc << 0x201E
499
+ when 0xBA; array_enc << 0x201D
500
+ when 0xBC; array_enc << 0x2026
501
+ when 0xBD; array_enc << 0x2030
502
+ when 0xC1; array_enc << 0x0060
503
+ when 0xC2; array_enc << 0x00B4
504
+ when 0xC3; array_enc << 0x02C6
505
+ when 0xC4; array_enc << 0x02DC
506
+ when 0xC5; array_enc << 0x00AF
507
+ when 0xC6; array_enc << 0x02D8
508
+ when 0xC7; array_enc << 0x02D9
509
+ when 0xC8; array_enc << 0x00A8
510
+ when 0xCA; array_enc << 0x02DA
511
+ when 0xCB; array_enc << 0x00B8
512
+ when 0xCD; array_enc << 0x02DD
513
+ when 0xCE; array_enc << 0x02DB
514
+ when 0xCF; array_enc << 0x02C7
515
+ when 0xD0; array_enc << 0x2014
516
+ when 0xE1; array_enc << 0x00C6
517
+ when 0xE3; array_enc << 0x00AA
518
+ when 0xE8; array_enc << 0x0141
519
+ when 0xE9; array_enc << 0x00D8
520
+ when 0xEA; array_enc << 0x0152
521
+ when 0xEB; array_enc << 0x00BA
522
+ when 0xF1; array_enc << 0x00E6
523
+ when 0xF5; array_enc << 0x0131
524
+ when 0xF8; array_enc << 0x0142
525
+ when 0xF9; array_enc << 0x00F8
526
+ when 0xFA; array_enc << 0x0153
527
+ when 0xFB; array_enc << 0x00DF
528
+ else
529
+ array_enc << num
530
+ end
531
+ end
532
+
533
+ # convert any glyph names to unicode codepoints
534
+ array_enc = self.process_glyphnames(array_enc)
535
+
536
+ # pack all our Unicode codepoints into a UTF-8 string
537
+ ret = array_enc.pack("U*")
538
+
539
+ # set the string's encoding correctly under ruby 1.9+
540
+ ret.force_encoding("UTF-8") if ret.respond_to?(:force_encoding)
541
+
542
+ return ret
543
+ end
544
+ end
545
+
546
+ class SymbolEncoding < Encoding
547
+ # convert a SymbolEncoding string into UTF-8
548
+ def to_utf8(str, tounicode = nil)
549
+ array_symbol = str.unpack('C*')
550
+ array_symbol = self.process_differences(array_symbol)
551
+ array_enc = []
552
+ array_symbol.each do |num|
553
+ case num
554
+ when 0x22; array_enc << 0x2200
555
+ when 0x24; array_enc << 0x2203
556
+ when 0x27; array_enc << 0x220B
557
+ when 0x2A; array_enc << 0x2217
558
+ when 0x2D; array_enc << 0x2212
559
+ when 0x40; array_enc << 0x2245
560
+ when 0x41; array_enc << 0x0391
561
+ when 0x42; array_enc << 0x0392
562
+ when 0x43; array_enc << 0x03A7
563
+ when 0x44; array_enc << 0x0394
564
+ when 0x45; array_enc << 0x0395
565
+ when 0x46; array_enc << 0x03A6
566
+ when 0x47; array_enc << 0x0393
567
+ when 0x48; array_enc << 0x0397
568
+ when 0x49; array_enc << 0x0399
569
+ when 0x4A; array_enc << 0x03D1
570
+ when 0x4B; array_enc << 0x039A
571
+ when 0x4C; array_enc << 0x039B
572
+ when 0x4D; array_enc << 0x039C
573
+ when 0x4E; array_enc << 0x039D
574
+ when 0x4F; array_enc << 0x039F
575
+ when 0x50; array_enc << 0x03A0
576
+ when 0x51; array_enc << 0x0398
577
+ when 0x52; array_enc << 0x03A1
578
+ when 0x53; array_enc << 0x03A3
579
+ when 0x54; array_enc << 0x03A4
580
+ when 0x55; array_enc << 0x03A5
581
+ when 0x56; array_enc << 0x03C2
582
+ when 0x57; array_enc << 0x03A9
583
+ when 0x58; array_enc << 0x039E
584
+ when 0x59; array_enc << 0x03A8
585
+ when 0x5A; array_enc << 0x0396
586
+ when 0x5C; array_enc << 0x2234
587
+ when 0x5E; array_enc << 0x22A5
588
+ when 0x60; array_enc << 0xF8E5
589
+ when 0x61; array_enc << 0x03B1
590
+ when 0x62; array_enc << 0x03B2
591
+ when 0x63; array_enc << 0x03C7
592
+ when 0x64; array_enc << 0x03B4
593
+ when 0x65; array_enc << 0x03B5
594
+ when 0x66; array_enc << 0x03C6
595
+ when 0x67; array_enc << 0x03B3
596
+ when 0x68; array_enc << 0x03B7
597
+ when 0x69; array_enc << 0x03B9
598
+ when 0x6A; array_enc << 0x03D5
599
+ when 0x6B; array_enc << 0x03BA
600
+ when 0x6C; array_enc << 0x03BB
601
+ when 0x6D; array_enc << 0x03BC
602
+ when 0x6E; array_enc << 0x03BD
603
+ when 0x6F; array_enc << 0x03BF
604
+ when 0x70; array_enc << 0x03C0
605
+ when 0x71; array_enc << 0x03B8
606
+ when 0x72; array_enc << 0x03C1
607
+ when 0x73; array_enc << 0x03C3
608
+ when 0x74; array_enc << 0x03C4
609
+ when 0x75; array_enc << 0x03C5
610
+ when 0x76; array_enc << 0x03D6
611
+ when 0x77; array_enc << 0x03C9
612
+ when 0x78; array_enc << 0x03BE
613
+ when 0x79; array_enc << 0x03C8
614
+ when 0x7A; array_enc << 0x03B6
615
+ when 0x7E; array_enc << 0x223C
616
+ when 0xA0; array_enc << 0x20AC
617
+ when 0xA1; array_enc << 0x03D2
618
+ when 0xA2; array_enc << 0x2032
619
+ when 0xA3; array_enc << 0x2264
620
+ when 0xA4; array_enc << 0x2215
621
+ when 0xA5; array_enc << 0x221E
622
+ when 0xA6; array_enc << 0x0192
623
+ when 0xA7; array_enc << 0x2663
624
+ when 0xA8; array_enc << 0x2666
625
+ when 0xA9; array_enc << 0x2665
626
+ when 0xAA; array_enc << 0x2660
627
+ when 0xAB; array_enc << 0x2194
628
+ when 0xAC; array_enc << 0x2190
629
+ when 0xAD; array_enc << 0x2191
630
+ when 0xAE; array_enc << 0x2192
631
+ when 0xAF; array_enc << 0x2193
632
+ when 0xB2; array_enc << 0x2033
633
+ when 0xB3; array_enc << 0x2265
634
+ when 0xB4; array_enc << 0x00D7
635
+ when 0xB5; array_enc << 0x221D
636
+ when 0xB6; array_enc << 0x2202
637
+ when 0xB7; array_enc << 0x2022
638
+ when 0xB8; array_enc << 0x00F7
639
+ when 0xB9; array_enc << 0x2260
640
+ when 0xBA; array_enc << 0x2261
641
+ when 0xBB; array_enc << 0x2248
642
+ when 0xBC; array_enc << 0x2026
643
+ when 0xBD; array_enc << 0xF8E6
644
+ when 0xBE; array_enc << 0xF8E7
645
+ when 0xBF; array_enc << 0x21B5
646
+ when 0xC0; array_enc << 0x2135
647
+ when 0xC1; array_enc << 0x2111
648
+ when 0xC2; array_enc << 0x211C
649
+ when 0xC3; array_enc << 0x2118
650
+ when 0xC4; array_enc << 0x2297
651
+ when 0xC5; array_enc << 0x2295
652
+ when 0xC6; array_enc << 0x2205
653
+ when 0xC7; array_enc << 0x2229
654
+ when 0xC8; array_enc << 0x222A
655
+ when 0xC9; array_enc << 0x2283
656
+ when 0xCA; array_enc << 0x2287
657
+ when 0xCB; array_enc << 0x2284
658
+ when 0xCC; array_enc << 0x2282
659
+ when 0xCD; array_enc << 0x2286
660
+ when 0xCE; array_enc << 0x2208
661
+ when 0xCF; array_enc << 0x2209
662
+ when 0xD0; array_enc << 0x2220
663
+ when 0xD1; array_enc << 0x2207
664
+ when 0xD2; array_enc << 0xF6DA
665
+ when 0xD3; array_enc << 0xF6D9
666
+ when 0xD4; array_enc << 0xF6DB
667
+ when 0xD5; array_enc << 0x220F
668
+ when 0xD6; array_enc << 0x221A
669
+ when 0xD7; array_enc << 0x22C5
670
+ when 0xD8; array_enc << 0x00AC
671
+ when 0xD9; array_enc << 0x2227
672
+ when 0xDA; array_enc << 0x2228
673
+ when 0xDB; array_enc << 0x21D4
674
+ when 0xDC; array_enc << 0x21D0
675
+ when 0xDD; array_enc << 0x21D1
676
+ when 0xDE; array_enc << 0x21D2
677
+ when 0xDF; array_enc << 0x21D3
678
+ when 0xE0; array_enc << 0x25CA
679
+ when 0xE1; array_enc << 0x2329
680
+ when 0xE2; array_enc << 0xF8E8
681
+ when 0xE3; array_enc << 0xF8E9
682
+ when 0xE4; array_enc << 0xF8EA
683
+ when 0xE5; array_enc << 0x2211
684
+ when 0xE6; array_enc << 0xF8EB
685
+ when 0xE7; array_enc << 0xF8EC
686
+ when 0xE8; array_enc << 0xF8ED
687
+ when 0xE9; array_enc << 0xF8EE
688
+ when 0xEA; array_enc << 0xF8EF
689
+ when 0xEB; array_enc << 0xF8F0
690
+ when 0xEC; array_enc << 0xF8F1
691
+ when 0xED; array_enc << 0xF8F2
692
+ when 0xEE; array_enc << 0xF8F3
693
+ when 0xEF; array_enc << 0xF8F4
694
+ when 0xF1; array_enc << 0x232A
695
+ when 0xF2; array_enc << 0x222B
696
+ when 0xF3; array_enc << 0x2320
697
+ when 0xF4; array_enc << 0xF8F5
698
+ when 0xF5; array_enc << 0x2321
699
+ when 0xF6; array_enc << 0xF8F6
700
+ when 0xF7; array_enc << 0xF8F7
701
+ when 0xF8; array_enc << 0xF8F8
702
+ when 0xF9; array_enc << 0xF8F9
703
+ when 0xFA; array_enc << 0xF8FA
704
+ when 0xFB; array_enc << 0xF8FB
705
+ when 0xFC; array_enc << 0xF8FC
706
+ when 0xFD; array_enc << 0xF8FD
707
+ when 0xFE; array_enc << 0xF8FE
708
+ else
709
+ array_enc << num
710
+ end
711
+ end
712
+
713
+ # convert any glyph names to unicode codepoints
714
+ array_enc = self.process_glyphnames(array_enc)
715
+
716
+ # pack all our Unicode codepoints into a UTF-8 string
717
+ ret = array_enc.pack("U*")
718
+
719
+ # set the string's encoding correctly under ruby 1.9+
720
+ ret.force_encoding("UTF-8") if ret.respond_to?(:force_encoding)
721
+
722
+ return ret
723
+ end
724
+ end
725
+
726
+ class WinAnsiEncoding < Encoding
727
+ # convert a WinAnsiEncoding string into UTF-8
728
+ def to_utf8(str, tounicode = nil)
729
+ # content of this method borrowed from REXML::Encoding.decode_cp1252
730
+ # for further reading:
731
+ # http://www.intertwingly.net/stories/2004/04/14/i18n.html
732
+ array_latin9 = str.unpack('C*')
733
+ array_latin9 = self.process_differences(array_latin9)
734
+ array_enc = []
735
+ array_latin9.each do |num|
736
+ case num
737
+ # characters that were added compared to iso-8859-1
738
+ when 0x80; array_enc << 0x20AC # 0xe2 0x82 0xac
739
+ when 0x82; array_enc << 0x201A # 0xe2 0x80 0x9a
740
+ when 0x83; array_enc << 0x0192 # 0xc6 0x92
741
+ when 0x84; array_enc << 0x201E # 0xe2 0x80 0x9e
742
+ when 0x85; array_enc << 0x2026 # 0xe2 0x80 0xa6
743
+ when 0x86; array_enc << 0x2020 # 0xe2 0x80 0xa0
744
+ when 0x87; array_enc << 0x2021 # 0xe2 0x80 0xa1
745
+ when 0x88; array_enc << 0x02C6 # 0xcb 0x86
746
+ when 0x89; array_enc << 0x2030 # 0xe2 0x80 0xb0
747
+ when 0x8A; array_enc << 0x0160 # 0xc5 0xa0
748
+ when 0x8B; array_enc << 0x2039 # 0xe2 0x80 0xb9
749
+ when 0x8C; array_enc << 0x0152 # 0xc5 0x92
750
+ when 0x8E; array_enc << 0x017D # 0xc5 0xbd
751
+ when 0x91; array_enc << 0x2018 # 0xe2 0x80 0x98
752
+ when 0x92; array_enc << 0x2019 # 0xe2 0x80 0x99
753
+ when 0x93; array_enc << 0x201C
754
+ when 0x94; array_enc << 0x201D
755
+ when 0x95; array_enc << 0x2022
756
+ when 0x96; array_enc << 0x2013
757
+ when 0x97; array_enc << 0x2014
758
+ when 0x98; array_enc << 0x02DC
759
+ when 0x99; array_enc << 0x2122
760
+ when 0x9A; array_enc << 0x0161
761
+ when 0x9B; array_enc << 0x203A
762
+ when 0x9C; array_enc << 0x0153 # 0xc5 0x93
763
+ when 0x9E; array_enc << 0x017E # 0xc5 0xbe
764
+ when 0x9F; array_enc << 0x0178
765
+ else
766
+ array_enc << num
767
+ end
768
+ end
769
+
770
+ # convert any glyph names to unicode codepoints
771
+ array_enc = self.process_glyphnames(array_enc)
772
+
773
+ # pack all our Unicode codepoints into a UTF-8 string
774
+ ret = array_enc.pack("U*")
775
+
776
+ # set the string's encoding correctly under ruby 1.9+
777
+ ret.force_encoding("UTF-8") if ret.respond_to?(:force_encoding)
778
+
779
+ return ret
780
+ end
781
+ end
782
+
783
+ class ZapfDingbatsEncoding < Encoding
784
+ # convert a ZapfDingbatsEncoding string into UTF-8
785
+ def to_utf8(str, tounicode = nil)
786
+ # mapping to unicode taken from:
787
+ # http://unicode.org/Public/MAPPINGS/VENDORS/ADOBE/zdingbat.txt
788
+ array_symbol = str.unpack('C*')
789
+ array_symbol = self.process_differences(array_symbol)
790
+ array_enc = []
791
+ array_symbol.each do |num|
792
+ case num
793
+ when 0x21; array_enc << 0x2701
794
+ when 0x22; array_enc << 0x2702
795
+ when 0x23; array_enc << 0x2703
796
+ when 0x24; array_enc << 0x2704
797
+ when 0x25; array_enc << 0x260E
798
+ when 0x26; array_enc << 0x2706
799
+ when 0x27; array_enc << 0x2707
800
+ when 0x28; array_enc << 0x2708
801
+ when 0x29; array_enc << 0x2709
802
+ when 0x2A; array_enc << 0x261B
803
+ when 0x2B; array_enc << 0x261E
804
+ when 0x2C; array_enc << 0x270C
805
+ when 0x2D; array_enc << 0x270D
806
+ when 0x2E; array_enc << 0x270E
807
+ when 0x2F; array_enc << 0x270F
808
+ when 0x30; array_enc << 0x2710
809
+ when 0x31; array_enc << 0x2711
810
+ when 0x32; array_enc << 0x2712
811
+ when 0x33; array_enc << 0x2713
812
+ when 0x34; array_enc << 0x2714
813
+ when 0x35; array_enc << 0x2715
814
+ when 0x36; array_enc << 0x2716
815
+ when 0x37; array_enc << 0x2717
816
+ when 0x38; array_enc << 0x2718
817
+ when 0x39; array_enc << 0x2719
818
+ when 0x3A; array_enc << 0x271A
819
+ when 0x3B; array_enc << 0x271B
820
+ when 0x3C; array_enc << 0x271C
821
+ when 0x3D; array_enc << 0x271D
822
+ when 0x3E; array_enc << 0x271E
823
+ when 0x3F; array_enc << 0x271F
824
+ when 0x40; array_enc << 0x2720
825
+ when 0x41; array_enc << 0x2721
826
+ when 0x42; array_enc << 0x2722
827
+ when 0x43; array_enc << 0x2723
828
+ when 0x44; array_enc << 0x2724
829
+ when 0x45; array_enc << 0x2725
830
+ when 0x46; array_enc << 0x2726
831
+ when 0x47; array_enc << 0x2727
832
+ when 0x48; array_enc << 0x2605
833
+ when 0x49; array_enc << 0x2729
834
+ when 0x4A; array_enc << 0x272A
835
+ when 0x4B; array_enc << 0x272B
836
+ when 0x4C; array_enc << 0x272C
837
+ when 0x4D; array_enc << 0x272D
838
+ when 0x4E; array_enc << 0x272E
839
+ when 0x4F; array_enc << 0x272F
840
+ when 0x50; array_enc << 0x2730
841
+ when 0x51; array_enc << 0x2731
842
+ when 0x52; array_enc << 0x2732
843
+ when 0x53; array_enc << 0x2733
844
+ when 0x54; array_enc << 0x2734
845
+ when 0x55; array_enc << 0x2735
846
+ when 0x56; array_enc << 0x2736
847
+ when 0x57; array_enc << 0x2737
848
+ when 0x58; array_enc << 0x2738
849
+ when 0x59; array_enc << 0x2739
850
+ when 0x5A; array_enc << 0x273A
851
+ when 0x5B; array_enc << 0x273B
852
+ when 0x5C; array_enc << 0x273C
853
+ when 0x5D; array_enc << 0x273D
854
+ when 0x5E; array_enc << 0x273E
855
+ when 0x5F; array_enc << 0x273F
856
+ when 0x60; array_enc << 0x2740
857
+ when 0x61; array_enc << 0x2741
858
+ when 0x62; array_enc << 0x2742
859
+ when 0x63; array_enc << 0x2743
860
+ when 0x64; array_enc << 0x2744
861
+ when 0x65; array_enc << 0x2745
862
+ when 0x66; array_enc << 0x2746
863
+ when 0x67; array_enc << 0x2747
864
+ when 0x68; array_enc << 0x2748
865
+ when 0x69; array_enc << 0x2749
866
+ when 0x6A; array_enc << 0x274A
867
+ when 0x6B; array_enc << 0x274B
868
+ when 0x6C; array_enc << 0x25CF
869
+ when 0x6D; array_enc << 0x274D
870
+ when 0x6E; array_enc << 0x25A0
871
+ when 0x6F; array_enc << 0x274F
872
+ when 0x70; array_enc << 0x2750
873
+ when 0x71; array_enc << 0x2751
874
+ when 0x72; array_enc << 0x2752
875
+ when 0x73; array_enc << 0x25B2
876
+ when 0x74; array_enc << 0x25BC
877
+ when 0x75; array_enc << 0x25C6
878
+ when 0x76; array_enc << 0x2756
879
+ when 0x77; array_enc << 0x25D7
880
+ when 0x78; array_enc << 0x2758
881
+ when 0x79; array_enc << 0x2759
882
+ when 0x7A; array_enc << 0x275A
883
+ when 0x7B; array_enc << 0x275B
884
+ when 0x7C; array_enc << 0x275C
885
+ when 0x7D; array_enc << 0x275D
886
+ when 0x7E; array_enc << 0x275E
887
+ when 0x80; array_enc << 0xF8D7
888
+ when 0x81; array_enc << 0xF8D8
889
+ when 0x82; array_enc << 0xF8D9
890
+ when 0x83; array_enc << 0xF8DA
891
+ when 0x84; array_enc << 0xF8DB
892
+ when 0x85; array_enc << 0xF8DC
893
+ when 0x86; array_enc << 0xF8DD
894
+ when 0x87; array_enc << 0xF8DE
895
+ when 0x88; array_enc << 0xF8DF
896
+ when 0x89; array_enc << 0xF8E0
897
+ when 0x8A; array_enc << 0xF8E1
898
+ when 0x8B; array_enc << 0xF8E2
899
+ when 0x8C; array_enc << 0xF8E3
900
+ when 0x8D; array_enc << 0xF8E4
901
+ when 0xA1; array_enc << 0x2761
902
+ when 0xA2; array_enc << 0x2762
903
+ when 0xA3; array_enc << 0x2763
904
+ when 0xA4; array_enc << 0x2764
905
+ when 0xA5; array_enc << 0x2765
906
+ when 0xA6; array_enc << 0x2766
907
+ when 0xA7; array_enc << 0x2767
908
+ when 0xA8; array_enc << 0x2663
909
+ when 0xA9; array_enc << 0x2666
910
+ when 0xAA; array_enc << 0x2665
911
+ when 0xAB; array_enc << 0x2660
912
+ when 0xAC; array_enc << 0x2460
913
+ when 0xAD; array_enc << 0x2461
914
+ when 0xAE; array_enc << 0x2462
915
+ when 0xAF; array_enc << 0x2463
916
+ when 0xB0; array_enc << 0x2464
917
+ when 0xB1; array_enc << 0x2465
918
+ when 0xB2; array_enc << 0x2466
919
+ when 0xB3; array_enc << 0x2467
920
+ when 0xB4; array_enc << 0x2468
921
+ when 0xB5; array_enc << 0x2469
922
+ when 0xB6; array_enc << 0x2776
923
+ when 0xB7; array_enc << 0x2777
924
+ when 0xB8; array_enc << 0x2778
925
+ when 0xB9; array_enc << 0x2779
926
+ when 0xBA; array_enc << 0x277A
927
+ when 0xBB; array_enc << 0x277B
928
+ when 0xBC; array_enc << 0x277C
929
+ when 0xBD; array_enc << 0x277D
930
+ when 0xBE; array_enc << 0x277E
931
+ when 0xBF; array_enc << 0x277F
932
+ when 0xC0; array_enc << 0x2780
933
+ when 0xC1; array_enc << 0x2781
934
+ when 0xC2; array_enc << 0x2782
935
+ when 0xC3; array_enc << 0x2783
936
+ when 0xC4; array_enc << 0x2784
937
+ when 0xC5; array_enc << 0x2785
938
+ when 0xC6; array_enc << 0x2786
939
+ when 0xC7; array_enc << 0x2787
940
+ when 0xC8; array_enc << 0x2788
941
+ when 0xC9; array_enc << 0x2789
942
+ when 0xCA; array_enc << 0x278A
943
+ when 0xCB; array_enc << 0x278B
944
+ when 0xCC; array_enc << 0x278C
945
+ when 0xCD; array_enc << 0x278D
946
+ when 0xCE; array_enc << 0x278E
947
+ when 0xCF; array_enc << 0x278F
948
+ when 0xD0; array_enc << 0x2790
949
+ when 0xD1; array_enc << 0x2791
950
+ when 0xD2; array_enc << 0x2792
951
+ when 0xD3; array_enc << 0x2793
952
+ when 0xD4; array_enc << 0x2794
953
+ when 0xD5; array_enc << 0x2192
954
+ when 0xD6; array_enc << 0x2194
955
+ when 0xD7; array_enc << 0x2195
956
+ when 0xD8; array_enc << 0x2798
957
+ when 0xD9; array_enc << 0x2799
958
+ when 0xDA; array_enc << 0x279A
959
+ when 0xDB; array_enc << 0x279B
960
+ when 0xDC; array_enc << 0x279C
961
+ when 0xDD; array_enc << 0x279D
962
+ when 0xDE; array_enc << 0x279E
963
+ when 0xDF; array_enc << 0x279F
964
+ when 0xE0; array_enc << 0x27A0
965
+ when 0xE1; array_enc << 0x27A1
966
+ when 0xE2; array_enc << 0x27A2
967
+ when 0xE3; array_enc << 0x27A3
968
+ when 0xE4; array_enc << 0x27A4
969
+ when 0xE5; array_enc << 0x27A5
970
+ when 0xE6; array_enc << 0x27A6
971
+ when 0xE7; array_enc << 0x27A7
972
+ when 0xE8; array_enc << 0x27A8
973
+ when 0xE9; array_enc << 0x27A9
974
+ when 0xEA; array_enc << 0x27AA
975
+ when 0xEB; array_enc << 0x27AB
976
+ when 0xEC; array_enc << 0x27AC
977
+ when 0xED; array_enc << 0x27AD
978
+ when 0xEE; array_enc << 0x27AE
979
+ when 0xEF; array_enc << 0x27AF
980
+ when 0xF1; array_enc << 0x27B1
981
+ when 0xF2; array_enc << 0x27B2
982
+ when 0xF3; array_enc << 0x27B3
983
+ when 0xF4; array_enc << 0x27B4
984
+ when 0xF5; array_enc << 0x27B5
985
+ when 0xF6; array_enc << 0x27B6
986
+ when 0xF7; array_enc << 0x27B7
987
+ when 0xF8; array_enc << 0x27B8
988
+ when 0xF9; array_enc << 0x27B9
989
+ when 0xFA; array_enc << 0x27BA
990
+ when 0xFB; array_enc << 0x27BB
991
+ when 0xFC; array_enc << 0x27BC
992
+ when 0xFD; array_enc << 0x27BD
993
+ when 0xFE; array_enc << 0x27BE
994
+ else
995
+ array_enc << num
996
+ end
997
+ end
998
+
999
+ # convert any glyph names to unicode codepoints
1000
+ array_enc = self.process_glyphnames(array_enc)
1001
+
1002
+ # pack all our Unicode codepoints into a UTF-8 string
1003
+ ret = array_enc.pack("U*")
1004
+
1005
+ # set the string's encoding correctly under ruby 1.9+
1006
+ ret.force_encoding("UTF-8") if ret.respond_to?(:force_encoding)
1007
+
1008
+ return ret
1009
+ end
1010
+ end
1011
+ end
1012
+ end
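
Every to_utf8 method above follows the same pattern: unpack the string into bytes, map each byte to a Unicode codepoint, then pack the codepoints into a UTF-8 string. The following is a minimal standalone sketch of that pattern, not the gem's code. It assumes a lookup hash in place of the case statement. The names CP1252_TO_UNICODE and sketch_to_utf8 are hypothetical, the table holds only four of the WinAnsi entries, and the process_differences and process_glyphnames steps are left out.

```ruby
# Hypothetical, trimmed-down table: only a few WinAnsi (cp1252) entries.
CP1252_TO_UNICODE = {
  0x80 => 0x20AC, # euro sign
  0x93 => 0x201C, # left double quotation mark
  0x94 => 0x201D, # right double quotation mark
  0x9C => 0x0153  # latin small ligature oe
}.freeze

# Illustrative helper, not part of PDF::Reader.
def sketch_to_utf8(str, table = CP1252_TO_UNICODE)
  # one integer per byte; bytes missing from the table pass through unchanged
  codepoints = str.unpack("C*").map { |byte| table.fetch(byte, byte) }

  # pack the Unicode codepoints into a UTF-8 string
  ret = codepoints.pack("U*")

  # tag the string as UTF-8 on rubies that track string encodings
  ret.force_encoding("UTF-8") if ret.respond_to?(:force_encoding)
  ret
end

# bytes for a quoted "café" as they would appear in a cp1252 string
raw = [0x93, 0x63, 0x61, 0x66, 0xE9, 0x94].pack("C*")
puts sketch_to_utf8(raw)
```

A hash keeps each mapping as data, which may make a table easier to check against its published source than a long case statement. Whether that suits this library is a design choice for its author.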