encoding-codepage 0.1

data/LICENSE.MIT ADDED
@@ -0,0 +1,19 @@
1
+ Copyright (c) 2012 Conrad Irwin <conrad.irwin@gmail.com>
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ of this software and associated documentation files (the "Software"), to deal
5
+ in the Software without restriction, including without limitation the rights
6
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ copies of the Software, and to permit persons to whom the Software is
8
+ furnished to do so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in
11
+ all copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ THE SOFTWARE.
data/LICENSE.MLPL ADDED
@@ -0,0 +1,18 @@
1
+ © 2010 Microsoft Corporation. All rights reserved.
2
+
3
+ 1. Definitions
4
+ The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the same meaning here as under U.S. copyright law.
5
+ A “contribution” is the original software, or any additions or changes to the software.
6
+ A “contributor” is any person that distributes its contribution under this license.
7
+ “Licensed patents” are a contributor’s patent claims that read directly on its contribution.
8
+ 2. Grant of Rights
9
+ (A) Copyright Grant - Subject to the terms of this license, including the license conditions and limitations in section 3, each contributor grants you a non-exclusive, worldwide, royalty-free copyright license to reproduce its contribution, prepare derivative works of its contribution, and distribute its contribution or any derivative works that you create.
10
+ (B) Patent Grant - Subject to the terms of this license, including the license conditions and limitations in section 3, each contributor grants you a non-exclusive, worldwide, royalty-free license under its licensed patents to make, have made, use, sell, offer for sale, import, and/or otherwise dispose of its contribution in the software or derivative works of the contribution in the software.
11
+ 3. Conditions and Limitations
12
+ (A) No Trademark License- This license does not grant you rights to use any contributors’ name, logo, or trademarks.
13
+ (B) If you bring a patent claim against any contributor over patents that you claim are infringed by the software, your patent license from such contributor to the software ends automatically.
14
+ (C) If you distribute any portion of the software, you must retain all copyright, patent, trademark, and attribution notices that are present in the software.
15
+ (D) If you distribute any portion of the software in source code form, you may do so only under this license by including a complete copy of this license with your distribution. If you distribute any portion of the software in compiled or object code form, you may only do so under a license that complies with this license.
16
+ (E) The software is licensed “as-is.” You bear the risk of using it. The contributors give no express warranties, guarantees or conditions. You may have additional consumer rights under your local laws which this license cannot change. To the extent permitted under your local laws, the contributors exclude the implied warranties of merchantability, fitness for a particular purpose and non-infringement.
17
+ (F) Platform Limitation - The licenses granted in sections 2(A) and 2(B) extend only to the software or derivative works that you create that run on a Microsoft Windows operating system product.
18
+
data/README.md ADDED
@@ -0,0 +1,217 @@
1
+ The encoding-codepage gem adds a little bit of sugar for dealing with Microsoft Code Page
2
+ Identifiers instead of Encoding names. Importantly, it does not add any new encodings; it
3
+ just adds new names for existing encodings.
4
+
5
+ To install:
6
+ gem install encoding-codepage
7
+
8
+ To use:
9
+ require 'encoding-codepage'
10
+
11
+ (If you're using Bundler, you can just add `gem 'encoding-codepage'` to your Gemfile)
12
+
13
+
14
+ Features
15
+ ========
16
+
17
+ Adds three methods to the `Encoding` class:
18
+
19
+ For looking up encodings by their Code Page Identifier:
20
+
21
+ Encoding.codepage(28591)
22
+ # => #<Encoding:CP28591>
23
+
24
+ For seeing whether encodings exist:
25
+
26
+ Encoding.exist?("CP28591")
27
+ # => #<Encoding:CP28591>
28
+
29
+ Encoding.exist?("CP37")
30
+ # => nil
31
+
32
+ For seeing whether code-pages exist:
33
+
34
+ Encoding.codepage?(28591)
35
+ # => #<Encoding:CP28591>
36
+
37
+ Encoding.codepage?(37)
38
+ # => nil
39
+
40
+ Also makes all supported Code Pages available with a `CP` prefix:
41
+
42
+ Encoding::CP28591
43
+ # => #<Encoding:CP28591>
44
+
45
+ Encoding.find("CP28591")
46
+ # => #<Encoding:CP28591>
47
+
48
+ Encodings
49
+ =========
50
+
51
+ After installing this gem, you'll be able to access the following Code Pages from Ruby:
52
+
53
+ CP437 => IBM437 # OEM United States
54
+ CP737 => IBM737 # OEM Greek (formerly 437G); Greek (DOS)
55
+ CP775 => IBM775 # OEM Baltic; Baltic (DOS)
56
+ CP850 => IBM850 # OEM Multilingual Latin 1; Western European (DOS)
57
+ CP852 => IBM852 # OEM Latin 2; Central European (DOS)
58
+ CP855 => IBM855 # OEM Cyrillic (primarily Russian)
59
+ CP857 => IBM857 # OEM Turkish; Turkish (DOS)
60
+ CP860 => IBM860 # OEM Portuguese; Portuguese (DOS)
61
+ CP861 => IBM861 # OEM Icelandic; Icelandic (DOS)
62
+ CP862 => DOS-862 # OEM Hebrew; Hebrew (DOS)
63
+ CP863 => IBM863 # OEM French Canadian; French Canadian (DOS)
64
+ CP864 => IBM864 # OEM Arabic; Arabic (864)
65
+ CP865 => IBM865 # OEM Nordic; Nordic (DOS)
66
+ CP866 => CP866 # OEM Russian; Cyrillic (DOS)
67
+ CP869 => IBM869 # OEM Modern Greek; Greek, Modern (DOS)
68
+ CP874 => WINDOWS-874 # ANSI/OEM Thai (same as 28605, ISO 8859-15); Thai (Windows)
69
+ CP932 => SHIFT_JIS # ANSI/OEM Japanese; Japanese (Shift-JIS)
70
+ CP936 => GB2312 # ANSI/OEM Simplified Chinese (PRC, Singapore); Chinese Simplified (GB2312)
71
+ CP949 => KS_C_5601-1987 # ANSI/OEM Korean (Unified Hangul Code)
72
+ CP950 => BIG5 # ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5)
73
+ CP1200 => UTF-16 # Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications
74
+ CP1250 => WINDOWS-1250 # ANSI Central European; Central European (Windows)
75
+ CP1251 => WINDOWS-1251 # ANSI Cyrillic; Cyrillic (Windows)
76
+ CP1252 => WINDOWS-1252 # ANSI Latin 1; Western European (Windows)
77
+ CP1253 => WINDOWS-1253 # ANSI Greek; Greek (Windows)
78
+ CP1254 => WINDOWS-1254 # ANSI Turkish; Turkish (Windows)
79
+ CP1255 => WINDOWS-1255 # ANSI Hebrew; Hebrew (Windows)
80
+ CP1256 => WINDOWS-1256 # ANSI Arabic; Arabic (Windows)
81
+ CP1257 => WINDOWS-1257 # ANSI Baltic; Baltic (Windows)
82
+ CP1258 => WINDOWS-1258 # ANSI/OEM Vietnamese; Vietnamese (Windows)
83
+ CP12000 => UTF-32 # Unicode UTF-32, little endian byte order; available only to managed applications
84
+ CP12001 => UTF-32BE # Unicode UTF-32, big endian byte order; available only to managed applications
85
+ CP20127 => US-ASCII # US-ASCII (7-bit)
86
+ CP20866 => KOI8-R # Russian (KOI8-R); Cyrillic (KOI8-R)
87
+ CP20932 => EUC-JP # Japanese (JIS 0208-1990 and 0121-1990)
88
+ CP21866 => KOI8-U # Ukrainian (KOI8-U); Cyrillic (KOI8-U)
89
+ CP28591 => ISO-8859-1 # ISO 8859-1 Latin 1; Western European (ISO)
90
+ CP28592 => ISO-8859-2 # ISO 8859-2 Central European; Central European (ISO)
91
+ CP28593 => ISO-8859-3 # ISO 8859-3 Latin 3
92
+ CP28594 => ISO-8859-4 # ISO 8859-4 Baltic
93
+ CP28595 => ISO-8859-5 # ISO 8859-5 Cyrillic
94
+ CP28596 => ISO-8859-6 # ISO 8859-6 Arabic
95
+ CP28597 => ISO-8859-7 # ISO 8859-7 Greek
96
+ CP28598 => ISO-8859-8 # ISO 8859-8 Hebrew; Hebrew (ISO-Visual)
97
+ CP28599 => ISO-8859-9 # ISO 8859-9 Turkish
98
+ CP28603 => ISO-8859-13 # ISO 8859-13 Estonian
99
+ CP28605 => ISO-8859-15 # ISO 8859-15 Latin 9
100
+ CP50220 => ISO-2022-JP # ISO 2022 Japanese with no halfwidth Katakana; Japanese (JIS)
101
+ CP50221 => CSISO2022JP # ISO 2022 Japanese with halfwidth Katakana; Japanese (JIS-Allow 1 byte Kana)
102
+ CP50222 => ISO-2022-JP # ISO 2022 Japanese JIS X 0201-1989; Japanese (JIS-Allow 1 byte Kana - SO/SI)
103
+ CP51932 => EUC-JP # EUC Japanese
104
+ CP51936 => EUC-CN # EUC Simplified Chinese; Chinese Simplified (EUC)
105
+ CP51949 => EUC-KR # EUC Korean
106
+ CP54936 => GB18030 # Windows XP and later: GB18030 Simplified Chinese (4 byte); Chinese Simplified (GB18030)
107
+ CP65000 => UTF-7 # Unicode (UTF-7)
108
+ CP65001 => UTF-8 # Unicode (UTF-8)
109
+
110
+ The following code pages are known not to be supported:
111
+
112
+ CP37 => IBM037 # IBM EBCDIC US-Canada
113
+ CP500 => IBM500 # IBM EBCDIC International
114
+ CP708 => ASMO-708 # Arabic (ASMO 708)
115
+ CP709 => # Arabic (ASMO-449+, BCON V4)
116
+ CP710 => # Arabic - Transparent Arabic
117
+ CP720 => DOS-720 # Arabic (Transparent ASMO); Arabic (DOS)
118
+ CP858 => IBM00858 # OEM Multilingual Latin 1 + Euro symbol
119
+ CP870 => IBM870 # IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2
120
+ CP875 => CP875 # IBM EBCDIC Greek Modern
121
+ CP1026 => IBM1026 # IBM EBCDIC Turkish (Latin 5)
122
+ CP1047 => IBM01047 # IBM EBCDIC Latin 1/Open System
123
+ CP1140 => IBM01140 # IBM EBCDIC US-Canada (037 + Euro symbol); IBM EBCDIC (US-Canada-Euro)
124
+ CP1141 => IBM01141 # IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro)
125
+ CP1142 => IBM01142 # IBM EBCDIC Denmark-Norway (20277 + Euro symbol); IBM EBCDIC (Denmark-Norway-Euro)
126
+ CP1143 => IBM01143 # IBM EBCDIC Finland-Sweden (20278 + Euro symbol); IBM EBCDIC (Finland-Sweden-Euro)
127
+ CP1144 => IBM01144 # IBM EBCDIC Italy (20280 + Euro symbol); IBM EBCDIC (Italy-Euro)
128
+ CP1145 => IBM01145 # IBM EBCDIC Latin America-Spain (20284 + Euro symbol); IBM EBCDIC (Spain-Euro)
129
+ CP1146 => IBM01146 # IBM EBCDIC United Kingdom (20285 + Euro symbol); IBM EBCDIC (UK-Euro)
130
+ CP1147 => IBM01147 # IBM EBCDIC France (20297 + Euro symbol); IBM EBCDIC (France-Euro)
131
+ CP1148 => IBM01148 # IBM EBCDIC International (500 + Euro symbol); IBM EBCDIC (International-Euro)
132
+ CP1149 => IBM01149 # IBM EBCDIC Icelandic (20871 + Euro symbol); IBM EBCDIC (Icelandic-Euro)
133
+ CP1201 => UNICODEFFFE # Unicode UTF-16, big endian byte order; available only to managed applications
134
+ CP1361 => JOHAB # Korean (Johab)
135
+ CP10000 => MACINTOSH # MAC Roman; Western European (Mac)
136
+ CP10001 => X-MAC-JAPANESE # Japanese (Mac)
137
+ CP10002 => X-MAC-CHINESETRAD # MAC Traditional Chinese (Big5); Chinese Traditional (Mac)
138
+ CP10003 => X-MAC-KOREAN # Korean (Mac)
139
+ CP10004 => X-MAC-ARABIC # Arabic (Mac)
140
+ CP10005 => X-MAC-HEBREW # Hebrew (Mac)
141
+ CP10006 => X-MAC-GREEK # Greek (Mac)
142
+ CP10007 => X-MAC-CYRILLIC # Cyrillic (Mac)
143
+ CP10008 => X-MAC-CHINESESIMP # MAC Simplified Chinese (GB 2312); Chinese Simplified (Mac)
144
+ CP10010 => X-MAC-ROMANIAN # Romanian (Mac)
145
+ CP10017 => X-MAC-UKRAINIAN # Ukrainian (Mac)
146
+ CP10021 => X-MAC-THAI # Thai (Mac)
147
+ CP10029 => X-MAC-CE # MAC Latin 2; Central European (Mac)
148
+ CP10079 => X-MAC-ICELANDIC # Icelandic (Mac)
149
+ CP10081 => X-MAC-TURKISH # Turkish (Mac)
150
+ CP10082 => X-MAC-CROATIAN # Croatian (Mac)
151
+ CP20000 => X-CHINESE_CNS # CNS Taiwan; Chinese Traditional (CNS)
152
+ CP20001 => X-CP20001 # TCA Taiwan
153
+ CP20002 => X_CHINESE-ETEN # Eten Taiwan; Chinese Traditional (Eten)
154
+ CP20003 => X-CP20003 # IBM5550 Taiwan
155
+ CP20004 => X-CP20004 # TeleText Taiwan
156
+ CP20005 => X-CP20005 # Wang Taiwan
157
+ CP20105 => X-IA5 # IA5 (IRV International Alphabet No. 5, 7-bit); Western European (IA5)
158
+ CP20106 => X-IA5-GERMAN # IA5 German (7-bit)
159
+ CP20107 => X-IA5-SWEDISH # IA5 Swedish (7-bit)
160
+ CP20108 => X-IA5-NORWEGIAN # IA5 Norwegian (7-bit)
161
+ CP20261 => X-CP20261 # T.61
162
+ CP20269 => X-CP20269 # ISO 6937 Non-Spacing Accent
163
+ CP20273 => IBM273 # IBM EBCDIC Germany
164
+ CP20277 => IBM277 # IBM EBCDIC Denmark-Norway
165
+ CP20278 => IBM278 # IBM EBCDIC Finland-Sweden
166
+ CP20280 => IBM280 # IBM EBCDIC Italy
167
+ CP20284 => IBM284 # IBM EBCDIC Latin America-Spain
168
+ CP20285 => IBM285 # IBM EBCDIC United Kingdom
169
+ CP20290 => IBM290 # IBM EBCDIC Japanese Katakana Extended
170
+ CP20297 => IBM297 # IBM EBCDIC France
171
+ CP20420 => IBM420 # IBM EBCDIC Arabic
172
+ CP20423 => IBM423 # IBM EBCDIC Greek
173
+ CP20424 => IBM424 # IBM EBCDIC Hebrew
174
+ CP20833 => X-EBCDIC-KOREANEXTENDED # IBM EBCDIC Korean Extended
175
+ CP20838 => IBM-THAI # IBM EBCDIC Thai
176
+ CP20871 => IBM871 # IBM EBCDIC Icelandic
177
+ CP20880 => IBM880 # IBM EBCDIC Cyrillic Russian
178
+ CP20905 => IBM905 # IBM EBCDIC Turkish
179
+ CP20924 => IBM00924 # IBM EBCDIC Latin 1/Open System (1047 + Euro symbol)
180
+ CP20936 => X-CP20936 # Simplified Chinese (GB2312); Chinese Simplified (GB2312-80)
181
+ CP20949 => X-CP20949 # Korean Wansung
182
+ CP21025 => CP1025 # IBM EBCDIC Cyrillic Serbian-Bulgarian
183
+ CP21027 => # (deprecated)
184
+ CP29001 => X-EUROPA # Europa 3
185
+ CP38598 => ISO-8859-8-I # ISO 8859-8 Hebrew; Hebrew (ISO-Logical)
186
+ CP50225 => ISO-2022-KR # ISO 2022 Korean
187
+ CP50227 => X-CP50227 # ISO 2022 Simplified Chinese; Chinese Simplified (ISO 2022)
188
+ CP50229 => # ISO 2022 Traditional Chinese
189
+ CP50930 => # EBCDIC Japanese (Katakana) Extended
190
+ CP50931 => # EBCDIC US-Canada and Japanese
191
+ CP50933 => # EBCDIC Korean Extended and Korean
192
+ CP50935 => # EBCDIC Simplified Chinese Extended and Simplified Chinese
193
+ CP50936 => # EBCDIC Simplified Chinese
194
+ CP50937 => # EBCDIC US-Canada and Traditional Chinese
195
+ CP50939 => # EBCDIC Japanese (Latin) Extended and Japanese
196
+ CP51950 => # EUC Traditional Chinese
197
+ CP52936 => HZ-GB-2312 # HZ-GB2312 Simplified Chinese; Chinese Simplified (HZ)
198
+ CP57002 => X-ISCII-DE # ISCII Devanagari
199
+ CP57003 => X-ISCII-BE # ISCII Bengali
200
+ CP57004 => X-ISCII-TA # ISCII Tamil
201
+ CP57005 => X-ISCII-TE # ISCII Telugu
202
+ CP57006 => X-ISCII-AS # ISCII Assamese
203
+ CP57007 => X-ISCII-OR # ISCII Oriya
204
+ CP57008 => X-ISCII-KA # ISCII Kannada
205
+ CP57009 => X-ISCII-MA # ISCII Malayalam
206
+ CP57010 => X-ISCII-GU # ISCII Gujarati
207
+ CP57011 => X-ISCII-PA # ISCII Punjabi
208
+
209
+ Original list from: http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx
210
+
211
+ Meta-foo
212
+ =======
213
+
214
+ Code licensed under the MIT license, see LICENSE.MIT for details.
215
+
216
+ List licensed under the Microsoft Limited Public License, see LICENSE.MLPL for details.
217
+
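The `nil`-on-miss behaviour that the README shows for `Encoding.exist?` can be sketched with core Ruby alone. This is a minimal illustration rather than the gem's code: `encoding_or_nil` is a hypothetical helper name, and it relies only on the core `Encoding.find`, which raises `ArgumentError` for names Ruby does not know.

```ruby
# Hypothetical helper (not part of the gem): look up an encoding by name,
# returning nil instead of raising when Ruby does not know the name.
def encoding_or_nil(name)
  Encoding.find(name)
rescue ArgumentError
  nil
end

latin1  = encoding_or_nil("ISO-8859-1")
missing = encoding_or_nil("NO-SUCH-ENCODING")

puts latin1.inspect
puts missing.inspect
```

Returning the `Encoding` itself rather than `true` lets a caller test for existence and use the result in a single step.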
data/lib/codepages.tsv ADDED
@@ -0,0 +1,154 @@
1
+ # Copied and pasted from http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx
2
+ # Code Page Identifiers
3
+ 037 IBM037 IBM EBCDIC US-Canada
4
+ 437 IBM437 OEM United States
5
+ 500 IBM500 IBM EBCDIC International
6
+ 708 ASMO-708 Arabic (ASMO 708)
7
+ 709 Arabic (ASMO-449+, BCON V4)
8
+ 710 Arabic - Transparent Arabic
9
+ 720 DOS-720 Arabic (Transparent ASMO); Arabic (DOS)
10
+ 737 ibm737 OEM Greek (formerly 437G); Greek (DOS)
11
+ 775 ibm775 OEM Baltic; Baltic (DOS)
12
+ 850 ibm850 OEM Multilingual Latin 1; Western European (DOS)
13
+ 852 ibm852 OEM Latin 2; Central European (DOS)
14
+ 855 IBM855 OEM Cyrillic (primarily Russian)
15
+ 857 ibm857 OEM Turkish; Turkish (DOS)
16
+ 858 IBM00858 OEM Multilingual Latin 1 + Euro symbol
17
+ 860 IBM860 OEM Portuguese; Portuguese (DOS)
18
+ 861 ibm861 OEM Icelandic; Icelandic (DOS)
19
+ 862 DOS-862 OEM Hebrew; Hebrew (DOS)
20
+ 863 IBM863 OEM French Canadian; French Canadian (DOS)
21
+ 864 IBM864 OEM Arabic; Arabic (864)
22
+ 865 IBM865 OEM Nordic; Nordic (DOS)
23
+ 866 cp866 OEM Russian; Cyrillic (DOS)
24
+ 869 ibm869 OEM Modern Greek; Greek, Modern (DOS)
25
+ 870 IBM870 IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2
26
+ 874 windows-874 ANSI/OEM Thai (same as 28605, ISO 8859-15); Thai (Windows)
27
+ 875 cp875 IBM EBCDIC Greek Modern
28
+ 932 shift_jis ANSI/OEM Japanese; Japanese (Shift-JIS)
29
+ 936 gb2312 ANSI/OEM Simplified Chinese (PRC, Singapore); Chinese Simplified (GB2312)
30
+ 949 ks_c_5601-1987 ANSI/OEM Korean (Unified Hangul Code)
31
+ 950 big5 ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5)
32
+ 1026 IBM1026 IBM EBCDIC Turkish (Latin 5)
33
+ 1047 IBM01047 IBM EBCDIC Latin 1/Open System
34
+ 1140 IBM01140 IBM EBCDIC US-Canada (037 + Euro symbol); IBM EBCDIC (US-Canada-Euro)
35
+ 1141 IBM01141 IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro)
36
+ 1142 IBM01142 IBM EBCDIC Denmark-Norway (20277 + Euro symbol); IBM EBCDIC (Denmark-Norway-Euro)
37
+ 1143 IBM01143 IBM EBCDIC Finland-Sweden (20278 + Euro symbol); IBM EBCDIC (Finland-Sweden-Euro)
38
+ 1144 IBM01144 IBM EBCDIC Italy (20280 + Euro symbol); IBM EBCDIC (Italy-Euro)
39
+ 1145 IBM01145 IBM EBCDIC Latin America-Spain (20284 + Euro symbol); IBM EBCDIC (Spain-Euro)
40
+ 1146 IBM01146 IBM EBCDIC United Kingdom (20285 + Euro symbol); IBM EBCDIC (UK-Euro)
41
+ 1147 IBM01147 IBM EBCDIC France (20297 + Euro symbol); IBM EBCDIC (France-Euro)
42
+ 1148 IBM01148 IBM EBCDIC International (500 + Euro symbol); IBM EBCDIC (International-Euro)
43
+ 1149 IBM01149 IBM EBCDIC Icelandic (20871 + Euro symbol); IBM EBCDIC (Icelandic-Euro)
44
+ 1200 utf-16 Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications
45
+ 1201 unicodeFFFE Unicode UTF-16, big endian byte order; available only to managed applications
46
+ 1250 windows-1250 ANSI Central European; Central European (Windows)
47
+ 1251 windows-1251 ANSI Cyrillic; Cyrillic (Windows)
48
+ 1252 windows-1252 ANSI Latin 1; Western European (Windows)
49
+ 1253 windows-1253 ANSI Greek; Greek (Windows)
50
+ 1254 windows-1254 ANSI Turkish; Turkish (Windows)
51
+ 1255 windows-1255 ANSI Hebrew; Hebrew (Windows)
52
+ 1256 windows-1256 ANSI Arabic; Arabic (Windows)
53
+ 1257 windows-1257 ANSI Baltic; Baltic (Windows)
54
+ 1258 windows-1258 ANSI/OEM Vietnamese; Vietnamese (Windows)
55
+ 1361 Johab Korean (Johab)
56
+ 10000 macintosh MAC Roman; Western European (Mac)
57
+ 10001 x-mac-japanese Japanese (Mac)
58
+ 10002 x-mac-chinesetrad MAC Traditional Chinese (Big5); Chinese Traditional (Mac)
59
+ 10003 x-mac-korean Korean (Mac)
60
+ 10004 x-mac-arabic Arabic (Mac)
61
+ 10005 x-mac-hebrew Hebrew (Mac)
62
+ 10006 x-mac-greek Greek (Mac)
63
+ 10007 x-mac-cyrillic Cyrillic (Mac)
64
+ 10008 x-mac-chinesesimp MAC Simplified Chinese (GB 2312); Chinese Simplified (Mac)
65
+ 10010 x-mac-romanian Romanian (Mac)
66
+ 10017 x-mac-ukrainian Ukrainian (Mac)
67
+ 10021 x-mac-thai Thai (Mac)
68
+ 10029 x-mac-ce MAC Latin 2; Central European (Mac)
69
+ 10079 x-mac-icelandic Icelandic (Mac)
70
+ 10081 x-mac-turkish Turkish (Mac)
71
+ 10082 x-mac-croatian Croatian (Mac)
72
+ 12000 utf-32 Unicode UTF-32, little endian byte order; available only to managed applications
73
+ 12001 utf-32BE Unicode UTF-32, big endian byte order; available only to managed applications
74
+ 20000 x-Chinese_CNS CNS Taiwan; Chinese Traditional (CNS)
75
+ 20001 x-cp20001 TCA Taiwan
76
+ 20002 x_Chinese-Eten Eten Taiwan; Chinese Traditional (Eten)
77
+ 20003 x-cp20003 IBM5550 Taiwan
78
+ 20004 x-cp20004 TeleText Taiwan
79
+ 20005 x-cp20005 Wang Taiwan
80
+ 20105 x-IA5 IA5 (IRV International Alphabet No. 5, 7-bit); Western European (IA5)
81
+ 20106 x-IA5-German IA5 German (7-bit)
82
+ 20107 x-IA5-Swedish IA5 Swedish (7-bit)
83
+ 20108 x-IA5-Norwegian IA5 Norwegian (7-bit)
84
+ 20127 us-ascii US-ASCII (7-bit)
85
+ 20261 x-cp20261 T.61
86
+ 20269 x-cp20269 ISO 6937 Non-Spacing Accent
87
+ 20273 IBM273 IBM EBCDIC Germany
88
+ 20277 IBM277 IBM EBCDIC Denmark-Norway
89
+ 20278 IBM278 IBM EBCDIC Finland-Sweden
90
+ 20280 IBM280 IBM EBCDIC Italy
91
+ 20284 IBM284 IBM EBCDIC Latin America-Spain
92
+ 20285 IBM285 IBM EBCDIC United Kingdom
93
+ 20290 IBM290 IBM EBCDIC Japanese Katakana Extended
94
+ 20297 IBM297 IBM EBCDIC France
95
+ 20420 IBM420 IBM EBCDIC Arabic
96
+ 20423 IBM423 IBM EBCDIC Greek
97
+ 20424 IBM424 IBM EBCDIC Hebrew
98
+ 20833 x-EBCDIC-KoreanExtended IBM EBCDIC Korean Extended
99
+ 20838 IBM-Thai IBM EBCDIC Thai
100
+ 20866 koi8-r Russian (KOI8-R); Cyrillic (KOI8-R)
101
+ 20871 IBM871 IBM EBCDIC Icelandic
102
+ 20880 IBM880 IBM EBCDIC Cyrillic Russian
103
+ 20905 IBM905 IBM EBCDIC Turkish
104
+ 20924 IBM00924 IBM EBCDIC Latin 1/Open System (1047 + Euro symbol)
105
+ 20932 EUC-JP Japanese (JIS 0208-1990 and 0121-1990)
106
+ 20936 x-cp20936 Simplified Chinese (GB2312); Chinese Simplified (GB2312-80)
107
+ 20949 x-cp20949 Korean Wansung
108
+ 21025 cp1025 IBM EBCDIC Cyrillic Serbian-Bulgarian
109
+ 21027 (deprecated)
110
+ 21866 koi8-u Ukrainian (KOI8-U); Cyrillic (KOI8-U)
111
+ 28591 iso-8859-1 ISO 8859-1 Latin 1; Western European (ISO)
112
+ 28592 iso-8859-2 ISO 8859-2 Central European; Central European (ISO)
113
+ 28593 iso-8859-3 ISO 8859-3 Latin 3
114
+ 28594 iso-8859-4 ISO 8859-4 Baltic
115
+ 28595 iso-8859-5 ISO 8859-5 Cyrillic
116
+ 28596 iso-8859-6 ISO 8859-6 Arabic
117
+ 28597 iso-8859-7 ISO 8859-7 Greek
118
+ 28598 iso-8859-8 ISO 8859-8 Hebrew; Hebrew (ISO-Visual)
119
+ 28599 iso-8859-9 ISO 8859-9 Turkish
120
+ 28603 iso-8859-13 ISO 8859-13 Estonian
121
+ 28605 iso-8859-15 ISO 8859-15 Latin 9
122
+ 29001 x-Europa Europa 3
123
+ 38598 iso-8859-8-i ISO 8859-8 Hebrew; Hebrew (ISO-Logical)
124
+ 50220 iso-2022-jp ISO 2022 Japanese with no halfwidth Katakana; Japanese (JIS)
125
+ 50221 csISO2022JP ISO 2022 Japanese with halfwidth Katakana; Japanese (JIS-Allow 1 byte Kana)
126
+ 50222 iso-2022-jp ISO 2022 Japanese JIS X 0201-1989; Japanese (JIS-Allow 1 byte Kana - SO/SI)
127
+ 50225 iso-2022-kr ISO 2022 Korean
128
+ 50227 x-cp50227 ISO 2022 Simplified Chinese; Chinese Simplified (ISO 2022)
129
+ 50229 ISO 2022 Traditional Chinese
130
+ 50930 EBCDIC Japanese (Katakana) Extended
131
+ 50931 EBCDIC US-Canada and Japanese
132
+ 50933 EBCDIC Korean Extended and Korean
133
+ 50935 EBCDIC Simplified Chinese Extended and Simplified Chinese
134
+ 50936 EBCDIC Simplified Chinese
135
+ 50937 EBCDIC US-Canada and Traditional Chinese
136
+ 50939 EBCDIC Japanese (Latin) Extended and Japanese
137
+ 51932 euc-jp EUC Japanese
138
+ 51936 EUC-CN EUC Simplified Chinese; Chinese Simplified (EUC)
139
+ 51949 euc-kr EUC Korean
140
+ 51950 EUC Traditional Chinese
141
+ 52936 hz-gb-2312 HZ-GB2312 Simplified Chinese; Chinese Simplified (HZ)
142
+ 54936 GB18030 Windows XP and later: GB18030 Simplified Chinese (4 byte); Chinese Simplified (GB18030)
143
+ 57002 x-iscii-de ISCII Devanagari
144
+ 57003 x-iscii-be ISCII Bengali
145
+ 57004 x-iscii-ta ISCII Tamil
146
+ 57005 x-iscii-te ISCII Telugu
147
+ 57006 x-iscii-as ISCII Assamese
148
+ 57007 x-iscii-or ISCII Oriya
149
+ 57008 x-iscii-ka ISCII Kannada
150
+ 57009 x-iscii-ma ISCII Malayalam
151
+ 57010 x-iscii-gu ISCII Gujarati
152
+ 57011 x-iscii-pa ISCII Punjabi
153
+ 65000 utf-7 Unicode (UTF-7)
154
+ 65001 utf-8 Unicode (UTF-8)
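Each row of `codepages.tsv` appears to hold three tab-separated fields: the Code Page Identifier, the .NET name, and a description. The sketch below shows one way such a row could be parsed, assuming real tab characters between fields, as the `.tsv` extension suggests. `CodePageRow` and `parse_codepage_row` are hypothetical names used only for illustration.

```ruby
# Hypothetical row type for illustration; the gem itself does not define it.
CodePageRow = Struct.new(:number, :name, :description)

# Parse one line of the table, skipping comments and blank lines.
def parse_codepage_row(line)
  return nil if line.start_with?("#") || line.strip.empty?

  number, name, description = line.chomp.split("\t", 3)
  # Base 10 is passed explicitly: Integer("037") would treat the
  # leading zero as an octal prefix and return 31 instead of 37.
  CodePageRow.new(Integer(number, 10), name, description)
end

row     = parse_codepage_row("037\tIBM037\tIBM EBCDIC US-Canada\n")
unnamed = parse_codepage_row("709\t\tArabic (ASMO-449+, BCON V4)\n")
comment = parse_codepage_row("# Code Page Identifiers\n")

puts row.number
puts unnamed.name.inspect
puts comment.inspect
```

Rows such as 709 have an empty name field, so a consumer of this table needs to tolerate an empty string in the second column.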
data/lib/encoding-codepage.rb ADDED
@@ -0,0 +1,70 @@
1
+ # -*- coding: utf-8 -*-
2
+ class Encoding
3
+ # Encoding::CodePage adds three methods to the Encoding class to help you look up
4
+ # Encodings by their Microsoft® Code Page Identifier. It doesn't add support
5
+ # to Ruby for Encodings that are not already supported, just makes it easier
6
+ # to find those encodings which are.
7
+ #
8
+ # At the moment, this list is available on the web at:
9
+ # * http://msdn.microsoft.com/en-us/library/dd317756
10
+ #
11
+ module CodePage
12
+ # Find an Encoding object from a Microsoft® Code Page Identifier.
13
+ #
14
+ # A list of Code Page identifiers can be found at:
15
+ # * http://msdn.microsoft.com/en-us/library/dd317756
16
+ #
17
+ # NOTE: This library doesn't add support for all Code Pages; it merely
18
+ # allows you to look up existing encodings by their Code Page Identifier.
19
+ #
20
+ # @param [Integer] id The Code Page Identifier.
21
+ # @return [Encoding] The Encoding object.
22
+ # @raise [ArgumentError] The Code Page you tried to find doesn't exist.
23
+ #
24
+ def codepage(id)
25
+ Encoding.find("CP#{id}")
26
+ end
27
+
28
+ # Determine whether an Encoding exists with the given name.
29
+ #
30
+ # @param [String] name The name to search for.
31
+ # @return [Encoding, nil] The Encoding if it exists, otherwise nil.
32
+ #
33
+ def exist?(name)
34
+ find(name)
35
+ rescue ArgumentError
36
+ nil
37
+ end
38
+
39
+ # Determine whether a Code Page exists with the given Code Page Identifier.
40
+ #
41
+ # @param [Integer] id The Code Page Identifier.
42
+ # @return [Encoding, nil] The Encoding if it exists, otherwise nil.
43
+ #
44
+ def codepage?(id)
45
+ exist?("CP#{id}")
46
+ end
47
+
48
+ private
49
+
50
+ def load_codepages!
51
+ File.read(codepage_file).each_line{ |line|
52
+ next if line.start_with?('#') || line =~ /\A\s*\z/
53
+
54
+ number, original, comment = line.split("\t", 3)
55
+ number = Integer(number, 10)
56
+
57
+ if !codepage?(number) && exist?(original.upcase)
58
+ Encoding.find(original.upcase).replicate "CP#{number}"
59
+ end
60
+ }
61
+ end
62
+
63
+ def codepage_file
64
+ File.join(File.dirname(__FILE__), "codepages.tsv")
65
+ end
66
+ end
67
+
68
+ extend CodePage
69
+ load_codepages!
70
+ end
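`load_codepages!` relies on `Encoding#replicate`, which was deprecated in Ruby 3.2 and removed in Ruby 3.3, so the loader is unlikely to run unmodified on current Rubies. One possible workaround, sketched here as an assumption and not taken from the gem, is to keep a plain Hash from Code Page Identifier to an existing `Encoding` instead of defining `CP`-prefixed names. `SAMPLE_PAGES`, `build_codepage_table`, and `CODEPAGES` are hypothetical names.

```ruby
# A few (identifier, name) pairs copied from codepages.tsv.
SAMPLE_PAGES = [
  [1252,  "windows-1252"],
  [28591, "iso-8859-1"],
  [57002, "x-iscii-de"], # listed in the README as not supported
  [65001, "utf-8"],
].freeze

# Hypothetical stand-in for the CP-prefixed aliases: map each identifier
# to an Encoding Ruby already knows, skipping names it does not.
def build_codepage_table(pages)
  pages.each_with_object({}) do |(number, name), table|
    begin
      table[number] = Encoding.find(name)
    rescue ArgumentError
      # Ruby has no encoding with this name; leave the code page out.
    end
  end
end

CODEPAGES = build_codepage_table(SAMPLE_PAGES)

puts CODEPAGES[65001].inspect
puts CODEPAGES.key?(57002)
```

A Hash lookup gives up the `Encoding::CP28591` constants and `Encoding.find("CP28591")` names that the gem provides, but it uses only `Encoding.find` and so avoids the removed method.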
metadata ADDED
@@ -0,0 +1,53 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: encoding-codepage
3
+ version: !ruby/object:Gem::Version
4
+ version: '0.1'
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Conrad Irwin
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2012-03-30 00:00:00.000000000Z
13
+ dependencies: []
14
+ description: Provides aliases for Encodings that have Code Page Identifiers to make
15
+ it easier to interface with Microsoft APIs that use Code Page Identifiers to describe
16
+ content
17
+ email: conrad.irwin@gmail.com
18
+ executables: []
19
+ extensions: []
20
+ extra_rdoc_files: []
21
+ files:
22
+ - lib/encoding-codepage.rb
23
+ - lib/codepages.tsv
24
+ - README.md
25
+ - LICENSE.MIT
26
+ - LICENSE.MLPL
27
+ homepage: https://github.com/ConradIrwin/encoding-codepage
28
+ licenses:
29
+ - MIT, Microsoft Limited Public License
30
+ post_install_message:
31
+ rdoc_options: []
32
+ require_paths:
33
+ - lib
34
+ required_ruby_version: !ruby/object:Gem::Requirement
35
+ none: false
36
+ requirements:
37
+ - - ! '>='
38
+ - !ruby/object:Gem::Version
39
+ version: '0'
40
+ required_rubygems_version: !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ! '>='
44
+ - !ruby/object:Gem::Version
45
+ version: '0'
46
+ requirements: []
47
+ rubyforge_project:
48
+ rubygems_version: 1.8.10
49
+ signing_key:
50
+ specification_version: 3
51
+ summary: Allow looking up Encodings by their Code Page Identifier
52
+ test_files: []
53
+ has_rdoc: