interscript 0.1.4 → 0.1.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.adoc +9 -8
- data/lib/__pycache__/g2pwrapper.cpython-38.pyc +0 -0
- data/lib/interscript-opal.rb +2 -0
- data/lib/interscript.rb +46 -56
- data/lib/interscript/command.rb +3 -2
- data/lib/interscript/fs.rb +69 -0
- data/lib/interscript/mapping.rb +35 -18
- data/lib/interscript/opal.rb +23 -0
- data/lib/interscript/opal/maps.js.erb +7 -0
- data/lib/interscript/opal_map_translate.rb +12 -0
- data/lib/interscript/version.rb +1 -1
- data/maps/{bgnpcgn-chn-Hans-Latn-1979.yaml → bgnpcgn-zho-Hans-Latn-1979.yaml} +1 -1
- data/maps/odni-aze-Cyrl-Latn-2015.yaml +144 -0
- data/maps/odni-kaz-Cyrl-Latn-2015.yaml +148 -0
- data/maps/odni-kir-Cyrl-Latn-2015.yaml +136 -0
- data/maps/odni-mkd-cyrl-latn-2015.yaml +122 -0
- data/maps/odni-tat-Cyrl-Latn-2015.yaml +142 -0
- data/maps/odni-tgk-Cyrl-Latn-2015.yaml +148 -0
- data/maps/odni-uig-Cyrl-Latn-2015.yaml +138 -0
- data/maps/ses-ara-arab-latn-1930.yaml +275 -0
- data/maps/un-ara-Arab-Latn-1971.yaml +127 -0
- data/maps/un-ara-Arab-Latn-1972.yaml +152 -0
- data/maps/un-ara-Arab-Latn-2017.yaml +383 -0
- metadata +89 -2
data/maps/ses-ara-arab-latn-1930.yaml
@@ -0,0 +1,275 @@
+---
+authority_id: ungegn
+id: 1930
+language: ara
+source_script: Arab
+destination_script: Latn
+name: ROMANIZATION OF ARABIC -- UNGEGN 2017 System
+url: http://www.eki.ee/wgrs/rom1_ar.pdf
+creation_date: 1930
+confirmation date: 2018-06
+description: |
+  The current United Nations recommended romanization
+  system was approved in 2017 (resolution XI/3), based on
+  the system adopted by Arabic experts at the conference
+  held in Beirut in 2007, the Unified Arabic
+  Transliteration System, taking into account the
+  practical amendments and corrections carried out and
+  agreed upon by the representatives of the
+  Arabic-speaking countries at the Fourth Arab Conference on
+  Geographical Names, held in Beirut in 2008, and some
+  clarifications and amendments agreed in Riyadh in 2017.
+  Previously, the United Nations had approved a
+  romanization system in 1972 (resolution II/8), based on the
+  system adopted by Arabic experts at the conference
+  held at Beirut in 1971 with the practical amendments carried out
+  and agreed upon by the representatives of the Arabic-speaking
+  countries at their conference. The table was published in volume
+  II of the conference report.
+  In UN resolution XI/3 it is specifically stated that the
+  system was recommended for the “romanization of the
+  geographical names within those Arabic-speaking countries
+  where this system is officially adopted”. There is
+  evidence of its partial implementation in Jordan, Oman and
+  Saudi Arabia. The UNGEGN Working Group on Romanization
+  Systems intends to continue monitoring the UN system’s
+  implementation across Arabic-speaking countries.
+  In some countries there exist local romanization schemes
+  or practices. The geographical names of Algeria, Djibouti,
+  Mauritania, Morocco and Tunisia are generally rendered in
+  the traditional manner, which conforms to the principles of
+  the French orthography.
+  The previous UN-approved system is still found in
+  considerable international usage.
+  Arabic is written from right to left. The Arabic script
+  usually omits vowel points and diacritical marks from
+  writing, which makes it difficult to obtain uniform results
+  in the romanization of Arabic. It is essential to identify
+  correctly the words which appear in any particular name
+  and to know the standard Arabic-script spelling including
+  the relevant vowels. One must also take into account
+  dialectal and idiosyncratic deviations. The romanization
+  is generally reversible, though there may be some ambiguous
+  letter sequences (dh, kh, sh, th) which may also point to
+  combinations of Arabic characters in addition to the
+  respective single characters.
+notes:
+  - |
+    The Survey of Egypt System (SES) of romanization has the following correspondences with
+    the UN system:
+    á = a # ـَى fatha followed by ى which is ا not ي
+    ā = â (a) # ـَا fatha followed by alef // آ
+    -ah (ة- = (a # ة ta' marboota at the end of a sentence
+    aw = ô (au) # ـَوْ
+    ay = ei (ai) # ـَيْ
+    ḏ = ḍ # ض
+    dh = dh (z) # ذ
+    d͟h = ẓ (d) # ظ
+    ẖ = ḥ # ح
+    ī = î
+    j = g (j)
+    q = q (k)
+    s = s (c)
+    s̱ = ṣ
+    ṯ = ṭ
+    th = th (t)
+    ū = û
+    ‘ = ‛
+  - |
+    The variants in parentheses are used depending on pronunciation and tradition. Not all the
+    variations have been given above. The article is always written el- (El-Kafr el-Qadîm, Sharm
+    el-Sheikh).
+tests:
+
+  # Examples taken from:
+  # https://unstats.un.org/unsd/geoinfo/geonames/
+
+  - source: شَرم الشَيْخ
+    expected: sharm el-sheikh
+
+  - source: الكَفر القَدِيم
+    expected: el-kafr el-qadîm
+map:
+  inherit: "un-ara-Arab-Latn-2017"
+  characters:
+
+
+    # special pointed letters
+    '\u0639\u064e' : '‛a' # عَ
+    '\u0639\u0650' : '‛i' # عِ
+    '\u0639\u064f' : '‛û' # عُ
+    # handle MacOS regex difference
+    '\u0639\u064f\u0648' : '‛û' # عُو damma followed by و
+    '\u0650\u064a' : 'î' # ـِي kasra followed by ي
+    '\u0650\u064a\u0651\u064e' : 'îy' # ـِيَّ
+    '\u064f\u0648' : 'û' # ـُو damma followed by و
+    '\u064e\u0627' : # ـَا fatha followed by ا
+      - 'â'
+      - 'a'
+    '\u064e\u0649' : 'a' # ـَى fatha followed by ى which is ا not ي
+    '\u064e\u0648\u0652' : # ـَوْ
+      - 'ô'
+      - 'au'
+    '\u064e\u064a\u0652' : # ـَيْ
+      - 'ei'
+      - 'ai'
+    '\u0622' : # آ
+      - 'â'
+      - 'a'
+
+    # ta' marboota
+    '\u0629$' : 'a'
+    '(?<=\b\u0627\u0644[\u0600-\u06ff]{2})\u0629' : 'a'
+    '(?<=\b\u0627\u0644[\u0600-\u06ff]{3})\u0629' : 'a'
+    '(?<=\b\u0627\u0644[\u0600-\u06ff]{4})\u0629' : 'a'
+    '(?<=\b\u0627\u0644[\u0600-\u06ff]{5})\u0629' : 'a'
+    '(?<=\b\u0627\u0644[\u0600-\u06ff]{6})\u0629' : 'a'
+    '(?<=\b\u0627\u0644[\u0600-\u06ff]{7})\u0629' : 'a'
+    '(?<=\b\u0627\u0644[\u0600-\u06ff]{8})\u0629' : 'a'
+    '(?<=\b\u0627\u0644[\u0600-\u06ff]{9})\u0629' : 'a'
+    '(?<=\b\u0627\u0644[\u0600-\u06ff]{10})\u0629' : 'a'
+    '(?<=\b\u0627\u0644[\u0600-\u06ff]{11})\u0629' : 'a'
+    '(?<=\b\u0627\u0644[\u0600-\u06ff]{12})\u0629' : 'a'
+    '(?<=\b\u0627\u0644[\u0600-\u06ff]{13})\u0629' : 'a'
+
+
+    # Sun letters
+    '\b\u0627\u0644\u062a' : 'el-t' # الت
+    '\b\u0627\u0644\u062b' : # الث
+      - 'el-th'
+      - 'el-t'
+    '\b\u0627\u0644\u062f' : 'el-d' # الد
+    '\b\u0627\u0644\u0630' : # الذ
+      - 'el-dh'
+      - 'el-z'
+    '\b\u0627\u0644\u0631' : 'el-r' # الر
+    '\b\u0627\u0644\u0632' : 'el-z' # الز
+    '\b\u0627\u0644\u0633' : # الس
+      - 'el-s'
+      - 'el-c'
+    '\b\u0627\u0644\u0634' : 'el-sh' # الش
+    '\b\u0627\u0644\u0635' : 'el-ṣ' # الص
+    '\b\u0627\u0644\u0636' : 'el-ḍ' # الض
+    '\b\u0627\u0644\u0637' : 'el-ṭ' # الط
+    '\b\u0627\u0644\u0638' : # الظ
+      - 'el-ẓ'
+      - 'el-d'
+    '\b\u0627\u0644\u0644' : 'el-l' # الل
+    '\b\u0627\u0644\u0646' : 'el-n' # الن
+
+
+    # shadda
+    '\u062b\u0651' : # ث
+      - 'thth'
+      - 'tt'
+    '\u062c\u0651' : # ج
+      - 'gg'
+      - 'jj'
+    '\u062d\u0651' : 'ḥḥ' # ح
+    '\u062e\u0651' : 'khkh' # خ
+
+    '\u0633\u0651' : # س
+      - 'ss'
+      - 'cc'
+    '\u0635\u0651' : 'ṣṣ' # ص
+    '\u0636\u0651' : 'ḍḍ' # ض
+    '\u0637\u0651' : 'ṭṭ' # ط
+    '\u0638\u0651' : # ظ
+      - 'ẓẓ'
+      - 'dd'
+    '\u0642\u0651' : # ق
+      - 'qq'
+      - 'kk'
+
+    '\b\u0627\u0644' : 'el-' # ال
+
+    # normal letters
+    '\u062c' : # ج
+      - 'g'
+      - 'j'
+    '\ufe9f' : # ﺟ
+      - 'g'
+      - 'j'
+    '\ufea0' : # ﺠ
+      - 'g'
+      - 'j'
+    '\ufe9e' : # ﺞ
+      - 'g'
+      - 'j'
+
+    '\u062d' : 'ḥ' # ح
+    '\ufea3' : 'ḥ' # ﺣ
+    '\ufea4' : 'ḥ' # ﺤ
+    '\ufea2' : 'ḥ' # ﺢ
+
+    '\u062e' : 'kh' # خ
+    '\ufea7' : 'kh' # ﺧ
+    '\ufea8' : 'kh' # ﺨ
+    '\ufea6' : 'kh' # ﺦ
+
+    '\u0630' : # ذ
+      - 'dh'
+      - 'z'
+    '\ufeac' : # ﺬ
+      - 'dh'
+      - 'z'
+
+
+    '\u0633' : # س
+      - 's'
+      - 'c'
+    '\ufeb3' : # ﺳ
+      - 's'
+      - 'c'
+    '\ufeb4' : # ﺴ
+      - 's'
+      - 'c'
+    '\ufeb2' : # ﺲ
+      - 's'
+      - 'c'
+
+    '\u0635' : 'ṣ' # ص
+    '\ufebb' : 'ṣ' # ﺻ
+    '\ufebc' : 'ṣ' # ﺼ
+    '\ufeba' : 'ṣ' # ﺺ
+
+    '\u0636' : 'ḍ' # ض
+    '\ufebf' : 'ḍ' # ﺿ
+    '\ufec0' : 'ḍ' # ﻀ
+    '\ufebe' : 'ḍ' # ﺾ
+
+    '\u0637' : 'ṭ' # ط
+    '\ufec3' : 'ṭ' # ﻃ
+    '\ufec4' : 'ṭ' # ﻄ
+    '\ufec2' : 'ṭ' # ﻂ
+
+    '\u0639' : '‛' # ع
+    '\ufecb' : '‛' # ﻋ
+    '\ufecc' : '‛' # ﻌ
+    '\ufeca' : '‛' # ﻊ
+
+    '\u0638' : # ظ
+      - 'ẓ'
+      - 'd'
+    '\ufec7' : # ﻇ
+      - 'ẓ'
+      - 'd'
+    '\ufec8' : # ﻈ
+      - 'ẓ'
+      - 'd'
+    '\ufec6' : # ﻆ
+      - 'ẓ'
+      - 'd'
+
+    '\u0642' : # ق
+      - 'q'
+      - 'k'
+    '\ufed7' : # ﻗ
+      - 'q'
+      - 'k'
+    '\ufed8' : # ﻘ
+      - 'q'
+      - 'k'
+    '\ufed6' : # ﻖ
+      - 'q'
+      - 'k'
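In ses-ara-arab-latn-1930.yaml the twelve ta' marboota keys differ only in the repeat count, `{2}` through `{13}`. A likely reason, not stated in the map itself, is that many regex engines (Python's `re` among them) accept only fixed-width lookbehind, so a single key with `[\u0600-\u06ff]+` is not an option. The Python sketch below rebuilds that key family and reports which key fires for a given word; the helper names are hypothetical and nothing here is interscript's own API.

```python
import re

# Stem lengths covered by the map's ta' marboota keys.
STEM_LENGTHS = range(2, 14)


def ta_marbuta_patterns():
    """Rebuild the map's keys (hypothetical helper).

    Each key is a fixed-width lookbehind: word boundary, "ال", exactly n
    characters from the Arabic block (letters, harakat, ...), then U+0629.
    """
    return {
        n: re.compile(rf"(?<=\b\u0627\u0644[\u0600-\u06ff]{{{n}}})\u0629")
        for n in STEM_LENGTHS
    }


def matching_stem_lengths(text):
    """Return the stem lengths n whose key matches somewhere in `text`."""
    return [n for n, pattern in ta_marbuta_patterns().items() if pattern.search(text)]


def variable_width_lookbehind_supported():
    """Check whether `re` accepts the single variable-width form of the rule."""
    try:
        re.compile(r"(?<=\b\u0627\u0644[\u0600-\u06ff]+)\u0629")
    except re.error:
        return False
    return True


if __name__ == "__main__":
    al_madina = "\u0627\u0644\u0645\u062f\u064a\u0646\u0629"  # المدينة
    madina = "\u0645\u062f\u064a\u0646\u0629"  # مدينة, no article
    print(matching_stem_lengths(al_madina))  # [4]
    print(matching_stem_lengths(madina))  # []
    print(variable_width_lookbehind_supported())  # False
```

Harakat fall inside `[\u0600-\u06ff]` and therefore count toward the stem: the vowelled المَدِينَة matches the `{7}` key rather than `{4}`. A stem longer than 13 characters after ال is not caught by any of the twelve keys.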
data/maps/un-ara-Arab-Latn-1971.yaml
@@ -0,0 +1,127 @@
+---
+authority_id: ungegn
+id: 1971
+language: ara
+source_script: Arab
+destination_script: Latn
+name: 1971 "Beirut system"
+url: https://unstats.un.org/unsd/geoinfo/UNGEGN/docs/2nd-uncsgn-docs/E_Conf61_4_Add1_e.pdf
+creation_date: 1971
+confirmation date: 2018-06
+description: |
+  The current United Nations recommended romanization
+  system was approved in 2017 (resolution XI/3), based on
+  the system adopted by Arabic experts at the conference
+  held in Beirut in 2007, the Unified Arabic
+  Transliteration System, taking into account the
+  practical amendments and corrections carried out and
+  agreed upon by the representatives of the
+  Arabic-speaking countries at the Fourth Arab Conference on
+  Geographical Names, held in Beirut in 2008, and some
+  clarifications and amendments agreed in Riyadh in 2017.
+  Previously, the United Nations had approved a
+  romanization system in 1972 (resolution II/8), based on the
+  system adopted by Arabic experts at the conference
+  held at Beirut in 1971 with the practical amendments carried out
+  and agreed upon by the representatives of the Arabic-speaking
+  countries at their conference. The table was published in volume
+  II of the conference report.
+  In UN resolution XI/3 it is specifically stated that the
+  system was recommended for the “romanization of the
+  geographical names within those Arabic-speaking countries
+  where this system is officially adopted”. There is
+  evidence of its partial implementation in Jordan, Oman and
+  Saudi Arabia. The UNGEGN Working Group on Romanization
+  Systems intends to continue monitoring the UN system’s
+  implementation across Arabic-speaking countries.
+  In some countries there exist local romanization schemes
+  or practices. The geographical names of Algeria, Djibouti,
+  Mauritania, Morocco and Tunisia are generally rendered in
+  the traditional manner, which conforms to the principles of
+  the French orthography.
+  The previous UN-approved system is still found in
+  considerable international usage.
+  Arabic is written from right to left. The Arabic script
+  usually omits vowel points and diacritical marks from
+  writing, which makes it difficult to obtain uniform results
+  in the romanization of Arabic. It is essential to identify
+  correctly the words which appear in any particular name
+  and to know the standard Arabic-script spelling including
+  the relevant vowels. One must also take into account
+  dialectal and idiosyncratic deviations. The romanization
+  is generally reversible, though there may be some ambiguous
+  letter sequences (dh, kh, sh, th) which may also point to
+  combinations of Arabic characters in addition to the
+  respective single characters.
+notes:
+  - |
+    ث is t͟h (th with sub-macron)
+    خ is k͟h (kh with sub-macron)
+    ذ is d͟h (dh with sub-macron)
+    ش is s͟h (sh with sub-macron)
+    ظ is z͟h (zh with sub-macron)
+    غ is g͟h (gh with sub-macron)
+    The previous UN 1972 System had the following differences:
+    the character (ظ) was romanized as z̧ instead of d͟h;
+    the cedilla (¸) was used instead of sub-macron (_) in all characters with sub-macrons.
+
+tests:
+
+  # Examples taken from:
+  # https://unstats.un.org/unsd/geoinfo/UNGEGN/docs/2nd-uncsgn-docs/E_Conf61_4_Add1_e.pdf
+  # page 31 (page 38 of the PDF)
+
+  - source: خَيبَر
+    expected: k͟haybar
+
+  - source: ظَهران
+    expected: z͟hahrān
+
+  - source: القُدس
+    expected: al quds
+
+map:
+  inherit: "un-ara-Arab-Latn-2017"
+  characters:
+
+    # sun letters
+    '\b\u0627\u0644\u062b' : 'at͟h t͟h' # الث
+    '\b\u0627\u0644\u0630' : 'ad͟h d͟h' # الذ
+    '\b\u0627\u0644\u0634' : 'as͟h s͟h' # الش
+    '\b\u0627\u0644\u0638' : 'az͟h z͟h' # الظ
+
+    # shadda
+    '\u062e\u0651' : 'k͟hk͟h' # خ
+    '\u0630\u0651' : 'd͟hd͟h' # ذ
+    '\u0634\u0651' : 's͟hs͟h' # ش
+    '\u0638\u0651' : 'z͟hz͟h' # ظ
+    '\u063a\u0651' : 'g͟hg͟h' # غ
+
+    '\u062b' : 't͟h' # ث
+    '\ufe9b' : 't͟h' # ﺛ
+    '\ufe9c' : 't͟h' # ﺜ
+    '\ufe9a' : 't͟h' # ﺚ
+
+    '\u062e' : 'k͟h' # خ
+    '\ufea7' : 'k͟h' # ﺧ
+    '\ufea8' : 'k͟h' # ﺨ
+    '\ufea6' : 'k͟h' # ﺦ
+
+    '\u063a' : 'g͟h' # غ
+    '\ufecf' : 'g͟h' # ﻏ
+    '\ufed0' : 'g͟h' # ﻐ
+    '\ufece' : 'g͟h' # ﻎ
+
+    '\u0630' : 'd͟h' # ذ
+    '\ufeac' : 'd͟h' # ﺬ
+
+    '\u0634' : 's͟h' # ش
+    '\ufeb7' : 's͟h' # ﺷ
+    '\ufeb8' : 's͟h' # ﺸ
+    '\ufeb6' : 's͟h' # ﺶ
+
+    '\u0638' : 'z͟h' # ظ
+    '\ufec7' : 'z͟h' # ﻇ
+    '\ufec8' : 'z͟h' # ﻈ
+    '\ufec6' : 'z͟h' # ﻆ
+
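The un-ara-Arab-Latn-1971.yaml map writes ث, خ, ذ, ش, ظ and غ as digraphs joined by a sub-macron (t͟h, k͟h, d͟h, s͟h, z͟h, g͟h). The descriptions warn that plain sequences such as dh, kh, sh and th can also stand for two separate Arabic letters; the joining mark is what lets a reverse lookup tell the two readings apart. A minimal Python sketch, assuming the mark is U+035F COMBINING DOUBLE MACRON BELOW and using a small hypothetical consonant-only table (no vowels, not interscript's API):

```python
# Hypothetical reverse table for a consonant-only subset of the 1971 system.
# Assumes the sub-macron joining the digraphs is U+035F.
TIE = "\u035f"  # COMBINING DOUBLE MACRON BELOW

REVERSE = {
    "t" + TIE + "h": "\u062b",  # ث
    "k" + TIE + "h": "\u062e",  # خ
    "d" + TIE + "h": "\u0630",  # ذ
    "s" + TIE + "h": "\u0634",  # ش
    "t": "\u062a",  # ت
    "k": "\u0643",  # ك
    "d": "\u062f",  # د
    "s": "\u0633",  # س
    "h": "\u0647",  # ه
}
LONGEST_KEY = max(len(key) for key in REVERSE)


def deromanize(text: str) -> str:
    """Greedy longest-match lookup; raises ValueError on unmapped input."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so a joined digraph wins over its first letter.
        for size in range(min(LONGEST_KEY, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in REVERSE:
                out.append(REVERSE[chunk])
                i += size
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r} at position {i}")
    return "".join(out)


if __name__ == "__main__":
    print(deromanize("t" + TIE + "h") == "\u062b")  # True: one letter, ث
    print(deromanize("th") == "\u062a\u0647")  # True: two letters, ت + ه
```

Because plain th is not a key, the sketch reads it as ت followed by ه. Once the joining mark is stripped, ث and ت + ه collapse into the same string, which is the ambiguity the description refers to.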
data/maps/un-ara-Arab-Latn-1972.yaml
@@ -0,0 +1,152 @@
+---
+authority_id: ungegn
+id: 1972
+language: ara
+source_script: Arab
+destination_script: Latn
+name: ROMANIZATION OF ARABIC -- UNGEGN 1972 System
+url: http://www.eki.ee/wgrs/obs_rom_vers/rom1_ar_v4_0.pdf
+creation_date: 1972
+confirmation date: 2018-06
+description: |
+  The United Nations recommended romanization
+  system was approved in 1972 (resolution II/8),
+  based on the system adopted by Arabic experts at
+  the conference held at Beirut in 1971 with the
+  practical amendments carried out and agreed upon
+  by the representatives of the Arabic-speaking
+  countries at their conference. The table was
+  published in volume II of the conference report.
+  In the UN resolution it was specifically
+  pointed out that the system was recommended "for
+  the romanization of the geographical names within
+  those Arabic-speaking countries where this system
+  is officially acknowledged". It cannot be
+  definitely ascertained which of the
+  Arabic-speaking countries have adopted this system
+  officially, especially since 2007, when there have been
+  efforts by the Arabic Division to promote a
+  modification of the UN system (ADEGN
+  romanization, see the section on other
+  romanization systems below), with varying
+  success. Judging by the use of names in
+  international cartographic products which rely
+  mostly on national sources, it appears that the UN
+  system or its modification is more or less
+  current in Iraq, Kuwait, Libya, Saudi Arabia,
+  United Arab Emirates and Yemen; there and in some
+  other countries the system is often used without
+  diacritical marks. For the geographical names of
+  the Syrian Arab Republic, international maps
+  favour the UN system while the local usage seems
+  to prefer a French-oriented romanization. Also in
+  Egypt and Sudan there exist local romanization
+  schemes or practices side by side with the UN
+  system. The geographical names of Algeria,
+  Djibouti, Mauritania, Morocco and Tunisia are
+  generally rendered in the traditional manner,
+  which conforms to the principles of the French
+  orthography. Resolution 7 of the Seventh UN
+  Conference on the Standardization of Geographical
+  Names (1998) recommended that "the League of Arab
+  States should, through its specialized
+  structures, continue its efforts to organize a
+  conference with a view to considering the
+  difficulties encountered in applying the amended
+  Beirut system of 1972 for the romanization of
+  Arabic script, and submit, as soon as possible, a
+  solution to the United Nations Group of Experts
+  on Geographical Names". At the Eighth UN
+  Conference on the Standardization of Geographical
+  Names (2002), the Arabic Division of the UN Group
+  of Experts announced that it had finalised
+  proposed modifications to the UN recommended
+  romanization system. These proposals would be
+  submitted to the League of Arab States for
+  approval. Arabic is written from right to left.
+  The Arabic script usually omits vowel points and
+  diacritical marks from writing, which makes it
+  difficult to obtain uniform results in the
+  romanization of Arabic. It is essential to
+  identify correctly the words which appear in any
+  particular name and to know the standard
+  Arabic-script spelling including proper pointing. One
+  must also take into account dialectal and
+  idiosyncratic deviations. The romanization is
+  generally reversible, though there are some
+  ambiguous letter sequences (dh, kh, sh, th) which
+  may also point to combinations of Arabic
+  characters in addition to the respective single
+  characters.
+notes:
+  - |
+    The previous UN 1972 System had the following differences:
+    the character (ظ) was romanized as z̧ instead of d͟h;
+    the cedilla (¸) was used instead of the sub-macron (_) in all characters with sub-macrons, such as ح, ص, ض.
+    When the definite article al precedes a word beginning with one of the "sun letters" (t,
+    th, d, dh, r, z, s, sh, ş, ḑ, ţ, z̧, l, n), the l of the definite article is assimilated with the first
+    consonant of the word: ash-Shāriqah (الشارقة).
+
+
+tests:
+
+  # Examples taken from:
+  # https://unstats.un.org/unsd/geoinfo/geonames/
+
+  - source: مِصر
+    expected: mişr
+
+  - source: قَطَر
+    expected: qaţar
+
+  - source: الجُمهُورِيَّة العِراقِيَّة
+    expected: al jumhūrīyah al ‘irāqīyah
+
+  - source: جُمهُورِيَّة مِصر العَرَبِيَّة
+    expected: jumhūrīyat mişr al ‘arabīyah
+
+  - source: الرِيَاض
+    expected: ar riyāḑ
+
+  - source: الشارِقة
+    expected: ash shāriqah
+
+map:
+  inherit: "un-ara-Arab-Latn-2017"
+  characters:
+
+    '\b\u0627\u0644\u0635' : 'aş ş' # الص
+    '\b\u0627\u0644\u0636' : 'aḑ ḑ' # الض
+    '\b\u0627\u0644\u0637' : 'aţ ţ' # الط
+
+    '\u062d\u0651' : 'ḩḩ' # ح
+    '\u0635\u0651' : 'şş' # ص
+    '\u0636\u0651' : 'ḑḑ' # ض
+    '\u0637\u0651' : 'ţţ' # ط
+    '\u0638\u0651' : 'z̧z̧' # ظ
+
+    '\u062d' : 'ḩ' # ح
+    '\ufea3' : 'ḩ' # ﺣ
+    '\ufea4' : 'ḩ' # ﺤ
+    '\ufea2' : 'ḩ' # ﺢ
+
+    '\u0635' : 'ş' # ص
+    '\ufebb' : 'ş' # ﺻ
+    '\ufebc' : 'ş' # ﺼ
+    '\ufeba' : 'ş' # ﺺ
+
+    '\u0636' : 'ḑ' # ض
+    '\ufebf' : 'ḑ' # ﺿ
+    '\ufec0' : 'ḑ' # ﻀ
+    '\ufebe' : 'ḑ' # ﺾ
+
+    '\u0637' : 'ţ' # ط
+    '\ufec3' : 'ţ' # ﻃ
+    '\ufec4' : 'ţ' # ﻄ
+    '\ufec2' : 'ţ' # ﻂ
+
+    '\u0638' : 'z̧' # ظ
+    '\ufec7' : 'z̧' # ﻇ
+    '\ufec8' : 'z̧' # ﻈ
+    '\ufec6' : 'z̧' # ﻆ
+
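The un-ara-Arab-Latn-1972.yaml description notes that the system is often used without diacritical marks. Cedilla letters such as ş can be stored precomposed (U+015F) or as a base letter plus U+0327 COMBINING CEDILLA, and z̧ has no precomposed form at all, so a hand-written replacement table easily misses cases. A minimal Python sketch using only the standard `unicodedata` module (an illustration, not part of interscript): decompose to NFD, drop every character with a non-zero combining class, then recompose.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks (cedilla, macron, ...) from a romanized string.

    NFD splits precomposed letters such as U+015F (s with cedilla) into a
    base letter plus a combining mark; marks with a non-zero combining class
    are removed and the remainder is recomposed with NFC.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    return unicodedata.normalize("NFC", base_only)


if __name__ == "__main__":
    # Expected outputs of two of the 1972 tests, written with escapes.
    print(strip_diacritics("mi\u015fr"))  # misr
    print(strip_diacritics("ar riy\u0101\u1e11"))  # ar riyad
```

The result is lossy: ص (ş) and س (s) both end up as s, so the stripped form suits display labels or search keys rather than reverse transliteration.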