interscript 0.1.3 → 0.1.4

@@ -0,0 +1,110 @@
+ ---
+ authority_id: mvd
+ id: 19678
+ language: rus
+ source_script: Cyrl
+ destination_script: Latn
+ name: 8/19678 On approval of the Instructions for transliteration of surnames and proper names of citizens of the Republic of Belarus when their personal data is included in the population register
+ url: https://www.icao.int/publications/pages/publication.aspx?docnum=9303
+ creation_date: 2008
+
+ notes:
+   - check notes from mvd-bel-Cyrl-Latn-2008
+
+ tests:
+   - source: Ева
+     expected: Eva
+   - source: Васiльева
+     expected: Vasiĺeva
+   - source: Адъютантов
+     expected: Adjutantov
+
+ map:
+   rules:
+     # note[5]
+     - pattern: (?<=[ЗзЛлНнСсЦц])\u044C # ь after consonants
+       result: "\u0301" # the lookbehind consumes nothing, so there is no group 1 to re-emit
+     - pattern: (?<=[ЗзЛлНнСсЦц])\u042C # Ь after consonants
+       result: "\u0301"
+     - pattern: ([’Ъъ]\u042E)
+       result: Ju
+     - pattern: ([’Ъъ]\u044E)
+       result: ju
+     - pattern: ([’Ъъ]\u042F)
+       result: Ja
+     - pattern: ([’Ъъ]\u044F)
+       result: ja
+
+   characters:
+     '’' : 'j'
+
+     '\u0410' : 'A' # А
+     '\u0411' : 'B' # Б
+     '\u0412' : 'V' # В
+     '\u0413' : 'G' # Г
+     '\u0414' : 'D' # Д
+     '\u0415' : 'E' # Е
+     '\u0401' : 'E' # Ё
+     '\u0416' : 'Zh' # Ж
+     '\u0417' : 'Z' # З
+     '\u0406' : 'I' # І
+     '\u0419' : 'J' # Й
+     '\u041A' : 'K' # К
+     '\u041B' : 'L' # Л
+     '\u041C' : 'M' # М
+     '\u041D' : 'N' # Н
+     '\u041E' : 'O' # О
+     '\u041F' : 'P' # П
+     '\u0420' : 'R' # Р
+     '\u0421' : 'S' # С
+     '\u0422' : 'T' # Т
+     '\u0423' : 'U' # У
+     '\u040E' : 'W' # Ў
+     '\u0424' : 'F' # Ф
+     '\u0425' : 'Kh' # Х
+     '\u0426' : 'Ts' # Ц
+     '\u0427' : 'Ch' # Ч
+     '\u0428' : 'Sh' # Ш
+     '\u0429' : 'Shch' # Щ
+     '\u042A' : 'J' # Ъ
+     '\u042B' : 'Y' # Ы
+     '\u042C' : '' # Ь
+     '\u042D' : 'E' # Э
+     '\u042E' : 'Iu' # Ю
+     '\u042F' : 'Ia' # Я
+
+     '\u0430' : 'a' # а
+     '\u0431' : 'b' # б
+     '\u0432' : 'v' # в
+     '\u0433' : 'g' # г
+     '\u0434' : 'd' # д
+     '\u0435' : 'e' # е
+     '\u0451' : 'e' # ё
+     '\u0436' : 'zh' # ж
+     '\u0437' : 'z' # з
+     '\u0456' : 'i' # і
+     '\u0439' : 'j' # й
+     '\u043A' : 'k' # к
+     '\u043B' : 'l' # л
+     '\u043C' : 'm' # м
+     '\u043D' : 'n' # н
+     '\u043E' : 'o' # о
+     '\u043F' : 'p' # п
+     '\u0440' : 'r' # р
+     '\u0441' : 's' # с
+     '\u0442' : 't' # т
+     '\u0443' : 'u' # у
+     '\u045E' : 'w' # ў
+     '\u0444' : 'f' # ф
+     '\u0445' : 'kh' # х
+     '\u0446' : 'ts' # ц
+     '\u0447' : 'ch' # ч
+     '\u0448' : 'sh' # ш
+     '\u0449' : 'shch' # щ
+     '\u044A' : 'j' # ъ
+     '\u044B' : 'y' # ы
+     '\u044C' : '' # ь
+     '\u044D' : 'e' # э
+     '\u044E' : 'iu' # ю
+     '\u044F' : 'ia' # я
+
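The map above combines ordered regex rules with a per-character table. The following Python sketch illustrates one plausible reading of that design: rules run first on the Cyrillic text, then every remaining character is looked up in the table. It is not Interscript's implementation; `RULES`, `CHARACTERS`, and `transliterate` are hypothetical names, the table is a small subset, and the sample names "Кастусь" and "Лазарь" are illustrative inputs that do not come from the map's tests.

```python
import re

# Ordered regex rules, assumed to run before the character table.
RULES = [
    # Soft sign after З, Л, Н, С, Ц becomes a combining acute accent.
    (re.compile("(?<=[ЗзЛлНнСсЦц])[ьЬ]"), "\u0301"),
    # Apostrophe or hard sign followed by ю becomes "ju".
    (re.compile("[’Ъъ]ю"), "ju"),
]

# Small subset of the character table, enough for the examples below.
CHARACTERS = {
    "А": "A", "К": "K", "Л": "L",
    "а": "a", "в": "v", "д": "d", "з": "z", "н": "n", "о": "o",
    "р": "r", "с": "s", "т": "t", "у": "u",
    "ь": "", "Ь": "",
}


def transliterate(text):
    """Apply the regex rules in order, then map the remaining characters."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    # Characters without a table entry (including Latin output of the rules) pass through.
    return "".join(CHARACTERS.get(ch, ch) for ch in text)
```

Under these assumptions, a soft sign after "с" yields "s" plus U+0301, a soft sign after "р" is simply dropped by the table, and "Адъютантов" comes out as "Adjutantov", which agrees with the map's own test.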
@@ -0,0 +1,37 @@
+ ---
+ authority_id: mvd
+ id: 22721
+ language: bel
+ source_script: Cyrl
+ destination_script: Latn
+ name: |
+   8/22721 On approval of the Instructions on the organization of work of units of citizenship
+   and migration of internal affairs bodies on the issuance, registration, exchange,
+   invalidation, seizure, storage and destruction of a passport of a citizen of the Republic of Belarus
+ url: https://pravo.by/document/?guid=3871&p0=W21022721
+ creation_date: 2010
+
+ description: |
+   RESOLUTION OF THE MINISTRY OF INTERNAL AFFAIRS OF THE REPUBLIC OF BELARUS
+   June 28, 2010 No. 200
+   On approval of the Instructions on the organization of work of units of citizenship
+   and migration of internal affairs bodies on the issuance, registration, exchange,
+   invalidation, seizure, storage and destruction of a passport of a citizen of the Republic of Belarus
+
+ notes:
+   - check notes from mvd-rus-Cyrl-Latn-2008
+
+ tests:
+   - source: Ева
+     expected: Eva
+   - source: Васiльева
+     expected: Vasileva
+   - source: Адъютантов
+     expected: Adjutantov
+
+ map:
+   inherit: "mvd-rus-Cyrl-Latn-2008"
+
+   postrules:
+     - pattern: \u0301 # remove diacritics
+       result: ""
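This map adds nothing but a postrule on top of the inherited system. A minimal Python sketch of that idea, assuming postrules run on the output of the parent map: `POSTRULES`, `apply_postrules`, `child_transliterate`, and the stand-in `fake_parent` are hypothetical names, and the parent here just returns a fixed string instead of doing a real transliteration.

```python
import re

# Postrules are assumed to run on the already-romanized output of the parent map.
POSTRULES = [
    (re.compile("\u0301"), ""),  # strip the combining acute accent
]


def apply_postrules(text):
    """Run each postrule in order over the parent's output."""
    for pattern, replacement in POSTRULES:
        text = pattern.sub(replacement, text)
    return text


def child_transliterate(text, parent):
    """Transliterate with the inherited system, then clean up with postrules."""
    return apply_postrules(parent(text))


def fake_parent(text):
    """Hypothetical stand-in for the inherited map: returns 'l' + U+0301 output."""
    return "Vasil\u0301eva"
```

With this stand-in, `child_transliterate("Васiльева", fake_parent)` returns "Vasileva". The pattern only matches the combining character U+0301, so a precomposed "ĺ" (U+013A) in the parent's output would pass through unchanged.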
@@ -0,0 +1,148 @@
+ ---
+ authority_id: odni
+ id: 2015
+ language: bel
+ source_script: Cyrl
+ destination_script: Latn
+ name: Office of the Director Of National Intelligence Belarusian Personal Names 2015, ICS 630-01 Annex B
+ # url:
+ source: ICS 630-01, Annex B
+ creation_date: 2015
+ confirmation_date: 2015
+ description: |
+   This system is the Intelligence Community (IC) standard for the transliteration of Belarusian
+   names that will be applied to all final written reports and products for IC consumers. It is not
+   intended to eliminate variations of a name that can contribute forensic information. Rather, it is to
+   provide an IC standard Romanized (English) transliteration from Belarusian that can then be
+   linked to forensic information in ways that will help identify the referent of the name.
+
+   In cases where an individual’s name has already been transliterated in a variant spelling, the IC
+   Standard spelling should appear first, followed by the variant spelling(s) in parentheses at the first
+   usage. In addition, if the original Cyrillic spelling is known, that spelling should also appear in
+   parentheses following the name, if possible, following best practices of the issuing organization
+   and taking into consideration information system capabilities. This convention is designed to
+   ensure that vital forensic information is not lost.
+
+   For names of persons who are known to not be part of the Belarusian-speaking community, use
+   the relevant IC transliteration standard for names from that language (e.g., Mikhail, Yitzhak). A
+   translator’s note may be used to clarify the known origin of the person. Spell names of
+   individuals from languages that are written in Roman letters as they are spelled in those
+   languages (e.g., George Clooney, Jorge Garcia, Georges Pompidou).
+
+   In the case of active senior government officials in the on-line CIA World Factbook and the on-line directory of Chiefs of State and Cabinet Members of Foreign Governments, the spellings
+   given in these on-line reference works should be used in place of the IC Standard. For any
+   individual who has at one time been listed in the Factbook or Chiefs of State directory but who no
+   longer appears in those resources (i.e. is no longer a government official), the IC Standard
+   spelling should appear first, with the spelling, if known, as it previously appeared in those
+   resources listed within parentheses at the first usage.
+
+   The primary goal is to produce a consistent Romanized transcription of names that is specifically
+   readable to the English-speaking non-specialist. The system uses the 26 letters of the standard
+   (English) Roman alphabet. Some ambiguities in the Romanized form will occur without the use
+   of diacritics. However, within the context of a report, where additional information about the
+   individual is provided, the referent will be clearly identified. This system will be used in
+   conjunction with on-line tools, name dictionaries, and lists containing conventional spellings of names of well-known individuals.
+
+ notes:
+
+ tests:
+   - source: Міхаіл
+     expected: Mikhail
+   - source: Беларусь
+     expected: Byelarus
+   - source: Кастусь Каліноўскі
+     expected: Kastus Kalinowski
+   - source: Васіль Быкау
+     expected: Vasil Bykau
+   - source: Янка Купала
+     expected: Yanka Kupala
+   - source: Маланка
+     expected: Malanka
+   - source: Пакаранне
+     expected: Pakarannye
+   - source: Бэз
+     expected: Bez
+   - source: Чабор
+     expected: Chabor
+   - source: |
+       Дзяўчына, дзяўчыначка пасярод гісторыі
+       З прастадушнай шчырасьцю глядзіць на тэрыторыю.
+       У вакне заўсёды звыклая выява:
+       Шэры двор, шэры слуп, на слупе аб'явы.
+     expected: |
+       Dzyawchyna, dzyawchynachka pasyarod historyi
+       Z prastadushnay shchyrastsyu hlyadzits na terytoryyu.
+       U vaknye zawsyody zvyklaya vyyava:
+       Shery dvor, shery slup, na slupye abyavy.
+
+ map:
+   characters:
+     '\u0027' : '' # '
+
+     '\u0410' : 'A' # А
+     '\u0411' : 'B' # Б
+     '\u0412' : 'V' # В
+     '\u0413' : 'H' # Г
+     '\u0490' : 'G' # Ґ
+     '\u0414' : 'D' # Д
+     '\u0415' : 'Ye' # Е
+     '\u0401' : 'Yo' # Ё
+     '\u0416' : 'Zh' # Ж
+     '\u0417' : 'Z' # З
+     '\u0406' : 'I' # І
+     '\u0419' : 'Y' # Й
+     '\u041A' : 'K' # К
+     '\u041B' : 'L' # Л
+     '\u041C' : 'M' # М
+     '\u041D' : 'N' # Н
+     '\u041E' : 'O' # О
+     '\u041F' : 'P' # П
+     '\u0420' : 'R' # Р
+     '\u0421' : 'S' # С
+     '\u0422' : 'T' # Т
+     '\u0423' : 'U' # У
+     '\u040E' : 'W' # Ў
+     '\u0424' : 'F' # Ф
+     '\u0425' : 'Kh' # Х
+     '\u0426' : 'Ts' # Ц
+     '\u0427' : 'Ch' # Ч
+     '\u0428' : 'Sh' # Ш
+     '\u042B' : 'Y' # Ы
+     '\u042C' : '' # Ь
+     '\u042D' : 'E' # Э
+     '\u042E' : 'Yu' # Ю
+     '\u042F' : 'Ya' # Я
+
+     '\u0430' : 'a' # а
+     '\u0431' : 'b' # б
+     '\u0432' : 'v' # в
+     '\u0433' : 'h' # г
+     '\u0491' : 'g' # ґ
+     '\u0434' : 'd' # д
+     '\u0435' : 'ye' # е
+     '\u0451' : 'yo' # ё
+     '\u0436' : 'zh' # ж
+     '\u0437' : 'z' # з
+     '\u0456' : 'i' # і
+     '\u0439' : 'y' # й
+     '\u043A' : 'k' # к
+     '\u043B' : 'l' # л
+     '\u043C' : 'm' # м
+     '\u043D' : 'n' # н
+     '\u043E' : 'o' # о
+     '\u043F' : 'p' # п
+     '\u0440' : 'r' # р
+     '\u0441' : 's' # с
+     '\u0442' : 't' # т
+     '\u0443' : 'u' # у
+     '\u045E' : 'w' # ў
+     '\u0444' : 'f' # ф
+     '\u0445' : 'kh' # х
+     '\u0446' : 'ts' # ц
+     '\u0447' : 'ch' # ч
+     '\u0448' : 'sh' # ш
+     '\u044B' : 'y' # ы
+     '\u044C' : '' # ь
+     '\u044D' : 'e' # э
+     '\u044E' : 'yu' # ю
+     '\u044F' : 'ya' # я
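This map has no regex rules, only a character table, so it can be approximated with a plain lookup. The sketch below uses Python's built-in `str.maketrans` and `str.translate`; `TABLE` and `transliterate` are hypothetical names, and the table is a small subset of the entries above, chosen to cover three of the map's own test strings.

```python
# Subset of the character table above; one-to-many mappings such as Я -> "Ya"
# and deletions such as the apostrophe are both expressed as string values.
TABLE = str.maketrans({
    "'": "",
    "Б": "B", "К": "K", "Я": "Ya",
    "а": "a", "б": "b", "в": "v", "з": "z", "к": "k", "л": "l",
    "н": "n", "п": "p", "у": "u", "ы": "y", "э": "e", "я": "ya",
})


def transliterate(text):
    """Replace each character via the table; unmapped characters pass through."""
    return text.translate(TABLE)
```

For example, `transliterate("Янка Купала")` gives "Yanka Kupala" and `transliterate("аб'явы")` gives "abyavy", matching the expected values in the tests above.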
@@ -0,0 +1,96 @@
+ ---
+ authority_id: odni
+ id: 2015
+ language: bul
+ source_script: Cyrl
+ destination_script: Latn
+ name: Office of the Director Of National Intelligence Bulgarian Personal Names 2015, ICS-630-01 Annex O
+ # url:
+ source: ICS-630-01 Annex O
+ creation_date: 2015
+ confirmation_date: 2015
+ description: |
+   This system is the Intelligence Community standard for the transliteration of Bulgarian person
+   names that will be applied to all final written reports and products for IC consumers. This
+   standard matches both the Bulgarian national standard adopted in 2009 and the Board of
+   Geographic Names / Permanent Committee on Geographic Names standard adopted in 2013. It is
+   not intended to eliminate variations of a name that can contribute forensic information. Rather, it
+   is to provide an IC standard Romanized (English) transliteration from Bulgarian that can then be
+   linked to forensic information in ways that will help identify the referent of the name.
+
+   In cases where an individual’s name has already been transliterated in a variant spelling, the IC
+   Standard spelling should appear first, followed by the variant spelling(s) in parentheses at the first
+   usage. In addition, if the original Cyrillic-script spelling is known, that spelling should also
+   appear in parentheses following the name, if possible, following best practices of the issuing
+   organization and taking into consideration information system capabilities. For example: Dobri
+   Hristov (also seen as Dobri Khristov, Добри Христов). This convention is designed to ensure
+   that vital forensic information is not lost.
+
+   For names of persons who are known to not be part of the Bulgarian-speaking community, use
+   the relevant IC transliteration standard for names from that language (e.g., Yitzhak). A
+   translator’s note may be used to clarify the known origin of the person. Spell names of
+   individuals from languages that are written in Roman letters as they are spelled in those
+   languages (e.g., George Clooney, Jorge Garcia, Georges Pompidou).
+
+   In the case of active senior government officials in the on-line CIA World Factbook and the on-line directory of Chiefs of State and Cabinet Members of Foreign Governments, the spellings
+   given in these on-line reference works should be used in place of the IC Standard. For any
+   individual who has at one time been listed in the Factbook or Chiefs of State directory but who no
+   longer appears in those resources (i.e. is no longer a government official), the IC Standard
+   spelling should appear first, with the spelling, if known, as it previously appeared in those
+   resources listed within parentheses at the first usage.
+
+   The primary goal is to produce a consistent Romanized transcription of names that is specifically
+   readable to the English-speaking non-specialist. The system uses the 26 letters of the standard
+   (English) Roman alphabet. Some ambiguities in the Romanized form will occur without the use
+   of diacritics. However, within the context of a report, where additional information about the
+   individual is provided, the referent will be clearly identified. This system will be used in
+   conjunction with on-line tools, name dictionaries, and lists containing conventional spellings of
+   names of well-known individuals.
+
+ notes:
+   - Transliterate a doubled digraph letter as a single digraph, i.e. шш -> sh, not shsh
+   - In the Romanized form, no distinction is made between digraphs such as 'sh' and single contiguous letters (e.g. 's' followed by 'h').
+
+ tests:
+
+   - source: Добри Христов
+     expected: Dobri Khristov
+   - source: болгарица
+     expected: bolgaritsa
+   - source: български език
+     expected: balgarski ezik
+   - source: българска азбука
+     expected: balgarska azbuka
+   - source: градъ
+     expected: grad
+   - source: аз държа
+     expected: az darzha
+   - source: Ядеш хляба с чубрица
+     expected: Yadesh khlyaba s chubritsa
+
+
+   # note[1]
+   - source: шш
+     expected: sh
+   - source: ччччч
+     expected: ch
+
+ map:
+   inherit: bgnpcgn-bul-Cyrl-Latn-2013
+
+   rules:
+     # note[1]
+     - pattern: "(.)\\1{1,}"
+       result: "\\1"
+
+     - pattern: \u042C # Ь
+       result: "Y"
+
+     - pattern: \u042A # Ъ
+       result: "A"
+
+     - pattern: \u044C # ь
+       result: "y"
+
+     - pattern: \u044A # ъ
+       result: "a"
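The first rule in this map implements note[1] with a backreference: any character repeated two or more times is collapsed to a single occurrence before it is romanized. A small Python sketch of that behaviour follows. It is an illustration only: `COLLAPSE_REPEATS`, `TABLE`, and `transliterate` are hypothetical names, and the table is a hand-written subset standing in for the inherited map plus the ъ rule.

```python
import re

# Same idea as the YAML pattern "(.)\\1{1,}": a character followed by one or
# more copies of itself.
COLLAPSE_REPEATS = re.compile(r"(.)\1+")

# Hand-written subset standing in for the inherited table and the hard-sign rule.
TABLE = {"ш": "sh", "ч": "ch", "д": "d", "ъ": "a", "р": "r", "ж": "zh", "а": "a"}


def transliterate(text):
    """Collapse runs of a repeated character, then romanize what is left."""
    text = COLLAPSE_REPEATS.sub(r"\1", text)
    return "".join(TABLE.get(ch, ch) for ch in text)
```

Under these assumptions "шш" becomes "sh", "ччччч" becomes "ch", and "държа" becomes "darzha", in line with the tests above. Because the pattern uses `(.)`, it collapses every doubled character, not only letters that romanize to digraphs.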
@@ -4,7 +4,7 @@ id: 2015
  language: kat
  source_script: Geor
  destination_script: Latn
- name: Office of the Director Of National Intelligence Georgian Personal Names 2015
+ name: Office of the Director Of National Intelligence Georgian Personal Names 2015, ICS 630-01 Annex E
  # url:
  source: ICS 630-01, Annex E
  creation_date: 2015
@@ -0,0 +1,77 @@
+ ---
+ authority_id: odni
+ id: 2015
+ language: rus
+ source_script: Cyrl
+ destination_script: Latn
+ name: Office of the Director Of National Intelligence Russian Personal Names 2015, ICS-630-01 Annex K
+ # url:
+ source: ICS-630-01 Annex K
+ creation_date: 2015
+ confirmation_date: 2015
+ description: |
+   This system, adapted from the Board of Geographic Names (BGN) Romanization system for Russian
+   (1947), is the Intelligence Community (IC) standard for the transliteration of Russian names that will be
+   applied to all final written reports and products for IC consumers. It is not intended to eliminate variations
+   of a name that can contribute forensic information. Rather, it is to provide an IC standard Romanized
+   (English) transliteration from Russian that can then be linked to forensic information in ways that will
+   help identify the referent of the name.
+
+   In cases where an individual’s name has already been transliterated in a variant spelling, the IC Standard
+   spelling should appear first, followed by the variant spelling(s) in parentheses at the first usage. E.g.,
+   Sergey Nikolayevich Tyurin (Serguei Nicolaivitch Tiourine). In addition, if the original Cyrillic spelling
+   is known, that spelling should also appear in parentheses following the name, if possible, following best
+   practices of the issuing organization and taking into consideration information system capabilities. This
+   convention is designed to ensure that vital forensic information is not lost.
+
+   For non-Russian names, use the relevant IC transliteration standard for names from that language. A
+   translator’s note may be used to clarify the known origin of the person. Spell names of individuals from
+   languages that are written in Roman letters as they are spelled in those languages (e.g., George Clooney,
+   Jorge Garcia, Georges Pompidou).
+
+   In the case of active senior government officials in the on-line CIA World Factbook and the on-line
+   directory of Chiefs of State and Cabinet Members of Foreign Governments, the spellings given in these
+   on-line reference works should be used in place of the IC Standard. For any individual who has at one
+   time been listed in the Factbook or Chiefs of State directory but who no longer appears in those resources
+   (i.e. is no longer a government official), the IC Standard spelling should appear first, with the spelling, if
+   known, as it previously appeared in those resources listed within parentheses at the first usage.
+
+   The primary goal is to produce a consistent Romanized transcription of names that is specifically readable
+   to the English-speaking non-specialist. The system uses the 26 letters of the standard (English) Roman
+   alphabet. Some ambiguities in the Romanized form will occur without the use of diacritics. However,
+   within the context of a report, where additional information about the individual is provided, the referent
+   will be clearly identified. This system will be used in conjunction with on-line tools, name dictionaries,
+   and lists containing conventional spellings of names of well-known individuals.
+
+ notes:
+
+ tests:
+   - source: Ирина Ивановна Никитина
+     expected: Irina Ivanovna Nikitina
+   - source: Николай Римский-Корсаков
+     expected: Nikolay Rimskiy-Korsakov
+   - source: Михаил Тимофеевич Калашников
+     expected: Mikhail Timofeyevich Kalashnikov
+   - source: Корж Василий Захарович
+     expected: Korzh Vasiliy Zakharovich
+   - source: Циолковский Константин Эдуардович
+     expected: Tsiolkovskiy Konstantin Eduardovich
+   - source: Лобачевский Николай Иванович
+     expected: Lobachevskiy Nikolay Ivanovich
+   - source: Пушкин Александр Сергеевич
+     expected: Pushkin Aleksandr Sergeyevich
+   - source: Гоголь Николай Васильевич
+     expected: Gogol Nikolay Vasilyevich
+   - source: Ломоносов Михаил Васильевич
+     expected: Lomonosov Mikhail Vasilyevich
+
+ map:
+   inherit: bgnpcgn-rus-Cyrl-Latn-1947
+
+   characters:
+     '\u042a': '' # Ъ
+     '\u042c': '' # Ь
+
+
+     '\u044a': '' # ъ
+     '\u044c': '' # ь
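This map inherits a parent system and overrides only the hard and soft signs, mapping them to empty strings. One way to picture that, assuming child entries simply take precedence over inherited ones, is a dictionary merge. In the sketch below `PARENT_CHARACTERS`, `CHILD_OVERRIDES`, and `transliterate` are hypothetical names, and the parent values for ь and ъ are placeholders rather than entries copied from the inherited map.

```python
# Hypothetical parent table: a tiny stand-in for the inherited map.
# The sign values below are placeholders, not taken from the real parent.
PARENT_CHARACTERS = {
    "Г": "G", "о": "o", "г": "g", "л": "l",
    "ь": "’", "ъ": "”",
}

# The child map's own entries: both signs are dropped.
CHILD_OVERRIDES = {"Ъ": "", "Ь": "", "ъ": "", "ь": ""}

# Later keys win, so child entries replace inherited ones.
CHARACTERS = {**PARENT_CHARACTERS, **CHILD_OVERRIDES}


def transliterate(text, table=CHARACTERS):
    """Map each character through the table; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)
```

With the merged table, `transliterate("Гоголь")` returns "Gogol", matching the spelling in the tests above, while the placeholder parent table alone would keep a mark for the soft sign.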