interscript 0.1.4 → 2.0.5

Files changed (183)
  1. checksums.yaml +4 -4
  2. data/.gitignore +11 -0
  3. data/.rspec +3 -0
  4. data/Gemfile +29 -0
  5. data/LICENSE.adoc +31 -0
  6. data/README.md +3 -0
  7. data/Rakefile +53 -0
  8. data/bin/console +14 -0
  9. data/bin/interscript +3 -39
  10. data/bin/maps_analyze_staging +168 -0
  11. data/bin/maps_debug_compilers +58 -0
  12. data/bin/maps_debug_ordering +88 -0
  13. data/bin/maps_debug_ruby_compile +24 -0
  14. data/bin/maps_debug_step_by_step +44 -0
  15. data/bin/maps_optimize_order +112 -0
  16. data/bin/maps_v1_analyze_regexps +45 -0
  17. data/bin/maps_v1_to_v2 +426 -0
  18. data/exe/interscript +6 -0
  19. data/interscript.gemspec +31 -0
  20. data/lib/interscript.rb +76 -128
  21. data/lib/interscript/command.rb +6 -5
  22. data/lib/interscript/compiler.rb +22 -0
  23. data/lib/interscript/compiler/javascript.rb +292 -0
  24. data/lib/interscript/compiler/ruby.rb +262 -0
  25. data/lib/interscript/dsl.rb +67 -0
  26. data/lib/interscript/dsl/aliases.rb +23 -0
  27. data/lib/interscript/dsl/document.rb +46 -0
  28. data/lib/interscript/dsl/group.rb +45 -0
  29. data/lib/interscript/dsl/group/parallel.rb +6 -0
  30. data/lib/interscript/dsl/items.rb +89 -0
  31. data/lib/interscript/dsl/metadata.rb +26 -0
  32. data/lib/interscript/dsl/stage.rb +6 -0
  33. data/lib/interscript/dsl/symbol_mm.rb +11 -0
  34. data/lib/interscript/dsl/tests.rb +12 -0
  35. data/lib/interscript/interpreter.rb +251 -0
  36. data/lib/interscript/node.rb +25 -0
  37. data/lib/interscript/node/alias_def.rb +15 -0
  38. data/lib/interscript/node/dependency.rb +13 -0
  39. data/lib/interscript/node/document.rb +45 -0
  40. data/lib/interscript/node/group.rb +34 -0
  41. data/lib/interscript/node/group/parallel.rb +9 -0
  42. data/lib/interscript/node/group/sequential.rb +2 -0
  43. data/lib/interscript/node/item.rb +52 -0
  44. data/lib/interscript/node/item/alias.rb +42 -0
  45. data/lib/interscript/node/item/any.rb +61 -0
  46. data/lib/interscript/node/item/capture.rb +50 -0
  47. data/lib/interscript/node/item/group.rb +51 -0
  48. data/lib/interscript/node/item/repeat.rb +40 -0
  49. data/lib/interscript/node/item/stage.rb +23 -0
  50. data/lib/interscript/node/item/string.rb +51 -0
  51. data/lib/interscript/node/metadata.rb +18 -0
  52. data/lib/interscript/node/rule.rb +6 -0
  53. data/lib/interscript/node/rule/funcall.rb +18 -0
  54. data/lib/interscript/node/rule/run.rb +15 -0
  55. data/lib/interscript/node/rule/sub.rb +65 -0
  56. data/lib/interscript/node/stage.rb +19 -0
  57. data/lib/interscript/node/tests.rb +15 -0
  58. data/lib/interscript/stdlib.rb +211 -0
  59. data/lib/interscript/utils/regexp_converter.rb +283 -0
  60. data/lib/interscript/version.rb +1 -1
  61. data/requirements.txt +1 -0
  62. metadata +73 -223
  63. data/README.adoc +0 -297
  64. data/bin/rspec +0 -29
  65. data/lib/g2pwrapper.py +0 -34
  66. data/lib/interscript/mapping.rb +0 -125
  67. data/lib/model-7 +0 -0
  68. data/lib/tha-pt-b-7 +0 -0
  69. data/maps/acadsin-zho-Hani-Latn-2002.yaml +0 -38912
  70. data/maps/alalc-aze-Cyrl-Latn-1997.yaml +0 -141
  71. data/maps/alalc-bel-cyrl-latn-1997.yaml +0 -125
  72. data/maps/alalc-ben-Beng-Latn-2017.yaml +0 -130
  73. data/maps/alalc-bul-Cyrl-Latn-1997.yaml +0 -94
  74. data/maps/alalc-ell-Grek-Latn-1997.yaml +0 -625
  75. data/maps/alalc-ell-Grek-Latn-2010.yaml +0 -628
  76. data/maps/alalc-kat-Geok-Latn-1997.yaml +0 -112
  77. data/maps/alalc-kat-Geor-Latn-1997.yaml +0 -146
  78. data/maps/alalc-kor-Hang-Latn-1997.yaml +0 -94
  79. data/maps/alalc-mkd-Cyrl-Latn-2013.yaml +0 -103
  80. data/maps/alalc-mkd-cyrl-latn-1997.yaml +0 -114
  81. data/maps/alalc-rus-Cyrl-Latn-1997.yaml +0 -222
  82. data/maps/alalc-rus-Cyrl-Latn-2012.yaml +0 -162
  83. data/maps/alalc-srp-Cyrl-Latn-1997.yaml +0 -114
  84. data/maps/alalc-srp-cyrl-latn-2013.yaml +0 -135
  85. data/maps/alalc-ukr-Cyrl-Latn-1997.yaml +0 -141
  86. data/maps/alalc-ukr-Cyrl-Latn-2011.yaml +0 -16
  87. data/maps/apcbg-bul-Cyrl-Latn-1995.yaml +0 -283
  88. data/maps/bas-rus-Cyrl-Latn-2017-bss.yaml +0 -175
  89. data/maps/bas-rus-Cyrl-Latn-2017-oss.yaml +0 -169
  90. data/maps/bgn-jpn-Hrkt-Latn-1962.yaml +0 -294
  91. data/maps/bgn-kor-Hang-Latn-1943.yaml +0 -31
  92. data/maps/bgn-kor-Kore-Latn-1943.yaml +0 -31
  93. data/maps/bgna-bul-Cyrl-Latn-2006.yaml +0 -208
  94. data/maps/bgna-bul-Cyrl-Latn-2009.yaml +0 -208
  95. data/maps/bgnpcgn-arm-Armn-Latn-1981.yaml +0 -108
  96. data/maps/bgnpcgn-aze-Cyrl-Latn-1993.yaml +0 -104
  97. data/maps/bgnpcgn-bak-Cyrl-Latn-2007.yaml +0 -184
  98. data/maps/bgnpcgn-bel-cyrl-latn-1979.yaml +0 -285
  99. data/maps/bgnpcgn-bul-Cyrl-Latn-1952.yaml +0 -115
  100. data/maps/bgnpcgn-bul-Cyrl-Latn-2013.yaml +0 -38
  101. data/maps/bgnpcgn-chn-Hans-Latn-1979.yaml +0 -7456
  102. data/maps/bgnpcgn-ell-Grek-Latn-1962.yaml +0 -702
  103. data/maps/bgnpcgn-ell-Grek-Latn-1996.yaml +0 -20
  104. data/maps/bgnpcgn-jpn-Hrkt-Latn-1976.yaml +0 -257
  105. data/maps/bgnpcgn-kat-Geor-Latn-1981.yaml +0 -127
  106. data/maps/bgnpcgn-kat-Geor-Latn-2009.yaml +0 -43
  107. data/maps/bgnpcgn-kor-Hang-Latn-kn-1945.yaml +0 -253
  108. data/maps/bgnpcgn-kor-Hang-Latn-rok-2011.yaml +0 -48
  109. data/maps/bgnpcgn-kor-Kore-Latn-rok-2011.yaml +0 -48
  110. data/maps/bgnpcgn-mkd-Cyrl-Latn-1981.yaml +0 -159
  111. data/maps/bgnpcgn-mkd-Cyrl-Latn-2013.yaml +0 -190
  112. data/maps/bgnpcgn-per-Arab-Latn-1956.yaml +0 -93
  113. data/maps/bgnpcgn-rus-Cyrl-Latn-1947.yaml +0 -314
  114. data/maps/bgnpcgn-srp-Cyrl-Latn-2005.yaml +0 -166
  115. data/maps/bgnpcgn-ukr-Cyrl-Latn-1965.yaml +0 -163
  116. data/maps/bgnpcgn-ukr-Cyrl-Latn-2019.yaml +0 -208
  117. data/maps/by-bel-Cyrl-Latn-1998.yaml +0 -168
  118. data/maps/by-bel-Cyrl-Latn-2007.yaml +0 -115
  119. data/maps/elot-ell-Grek-Latn-743-1982-tl.yaml +0 -685
  120. data/maps/elot-ell-Grek-Latn-743-1982-ts.yaml +0 -681
  121. data/maps/elot-ell-Grek-Latn-743-2001-tl.yaml +0 -20
  122. data/maps/elot-ell-Grek-Latn-743-2001-ts.yaml +0 -32
  123. data/maps/ggg-kat-Geor-Latn-2002.yaml +0 -89
  124. data/maps/gki-bel-cyrl-latn-1992.yaml +0 -33
  125. data/maps/gki-bel-cyrl-latn-2000.yaml +0 -201
  126. data/maps/gost-rus-cyrl-latn-16876-71-1983.yaml +0 -186
  127. data/maps/hk-yue-Hani-Latn-1888.yaml +0 -38497
  128. data/maps/icao-bel-Cyrl-Latn-9303.yaml +0 -141
  129. data/maps/icao-bul-Cyrl-Latn-9303.yaml +0 -122
  130. data/maps/icao-heb-Hebr-Latn-9303.yaml +0 -151
  131. data/maps/icao-mkd-Cyrl-Latn-9303.yaml +0 -117
  132. data/maps/icao-per-Arab-Latn-9303.yaml +0 -104
  133. data/maps/icao-rus-Cyrl-Latn-9303.yaml +0 -118
  134. data/maps/icao-srp-Cyrl-Latn-9303.yaml +0 -117
  135. data/maps/icao-ukr-Cyrl-Latn-9303.yaml +0 -120
  136. data/maps/iso-ell-Grek-Latn-843-1997-t1.yaml +0 -610
  137. data/maps/iso-ell-Grek-Latn-843-1997-t2.yaml +0 -41
  138. data/maps/iso-jpn-Hrkt-Latn-3602-1989.yaml +0 -62
  139. data/maps/iso-rus-Cyrl-Latn-9-1995.yaml +0 -272
  140. data/maps/iso-tha-Thai-Latn-11940-1998.yaml +0 -109
  141. data/maps/kp-kor-Hang-Latn-2002.yaml +0 -901
  142. data/maps/lshk-yue-Hani-Latn-jyutping-1993.yaml +0 -44820
  143. data/maps/mext-jpn-Hrkt-Latn-1954.yaml +0 -411
  144. data/maps/moct-kor-Hang-Latn-2000.yaml +0 -803
  145. data/maps/mofa-jpn-Hrkt-Latn-1989.yaml +0 -541
  146. data/maps/mvd-bel-Cyrl-Latn-2008.yaml +0 -225
  147. data/maps/mvd-bel-Cyrl-Latn-2010.yaml +0 -63
  148. data/maps/mvd-rus-Cyrl-Latn-2008.yaml +0 -110
  149. data/maps/mvd-rus-Cyrl-Latn-2010.yaml +0 -37
  150. data/maps/nil-kor-Hang-Hang-jamo.yaml +0 -11193
  151. data/maps/odni-bel-Cyrl-Latn-2015.yaml +0 -148
  152. data/maps/odni-bul-Cyrl-Latn-2015.yaml +0 -96
  153. data/maps/odni-kat-Geor-Latn-2015.yaml +0 -88
  154. data/maps/odni-rus-Cyrl-Latn-2015.yaml +0 -77
  155. data/maps/odni-srp-Cyrl-Latn-2015.yaml +0 -129
  156. data/maps/odni-ukr-Cyrl-Latn-2015.yaml +0 -157
  157. data/maps/odni-uzb-Cyrl-Latn-2015.yaml +0 -167
  158. data/maps/royin-tha-Thai-Latn-1939-generic.yaml +0 -90
  159. data/maps/royin-tha-Thai-Latn-1968.yaml +0 -179
  160. data/maps/royin-tha-Thai-Latn-1999-chained.yaml +0 -180
  161. data/maps/royin-tha-Thai-Latn-1999.yaml +0 -76
  162. data/maps/sac-zho-Hans-Latn-1979.yaml +0 -24759
  163. data/maps/stategeocadastre-ukr-Cyrl-Latn-1993.yaml +0 -222
  164. data/maps/ua-ukr-Cyrl-Latn-1996.yaml +0 -193
  165. data/maps/un-bel-Cyrl-Latn-2007.yaml +0 -114
  166. data/maps/un-ben-Beng-Latn-2016.yaml +0 -534
  167. data/maps/un-ell-Grek-Latn-1987-tl.yaml +0 -32
  168. data/maps/un-ell-Grek-Latn-1987-ts.yaml +0 -20
  169. data/maps/un-ell-Grek-Latn-phonetic-1987.yaml +0 -780
  170. data/maps/un-mon-Mong-Latn-2013.yaml +0 -93
  171. data/maps/un-rus-Cyrl-Latn-1987.yaml +0 -166
  172. data/maps/un-ukr-cyrl-latn-1998.yaml +0 -30
  173. data/maps/var-jpn-Hrkt-Latn-hepburn-1886.yaml +0 -406
  174. data/maps/var-jpn-Hrkt-Latn-hepburn-1954.yaml +0 -386
  175. data/maps/var-kor-Hang-Latn-mr-1939.yaml +0 -1054
  176. data/maps/var-kor-Kore-Hang-2013.yaml +0 -59754
  177. data/maps/var-kor-Kore-Latn-mr-1939.yaml +0 -37
  178. data/maps/var-tha-Thai-Thai-phonemic.yaml +0 -59
  179. data/maps/var-tha-Thai-Zsym-ipa.yaml +0 -301
  180. data/maps/var-zho-Hani-Latn-1979.yaml +0 -38908
  181. data/spec/interscript/mapping_spec.rb +0 -42
  182. data/spec/interscript_spec.rb +0 -26
  183. data/spec/spec_helper.rb +0 -3
data/lib/interscript/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Interscript
- VERSION = "0.1.4"
+ VERSION = "2.0.5"
  end
data/requirements.txt ADDED
@@ -0,0 +1 @@
+ torch
metadata CHANGED
@@ -1,12 +1,12 @@
  --- !ruby/object:Gem::Specification
  name: interscript
  version: !ruby/object:Gem::Version
- version: 0.1.4
+ version: 2.0.5
  platform: ruby
  authors:
- - project_contibutors
+ - Ribose Inc.
  autorequire:
- bindir: bin
+ bindir: exe
  cert_chain: []
  date: 2019-11-17 00:00:00.000000000 Z
  dependencies:
@@ -25,97 +25,13 @@ dependencies:
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
- name: debase
+ name: interscript-maps
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: pry
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: pycall
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: rambling-trie
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: rake
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: rspec
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: ruby-debug-ide
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
+ type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
@@ -123,144 +39,80 @@ dependencies:
  - !ruby/object:Gem::Version
  version: '0'
  description: Interoperable script conversion systems
- email:
+ email:
+ - open.source@ribose.com
  executables:
  - interscript
- - rspec
- - setup
  extensions: []
  extra_rdoc_files: []
  files:
- - README.adoc
+ - ".gitignore"
+ - ".rspec"
+ - Gemfile
+ - LICENSE.adoc
+ - README.md
+ - Rakefile
+ - bin/console
  - bin/interscript
- - bin/rspec
+ - bin/maps_analyze_staging
+ - bin/maps_debug_compilers
+ - bin/maps_debug_ordering
+ - bin/maps_debug_ruby_compile
+ - bin/maps_debug_step_by_step
+ - bin/maps_optimize_order
+ - bin/maps_v1_analyze_regexps
+ - bin/maps_v1_to_v2
  - bin/setup
- - lib/g2pwrapper.py
+ - exe/interscript
+ - interscript.gemspec
  - lib/interscript.rb
  - lib/interscript/command.rb
- - lib/interscript/mapping.rb
+ - lib/interscript/compiler.rb
+ - lib/interscript/compiler/javascript.rb
+ - lib/interscript/compiler/ruby.rb
+ - lib/interscript/dsl.rb
+ - lib/interscript/dsl/aliases.rb
+ - lib/interscript/dsl/document.rb
+ - lib/interscript/dsl/group.rb
+ - lib/interscript/dsl/group/parallel.rb
+ - lib/interscript/dsl/items.rb
+ - lib/interscript/dsl/metadata.rb
+ - lib/interscript/dsl/stage.rb
+ - lib/interscript/dsl/symbol_mm.rb
+ - lib/interscript/dsl/tests.rb
+ - lib/interscript/interpreter.rb
+ - lib/interscript/node.rb
+ - lib/interscript/node/alias_def.rb
+ - lib/interscript/node/dependency.rb
+ - lib/interscript/node/document.rb
+ - lib/interscript/node/group.rb
+ - lib/interscript/node/group/parallel.rb
+ - lib/interscript/node/group/sequential.rb
+ - lib/interscript/node/item.rb
+ - lib/interscript/node/item/alias.rb
+ - lib/interscript/node/item/any.rb
+ - lib/interscript/node/item/capture.rb
+ - lib/interscript/node/item/group.rb
+ - lib/interscript/node/item/repeat.rb
+ - lib/interscript/node/item/stage.rb
+ - lib/interscript/node/item/string.rb
+ - lib/interscript/node/metadata.rb
+ - lib/interscript/node/rule.rb
+ - lib/interscript/node/rule/funcall.rb
+ - lib/interscript/node/rule/run.rb
+ - lib/interscript/node/rule/sub.rb
+ - lib/interscript/node/stage.rb
+ - lib/interscript/node/tests.rb
+ - lib/interscript/stdlib.rb
+ - lib/interscript/utils/regexp_converter.rb
  - lib/interscript/version.rb
- - lib/model-7
- - lib/tha-pt-b-7
- - maps/acadsin-zho-Hani-Latn-2002.yaml
- - maps/alalc-aze-Cyrl-Latn-1997.yaml
- - maps/alalc-bel-cyrl-latn-1997.yaml
- - maps/alalc-ben-Beng-Latn-2017.yaml
- - maps/alalc-bul-Cyrl-Latn-1997.yaml
- - maps/alalc-ell-Grek-Latn-1997.yaml
- - maps/alalc-ell-Grek-Latn-2010.yaml
- - maps/alalc-kat-Geok-Latn-1997.yaml
- - maps/alalc-kat-Geor-Latn-1997.yaml
- - maps/alalc-kor-Hang-Latn-1997.yaml
- - maps/alalc-mkd-Cyrl-Latn-2013.yaml
- - maps/alalc-mkd-cyrl-latn-1997.yaml
- - maps/alalc-rus-Cyrl-Latn-1997.yaml
- - maps/alalc-rus-Cyrl-Latn-2012.yaml
- - maps/alalc-srp-Cyrl-Latn-1997.yaml
- - maps/alalc-srp-cyrl-latn-2013.yaml
- - maps/alalc-ukr-Cyrl-Latn-1997.yaml
- - maps/alalc-ukr-Cyrl-Latn-2011.yaml
- - maps/apcbg-bul-Cyrl-Latn-1995.yaml
- - maps/bas-rus-Cyrl-Latn-2017-bss.yaml
- - maps/bas-rus-Cyrl-Latn-2017-oss.yaml
- - maps/bgn-jpn-Hrkt-Latn-1962.yaml
- - maps/bgn-kor-Hang-Latn-1943.yaml
- - maps/bgn-kor-Kore-Latn-1943.yaml
- - maps/bgna-bul-Cyrl-Latn-2006.yaml
- - maps/bgna-bul-Cyrl-Latn-2009.yaml
- - maps/bgnpcgn-arm-Armn-Latn-1981.yaml
- - maps/bgnpcgn-aze-Cyrl-Latn-1993.yaml
- - maps/bgnpcgn-bak-Cyrl-Latn-2007.yaml
- - maps/bgnpcgn-bel-cyrl-latn-1979.yaml
- - maps/bgnpcgn-bul-Cyrl-Latn-1952.yaml
- - maps/bgnpcgn-bul-Cyrl-Latn-2013.yaml
- - maps/bgnpcgn-chn-Hans-Latn-1979.yaml
- - maps/bgnpcgn-ell-Grek-Latn-1962.yaml
- - maps/bgnpcgn-ell-Grek-Latn-1996.yaml
- - maps/bgnpcgn-jpn-Hrkt-Latn-1976.yaml
- - maps/bgnpcgn-kat-Geor-Latn-1981.yaml
- - maps/bgnpcgn-kat-Geor-Latn-2009.yaml
- - maps/bgnpcgn-kor-Hang-Latn-kn-1945.yaml
- - maps/bgnpcgn-kor-Hang-Latn-rok-2011.yaml
- - maps/bgnpcgn-kor-Kore-Latn-rok-2011.yaml
- - maps/bgnpcgn-mkd-Cyrl-Latn-1981.yaml
- - maps/bgnpcgn-mkd-Cyrl-Latn-2013.yaml
- - maps/bgnpcgn-per-Arab-Latn-1956.yaml
- - maps/bgnpcgn-rus-Cyrl-Latn-1947.yaml
- - maps/bgnpcgn-srp-Cyrl-Latn-2005.yaml
- - maps/bgnpcgn-ukr-Cyrl-Latn-1965.yaml
- - maps/bgnpcgn-ukr-Cyrl-Latn-2019.yaml
- - maps/by-bel-Cyrl-Latn-1998.yaml
- - maps/by-bel-Cyrl-Latn-2007.yaml
- - maps/elot-ell-Grek-Latn-743-1982-tl.yaml
- - maps/elot-ell-Grek-Latn-743-1982-ts.yaml
- - maps/elot-ell-Grek-Latn-743-2001-tl.yaml
- - maps/elot-ell-Grek-Latn-743-2001-ts.yaml
- - maps/ggg-kat-Geor-Latn-2002.yaml
- - maps/gki-bel-cyrl-latn-1992.yaml
- - maps/gki-bel-cyrl-latn-2000.yaml
- - maps/gost-rus-cyrl-latn-16876-71-1983.yaml
- - maps/hk-yue-Hani-Latn-1888.yaml
- - maps/icao-bel-Cyrl-Latn-9303.yaml
- - maps/icao-bul-Cyrl-Latn-9303.yaml
- - maps/icao-heb-Hebr-Latn-9303.yaml
- - maps/icao-mkd-Cyrl-Latn-9303.yaml
- - maps/icao-per-Arab-Latn-9303.yaml
- - maps/icao-rus-Cyrl-Latn-9303.yaml
- - maps/icao-srp-Cyrl-Latn-9303.yaml
- - maps/icao-ukr-Cyrl-Latn-9303.yaml
- - maps/iso-ell-Grek-Latn-843-1997-t1.yaml
- - maps/iso-ell-Grek-Latn-843-1997-t2.yaml
- - maps/iso-jpn-Hrkt-Latn-3602-1989.yaml
- - maps/iso-rus-Cyrl-Latn-9-1995.yaml
- - maps/iso-tha-Thai-Latn-11940-1998.yaml
- - maps/kp-kor-Hang-Latn-2002.yaml
- - maps/lshk-yue-Hani-Latn-jyutping-1993.yaml
- - maps/mext-jpn-Hrkt-Latn-1954.yaml
- - maps/moct-kor-Hang-Latn-2000.yaml
- - maps/mofa-jpn-Hrkt-Latn-1989.yaml
- - maps/mvd-bel-Cyrl-Latn-2008.yaml
- - maps/mvd-bel-Cyrl-Latn-2010.yaml
- - maps/mvd-rus-Cyrl-Latn-2008.yaml
- - maps/mvd-rus-Cyrl-Latn-2010.yaml
- - maps/nil-kor-Hang-Hang-jamo.yaml
- - maps/odni-bel-Cyrl-Latn-2015.yaml
- - maps/odni-bul-Cyrl-Latn-2015.yaml
- - maps/odni-kat-Geor-Latn-2015.yaml
- - maps/odni-rus-Cyrl-Latn-2015.yaml
- - maps/odni-srp-Cyrl-Latn-2015.yaml
- - maps/odni-ukr-Cyrl-Latn-2015.yaml
- - maps/odni-uzb-Cyrl-Latn-2015.yaml
- - maps/royin-tha-Thai-Latn-1939-generic.yaml
- - maps/royin-tha-Thai-Latn-1968.yaml
- - maps/royin-tha-Thai-Latn-1999-chained.yaml
- - maps/royin-tha-Thai-Latn-1999.yaml
- - maps/sac-zho-Hans-Latn-1979.yaml
- - maps/stategeocadastre-ukr-Cyrl-Latn-1993.yaml
- - maps/ua-ukr-Cyrl-Latn-1996.yaml
- - maps/un-bel-Cyrl-Latn-2007.yaml
- - maps/un-ben-Beng-Latn-2016.yaml
- - maps/un-ell-Grek-Latn-1987-tl.yaml
- - maps/un-ell-Grek-Latn-1987-ts.yaml
- - maps/un-ell-Grek-Latn-phonetic-1987.yaml
- - maps/un-mon-Mong-Latn-2013.yaml
- - maps/un-rus-Cyrl-Latn-1987.yaml
- - maps/un-ukr-cyrl-latn-1998.yaml
- - maps/var-jpn-Hrkt-Latn-hepburn-1886.yaml
- - maps/var-jpn-Hrkt-Latn-hepburn-1954.yaml
- - maps/var-kor-Hang-Latn-mr-1939.yaml
- - maps/var-kor-Kore-Hang-2013.yaml
- - maps/var-kor-Kore-Latn-mr-1939.yaml
- - maps/var-tha-Thai-Thai-phonemic.yaml
- - maps/var-tha-Thai-Zsym-ipa.yaml
- - maps/var-zho-Hani-Latn-1979.yaml
- - spec/interscript/mapping_spec.rb
- - spec/interscript_spec.rb
- - spec/spec_helper.rb
- homepage: ''
+ - requirements.txt
+ homepage: https://www.interscript.com
  licenses:
- - MIT
- metadata: {}
+ - BSD-2-Clause
+ metadata:
+ homepage_uri: https://www.interscript.com
+ source_code_uri: https://github.com/interscript/interscript
  post_install_message:
  rdoc_options: []
  require_paths:
@@ -269,18 +121,16 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: '0'
+ version: 2.3.0
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 2.4.0
+ version: '0'
  requirements: []
- rubygems_version: 3.0.3
+ rubyforge_project:
+ rubygems_version: 2.7.8
  signing_key:
  specification_version: 4
  summary: Interoperable script conversion systems
- test_files:
- - spec/interscript/mapping_spec.rb
- - spec/interscript_spec.rb
- - spec/spec_helper.rb
+ test_files: []
data/README.adoc DELETED
@@ -1,297 +0,0 @@
- = Interscript: Interoperable Script Conversion Systems, with a Ruby implementation
-
- image:https://github.com/interscript/interscript/workflows/test/badge.svg["Build Status", link="https://github.com/interscript/interscript/actions?workflow=test"]
-
- == Introduction
-
- This repository contains interoperable transliteration schemes from:
-
- * ALA-LC
- * BGN/PCGN
- * ICAO
- * ISO
- * UN (by UNGEGN)
- * Many, many other script conversion system authorities.
-
- The goal is to achieve interoperable transliteration schemes allowing quality comparisons.
-
-
-
- == Demonstration
-
- These transliteration systems are used in the demo:
-
- `bgnpcgn-rus-Cyrl-Latn-1947`:: BGN/PCGN Romanization of Russian
- `iso-rus-Cyrl-Latn-iso9`:: ISO 9 Romanization of Russian
- `icao-rus-Cyrl-Latn-9303`:: ICAO MRZ Romanization of Russian
- `bas-rus-Cyrl-Latn-bss`:: Bulgaria Academy of Science Streamlined System for Russian
-
- image:demo/20191118-interscript-demo-cast.gif["interscript screencast"]
-
-
- == Installation
-
- === Prerequisites
-
- Linux:
-
- [source,sh]
- ----
- apt-get install swig python3-setuptools
- ----
-
- Windows:
-
- [source,sh]
- ----
- choco install --no-progress swig
- ----
-
- Interscript depends on Python and the https://github.com/sequitur-g2p/sequitur-g2p[`sequitur-g2p`] module
-
- [source,sh]
- ----
- pip3 install setuptools numpy
- curl -sSL -o sequitur-g2p.zip https://github.com/sequitur-g2p/sequitur-g2p/archive/806273f.zip
- pip3 install sequitur-g2p.zip
- ----
-
- Interscript depends on Ruby. Once you manage to install Ruby, it's easy.
-
- [source,sh]
- ----
- gem install interscript
- ----
-
- == Usage
-
- Assume you have a file ready in the source script like this:
-
- [source,sh]
- ----
- cat <<EOT > rus-Cyrl.txt
- Эх, тройка! птица тройка, кто тебя выдумал? знать, у бойкого народа ты
- могла только родиться, в той земле, что не любит шутить, а
- ровнем-гладнем разметнулась на полсвета, да и ступай считать версты,
- пока не зарябит тебе в очи. И не хитрый, кажись, дорожный снаряд, не
- железным схвачен винтом, а наскоро живьём с одним топором да долотом
- снарядил и собрал тебя ярославский расторопный мужик. Не в немецких
- ботфортах ямщик: борода да рукавицы, и сидит чёрт знает на чём; а
- привстал, да замахнулся, да затянул песню — кони вихрем, спицы в
- колесах смешались в один гладкий круг, только дрогнула дорога, да
- вскрикнул в испуге остановившийся пешеход — и вон она понеслась,
- понеслась, понеслась!
-
- Н.В. Гоголь
- EOT
- ----
-
- You can run `interscript` on this text using different transliteration systems.
-
- [source,sh]
- ----
- interscript rus-Cyrl.txt \
- --system=bgnpcgn-rus-Cyrl-Latn-1947 \
- --output=bgnpcgn-rus-Latn.txt
-
- interscript rus-Cyrl.txt \
- --system=iso-rus-Cyrl-Latn-iso9 \
- --output=iso-rus-Latn.txt
-
- interscript rus-Cyrl.txt \
- --system=icao-rus-Cyrl-Latn-9303 \
- --output=icao-rus-Latn.txt
-
- interscript rus-Cyrl.txt \
- --system=bas-rus-Cyrl-Latn-bss \
- --output=bas-rus-Latn.txt
- ----
-
- It is then easy to see the exact differences in rendering between the systems.
-
- [source,sh]
- ----
- diff bgnpcgn-rus-Latn.txt bas-rus-Latn.txt
- ----
-
- == Adding transliteration system
-
- Transliteration systems stored in a `maps/` directory as YAML files.
- You can create a new file and add it to the directory.
-
- The file should be named as `<system-code>.yaml`, where `system-code`
- is in accordance with
- http://calconnect.gitlab.io/tc-localization/csd-transcription-systems[ISO/CC 24229].
-
- === File structure
-
- [source,yaml]
- ----
- authority_id: bgnpcgn
- id: 1947
- language: rus
- source_script: Cyrl
- destination_script: Latn
- name: ROMANIZATION OF RUSSIAN, BGN/PCGN 1947 System
- url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/807920/ROMANIZATION_OF_RUSSIAN.pdf
- creation_date: 1947
- confirmation_date: 2019-06
- description: The BGN/PCGN system for Russian was adopted ...
-
- notes:
- - The character e should be romanized ye initially, after the vowel ...
-
- tests:
- - source: ДЛИННОЕ ПОКРЫВАЛО
- expected: DLINNOYE POKRYVALO
- - source: Еловая шишка
- expected: Yelovaya shishka
-
- map:
- rules:
- - pattern: (?<=[АаЕеЁёИиОоУуЫыЭэЮюЯяЙйЪъЬь])\u0415 # Е after a, e, ё, и, о, у, ы, э, ю, я, й, ъ, ь
- result: Ye
- - pattern: \b\u0415 # Е initially
- result: Ye
-
- characters:
- "\u0410": "A"
- "\u0411": "B"
- "\u0412": "V"
- ----
-
-
- === Rules
-
- The subsection `rules` is placed under the `map` key. All rules are applied in order they are placed before the subsection `characters` applying. Rules apply to an original text, not to a result of previous rules applying.
-
- Each rule has `pattern` and `result` elements.
-
- Pattern is a regex expression. It should be representing as a string without `//` or `%r{}` parentheses. For example `\b\u0415`. In case a rule is depend on previous or next content, lookahead or lookbehind could be used. For example a rule with the pattern `(?<=[АаЕеЁёИиОоУуЫыЭэЮюЯяЙйЪъЬь])\u0415` find every Е after upper or lower case symbols a, e, ё, и, о, у, ы, э, ю, я, й, ъ, ь.
-
- Result is a replacement a for pattern's match. It can contain a string, an Unicode characters specified by a hexadecimal number, a captured group reference. String with hexadecimal number or captured group reference should be double quoted. For example `"Y\u00eb"` or `"\\1\u00b7\\2"`. Captured group are referred by double backslash and group's number.
-
- Because rules are applied in order, multiple rules applicable to the same segment of a string can be addressed by rule ordering, and rules can be used as priority over characters. For example:
-
- [source,yaml]
- ----
- map:
- rules:
- - pattern: \u03B3\u03B3 # γ (before Γ, Ξ, Χ)
- result: ng
- - pattern: (?<![Γγ])\u03B3(?=[ΕεέΗηήΙιίΥυύ]) # γ (before front vowels)
- result: y
- ----
-
- (γι maps to `yi`; but γγ maps to `ng`. In the case of γγι, the first rule takes priority, and the transliteration is `ngi`: it makes the second rule impossible.)
-
- [source,yaml]
- ----
- map:
- rules:
- - pattern: (?<=\b)\u03BC[πΠ] # μπ (initially)
- result: b
- - pattern: \u03BC[πΠ] # μπ (medially)
- result: mb
- ----
-
- (The first rule applies at the start of a word; the second rule does not specify a context, as it applies in all other cases not covered by the first rule.)
-
- [source,yaml]
- ----
- map:
- rules:
- - pattern: ";"
- result: "?"
-
- characters
- "\u00B7": ";
- ----
-
- (This guarantees that any `;` are converted to `?` before any new `;` are introduced; because all three are Latin script, they could be mixed up in ordering.)
-
- Normally rules "`bleed`" each other: once a rule applies to a segment, that segment cannot trigger other rules, because it is already converted to Roman. Exceptionally, it will be necessary to have a rule add or remove characters in the original script, rather than transliterate them, so that the same context can be invoked by two rules in succession:
-
- [source,yaml]
- ----
- map:
- rules:
- - pattern: (?<=[АаЕеЁёИиОоУуЫыЭэЮюЯя])\u042b # Ы after any vowel character
- result: "\u00b7Ы"
- - pattern: \u042b(?=[АаУуЫыЭэ]) # Ы before а, у, ы, or э
- result: "Ы\u00b7"
- ----
-
- (If the result were `\u00B7Y`, the second rule could not be applied afterwards; but we want ОЫУ to transliterate as `O·Y·U`. In order to make that happen, we preserve the Ы during the rules phase, resulting in О·Ы·У; we only convert the letters to Roman script in the `characters` phase.)
-
- === Testing transliteration systems
-
- To test all transliteration systems in the `maps/` directory, run:
-
- [source,sh]
- ----
- bundle exec rspec
- ----
-
- The command takes `source` texts from the `test` section, transforms
- them using `rules` and `charmaps` from the `map` key, and compares the
- results with `expected:` text from the `source:` section.
-
- To test a specific transliteration system, set the environment variable
- `TRANSLIT_SYSTEM` to the system code of the desired system
- (i.e. the "`basename`" of the system's YAML file):
-
- [source,sh]
- ----
- TRANSLIT_SYSTEM=bgnpcgn-rus-Cyrl-Latn-1947 bundle exec rspec
- ----
-
-
- == ISCS system codes
-
- In accordance with
- http://calconnect.gitlab.io/tc-localization/csd-transcription-systems[ISO/CC 24229],
- the system code identifying a script conversion system has the following components:
-
- e.g. `bgnpcgn-rus-Cyrl-Latn-1947`:
-
- `bgnpcgn`:: the authority identifier
- `rus`:: an ISO 639-2 3-letter language code that this system applies to
- `Cyrl`:: an ISO 15924 script code, identifying the source script
- `Latn`:: an ISO 15924 script code, identifying the target script
- `1947`:: an identifier unit within the authority to identify this system
-
-
- == Covered languages
-
- Currently the schemes cover Cyrillic, Armenian, Greek, Arabic and Hebrew.
-
-
- == Samples to play with
-
- * `rus-Cyrl-1.txt`: Copied from the XLS output from http://www.primorsk.vybory.izbirkom.ru/region/primorsk?action=show&global=true&root=254017025&tvd=4254017212287&vrn=100100067795849&prver=0&pronetvd=0&region=25&sub_region=25&type=242&vibid=4254017212287
-
- * `rus-Cyrl-2.txt`: Copied from the XLS output from http://www.yaroslavl.vybory.izbirkom.ru/region/yaroslavl?action=show&root=764013001&tvd=4764013188704&vrn=4764013188693&prver=0&pronetvd=0&region=76&sub_region=76&type=426&vibid=4764013188704
-
-
- == References
-
- Reference documents are located at the
- https://github.com/interscript/interscript-references[interscript-references repository].
- Some specifications that have distribution limitations may not be reproduced there.
-
-
- == Links to system definitions
-
- * https://www.iso.org/committee/48750.html[ISO/TC 46 (see standards published by WG 3)]
- * http://geonames.nga.mil/gns/html/romanization.html[BGN/PCGN and BGN Romanization systems (BGN)]
- * https://www.gov.uk/government/publications/romanization-systems[BGN/PCGN Romanization systems (PCGN)]
- * https://www.loc.gov/catdir/cpso/roman.html[ALA-LC Romanization systems in current use]
- * http://catdir.loc.gov/catdir/cpso/roman.html[ALA-LC Romanization systems from 1997]
- * http://www.eki.ee/wgrs/[UN Romanization systems]
- * http://www.eki.ee/knab/kblatyl2.htm[EKI KNAB systems]
-
- == Copyright and license
-
- This is a Ribose project. Copyright Ribose.
-
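
Note on the removed map format: the deleted README.adoc describes a two-phase scheme in which ordered regexp `rules` run first and a per-character `characters` table runs afterwards. The Ruby sketch below illustrates that behaviour only; it is not code from either gem version. `apply_map` and `SAMPLE_MAP` are hypothetical names, the rules are assumed to be applied sequentially to the running text (as the README's ordering examples imply), and the `characters` entries are invented for the illustration.

```ruby
# Hypothetical helper illustrating the removed v1 map semantics;
# it is not part of the interscript gem.
def apply_map(text, rules:, characters:)
  # Rules phase: run each regexp substitution in the order listed
  # (assumption: each rule sees the output of the previous one).
  after_rules = rules.reduce(text) do |acc, rule|
    acc.gsub(Regexp.new(rule[:pattern]), rule[:result])
  end

  # Characters phase: map every remaining character, leaving
  # characters without a table entry untouched.
  after_rules.each_char.map { |ch| characters.fetch(ch, ch) }.join
end

# Patterns adapted from the README's Greek and ";" examples (written with
# literal letters instead of \u escapes); the characters table is invented.
SAMPLE_MAP = {
  rules: [
    { pattern: 'γγ', result: 'ng' },
    { pattern: '(?<![Γγ])γ(?=[ΕεέΗηήΙιίΥυύ])', result: 'y' },
    { pattern: ';', result: '?' }
  ],
  characters: { 'ι' => 'i', "\u00B7" => ';' }
}.freeze

puts apply_map('γγι;', **SAMPLE_MAP)      # prints "ngi?"
puts apply_map("γι\u00B7", **SAMPLE_MAP)  # prints "yi;"
```

In the first call the `γγ` rule fires before the front-vowel rule can see a `γ`, and the `;` already present becomes `?`. In the second call the `;` produced from `·` appears only in the characters phase, so it is not turned into `?`, which matches the ordering guarantee the README describes.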