myanmar-tools 0.1.1 → 1.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 7fe5046c36d1fa57bb056fdd103f13c693342990e5df99b25b47a4e2330fd2ee
- data.tar.gz: d73e7f38c95b093c74c185353040ee02436e9afd86e5ed2c21bd548586e5151c
+ metadata.gz: 0dbc0284872a618fddefb0d2a887f6fc5b94d4c90a7848ce0f5c72f00a6066da
+ data.tar.gz: 38b2ae3dd0df9a6225401675acf2288ed7690c11d597366e5666cc8b884a61e9
  SHA512:
- metadata.gz: 10a8cc29a764b8c95eb14b4ca497a16ba4e54577c58860785f87ec019e2bc968f8ac41cedd8c88fd303429da22c8ee3f500af2240d635d273ee5e52a75f40738
- data.tar.gz: df22cc42616d9e2ce6499f74fbe9e5bdb9f184c01be85d2ec3702ab7ba3a5716a6c822c0da7f7d55cd912a15a40f0f6e9f407a0eb57d7156cb66a5a108a6ad89
+ metadata.gz: 652287b770fba31364a0e2bc82f2d88e36c08fadcf91a4c1137609dccb674d19492f201e9cb31dd813c27fc64a26b9ae36d976621fe6f37996c65c47bc0dda02
+ data.tar.gz: 2766900bf91ff26e26b8b3235895e15ecd5b426023c7d43f91bca932c53fa1f502de8abd1d20e0231b0101560461fb53a13ff8e92689fd481583eb767802d457
data/.gitignore CHANGED
@@ -6,5 +6,6 @@
  /pkg/
  /spec/reports/
  /tmp/
+ /vendor/
  *.gem
  *.lock
data/README.md CHANGED
@@ -10,6 +10,8 @@ Add the dependency to your project:
  
  ```bash
  $ gem install myanmar-tools
+ or
+ $ bundle add myanmar-tools
  ```
  
  To detect Zawgyi, create an instance of ZawgyiDetector, and call `get_zawgyi_probability` with your string.
@@ -21,10 +23,28 @@ score = detector.get_zawgyi_probability('အျပည္ျပည္ဆို
  # score is 1.0 (The input is definitely Zawgyi)
  ```
  
+ To convert from Zawgyi to Unicode, you can use [the Ruby wrapper over icu4c](https://github.com/fantasticfears/icu4r). Install it like this:
+
+ ```bash
+ $ gem install icu
+ or
+ $ bundle add icu
+ ```
+
+ Then convert from Z to U like this:
+
+ ```ruby
+ require 'icu'
+ converter = ICU::Transliterator.new "Zawgyi-my"
+ output = converter.transliterate("မ္း")
+ # output is now "မ်း"
+ ```
+
  For a complete working example, see [samples/ruby/demo.rb](../../samples/ruby/demo.rb).
+
  ## Contributing
  
- Bug reports and pull requests are welcome on GitHub at https://github.com/googlei18n/myanmar-tools . This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
+ Bug reports and pull requests are welcome on GitHub at https://github.com/google/myanmar-tools . This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
  
  ## License
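
The README lines added in this hunk describe detection and conversion as separate steps. A minimal sketch of chaining them is shown here. It is illustrative only: `normalize_to_unicode` and the 0.95 threshold are assumptions and not part of the gem, and the detector and converter are passed in as arguments so the logic can be exercised with stand-ins.

```ruby
# Hypothetical helper, not part of myanmar-tools: converts text to Unicode
# only when the detector considers it likely to be Zawgyi.
# detector  - responds to get_zawgyi_probability(string), as ZawgyiDetector does
# converter - responds to transliterate(string), as ICU::Transliterator does
# threshold - assumed cut-off; tune it for your own data
def normalize_to_unicode(text, detector, converter, threshold = 0.95)
  score = detector.get_zawgyi_probability(text)
  # Strings that score at or below the threshold (including -Infinity,
  # which the test data lists for all-ASCII input) are returned unchanged.
  score > threshold ? converter.transliterate(text) : text
end

# Stand-ins so the sketch runs without either gem installed.
FakeDetector = Struct.new(:score) do
  def get_zawgyi_probability(_text)
    score
  end
end

class UpcaseConverter
  def transliterate(text)
    text.upcase
  end
end

puts normalize_to_unicode('abc', FakeDetector.new(1.0), UpcaseConverter.new)
puts normalize_to_unicode('abc', FakeDetector.new(-Float::INFINITY), UpcaseConverter.new)
```

With the real gems installed, the arguments would be `MyanmarTools::ZawgyiDetector.new` and `ICU::Transliterator.new "Zawgyi-my"`, as in the README snippets.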
@@ -18,7 +18,7 @@ require 'myanmar-tools/zawgyi_unicode_markov_model'
  
  module MyanmarTools
  # Uses a machine learning model to determine whether a string of text is Zawgyi or Unicode.
- # For more details and documentation, see https://github.com/googlei18n/myanmar-tools
+ # For more details and documentation, see https://github.com/google/myanmar-tools
  class ZawgyiDetector
  def initialize
  @model = ZawgyiUnicodeMarkovModel.new(stream_markov_model)
@@ -54,4 +54,4 @@ class ZawgyiDetector
  end
  end
  end
- end
+ end
@@ -1,42 +1,50 @@
  -Infinity all ascii
- 2.938530756915008E-11 ascii mixed with စင်္ကာပူမှာ unicode
- 1.0 ascii mixed with ဘဂၤလားေဒရွ္႕ zawgyi
+ 1.7578816048593336E-11 ascii mixed with စင်္ကာပူမှာ unicode
+ 0.9999999999720051 ascii mixed with ဘဂၤလားေဒရွ္႕ zawgyi
  -Infinity digits ၅၂၀
- 0.7892929982998882 လူခပ်သိမ်း၏
- 3.6119816361186886E-12 မျိုးရိုးဂုဏ်သိက္ခာနှင့်တကွ
- 0.4663329700445173 လူတို
- 4.9665655888566266E-6 င်း
- 0.23084563016290943 အညီ
- 0.0014668938180250133 အမျှခံ
- 3.5209271742118754E-16 စားခွင့်ရှိသည့်
- 2.3390629414495122E-15 အခွင့်အရေးများကို
- 1.5319184949968255E-21 အသိအမှတ်ပြုခြင်းသည်
- 0.058716887399297764 လူခပ်
- 0.07438825572428456 သိမ်း
- 0.9998813892221834 ၏လွ
- 3.8690513464199966E-12 တ်လပ်မှု၊
- 0.15418652297321492
- 0.646170406728457 ရား
- 4.1284781607674815E-10 မျှတမှု၊
- 5.985589691142374E-13 ငြိမ်းချမ်းမှုတို့၏
- 2.897899546774602E-42 အ ခြေခံ အုတ် မြစ်ဖြစ်သော ကြောင့် လည်းကော င်း၊
- 1.0 လူ႔အခြင့္ေရး
- 0.9999946000796034 မ်ားကို
- 0.9999970383777537 အေရးမ
- 0.9977308862728237 ထား မထီေ
- 0.9997356775204073 လး
- 0.7807064594381038 စား
- 0.9999999999926983 ျပဳျခ
+ 0.26211359963003 လူခပ်သိမ်း၏
+ 5.31323436438016E-17 မျိုးရိုးဂုဏ်သိက္ခာနှင့်တကွ
+ 0.4889728131312674 လူတို
+ 1.6642496914651517E-5 င်း
+ 0.11237116076150185 အညီ
+ 6.707728524305754E-4 အမျှခံ
+ 7.574188856564602E-16 စားခွင့်ရှိသည့်
+ 4.638074119608733E-16 အခွင့်အရေးများကို
+ 1.5487280820343697E-21 အသိအမှတ်ပြုခြင်းသည်
+ 0.016794132891985412 လူခပ်
+ 0.14479549474886305 သိမ်း
+ 0.9934952187669185 ၏လွ
+ 1.0003119485747386E-12 တ်လပ်မှု၊
+ 0.1341121039650929
+ 0.688567924875057 ရား
+ 3.606036708928397E-11 မျှတမှု၊
+ 3.2639839874100116E-13 ငြိမ်းချမ်းမှုတို့၏
+ 1.7756935949309309E-43 အ ခြေခံ အုတ် မြစ်ဖြစ်သော ကြောင့် လည်းကော င်း၊
+ 0.9999999999999065 လူ႔အခြင့္ေရး
+ 0.999992967794266 မ်ားကို
+ 0.9988885380499896 အေရးမ
+ 0.9161983282020472 ထား မထီေ
+ 0.9186801971030496 လး
+ 0.785076109857006 စား
+ 0.9999999994387101 ျပဳျခ
  0.9999999999999998 င္းသည္ လူခ
- 0.9999622852008538 ပ္
- 0.9998249697467181 သိမ္း
- 0.9964205019608555
- 0.9525739114028678
- 0.9999995405990166 က်င့္သိ
+ 0.9995136549000326 ပ္
+ 0.9997519059251643 သိမ္း
+ 0.9840808937738904
+ 0.4867781535978122
+ 0.999997345195692 က်င့္သိ
  1.0 ကၡာကို ခ်ိဳးေဖာက္
  1.0 ဖ်က္ဆီးတတ္သည့္ ရက္စက္ၾကမ္းၾကဳတ္ေသာ
- 0.9999999999933542 အျပဳအမွု
+ 0.99999994404868 အျပဳအမွု
  1.0 မ်ားကို ျဖစ္ေပၚေ
- 0.6167785052019471
+ 0.18826884503304805
  1.0 ခဲ့ေသာေၾကာင့္
  1.0 လည္းေကာင္း၊
+ 1.7562339552675883E-12 ထၢမ်တွပ်ႇ (shan language)
+ 2.0362467174145114E-13 ၶူးႁိုၼ် (shan language)
+ 1.825753617221113E-15 ဂလာန်ညးဒါန်လိက် (mon language)
+ 7.802298132172665E-15 ဍုၚ်မွဲမွဲ (mon language)
+ 3.859871377455363E-6 ဝိနယပိဋကေ (pali language)
+ 8.004254357250203E-4 ပါရာဇိကပါဠိ (pali language)
+ 1.3507380767721152E-12 တၢ်ကစီၣ် (karen language)
+ 1.0297557056138798E-21 တၢ်ထံၣ်တၢ်ပၥ်သး (karen language)
@@ -1,3 +1,3 @@
  module MyanmarTools
- VERSION = '0.1.1'
+ VERSION = '1.2.0'
  end
@@ -30,9 +30,6 @@ class ZawgyiUnicodeMarkovModel
  # Magic number used to identify this object in byte streams. (Reads in ASCII as "UZMODEL ")
  BINARY_TAG = "555a4d4f44454c20"
  
- # Current serial format version number, used in association with the magic number.
- BINARY_VERSION = 1
-
  # Standard Myanmar code point range before digits
  STD_CP0 = 0x1000
  STD_CP1 = 0x103F
@@ -61,7 +58,7 @@ class ZawgyiUnicodeMarkovModel
  SPC_OFFSET = EXB_OFFSET + EXB_CP1 - EXB_CP0 + 1
  NUM_STATES = SPC_OFFSET + SPC_CP1 - SPC_CP0 + 1
  
- private_constant :BINARY_TAG, :BINARY_VERSION, :STD_CP0, :STD_CP1, :AFT_CP0, :AFT_CP1,
+ private_constant :BINARY_TAG, :STD_CP0, :STD_CP1, :AFT_CP0, :AFT_CP1,
  :EXA_CP0, :EXA_CP1, :EXB_CP0, :EXB_CP1, :SPC_CP0, :SPC_CP1, :STD_OFFSET,
  :AFT_OFFSET, :EXA_OFFSET, :EXB_OFFSET, :SPC_OFFSET, :NUM_STATES
  
@@ -74,8 +71,16 @@ class ZawgyiUnicodeMarkovModel
  end
  
  binary_version = stream.read(4).unpack('H*')[0].to_i
- if binary_version != BINARY_VERSION
- raise "Unexpected serial version number: expected #{BINARY_VERSION} but got #{binary_version}"
+ if binary_version == 1
+ @ssv = 0
+ elsif binary_version == 2
+ # TODO: Support nonzero SSV if needed in the future
+ @ssv = stream.read(4).unpack('H*')[0].to_i
+ if @ssv != 0
+ raise "Unsupported ssv: #{@ssv}"
+ end
+ else
+ raise "Unexpected serial version number: expected 1 or 2 but got #{binary_version}"
  end
  
  @classifier = BinaryMarkov.new(stream)
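
The branch added in this hunk accepts serial versions 1 and 2 and, for version 2, reads an extra 4-byte SSV field that must be zero. A minimal standalone sketch of that header logic follows. `read_version_and_ssv` is a hypothetical name, and decoding with `unpack('N')` (32-bit big-endian unsigned) is a substitution for the gem's `unpack('H*')[0].to_i`: both yield 1 and 2 for the supported versions, but the gem's form parses the hex digits as a decimal number. Big-endian byte order is inferred from the fact that the gem's hex decoding reads version 1 as `00000001`.

```ruby
require 'stringio'

# Hypothetical helper, not part of the gem: reads the serial version and,
# for version 2, the SSV field, mirroring the branch added in this hunk.
# Returns [binary_version, ssv].
def read_version_and_ssv(stream)
  # 'N' decodes four bytes as a 32-bit big-endian unsigned integer.
  binary_version = stream.read(4).unpack('N')[0]
  case binary_version
  when 1
    ssv = 0 # version 1 streams carry no SSV field
  when 2
    ssv = stream.read(4).unpack('N')[0]
    raise "Unsupported ssv: #{ssv}" unless ssv == 0
  else
    raise "Unexpected serial version number: expected 1 or 2 but got #{binary_version}"
  end
  [binary_version, ssv]
end

# Synthetic headers built with Array#pack, for illustration only.
v1_header = StringIO.new([1].pack('N'))
v2_header = StringIO.new([2, 0].pack('N2'))
p read_version_and_ssv(v1_header)
p read_version_and_ssv(v2_header)
```
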
@@ -11,7 +11,7 @@ Gem::Specification.new do |spec|
  
  spec.summary = 'Tools for handling the Zawgyi font encoding in Myanmar.'
  spec.description = 'Tools for handling the Zawgyi font encoding in Myanmar.'
- spec.homepage = 'https://github.com/googlei18n/myanmar-tools'
+ spec.homepage = 'https://github.com/google/myanmar-tools'
  spec.required_ruby_version = '>= 2.0.0'
  spec.license = 'Apache-2.0'
  
@@ -23,7 +23,7 @@ Gem::Specification.new do |spec|
  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
  spec.require_paths = ['lib']
  
- spec.add_development_dependency 'bundler', '~> 1.0'
+ spec.add_development_dependency 'bundler', '>= 1.0'
  spec.add_development_dependency 'rake', '~> 10.0'
  spec.add_development_dependency 'minitest', '~> 3.0'
  end
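
The change from `~> 1.0` to `>= 1.0` widens the range of Bundler versions accepted as a development dependency. A quick way to see the difference is to evaluate both constraints with RubyGems' own `Gem::Requirement`. The two version numbers below are arbitrary examples chosen for illustration.

```ruby
require 'rubygems'

pessimistic = Gem::Requirement.new('~> 1.0') # equivalent to >= 1.0 and < 2.0
open_ended  = Gem::Requirement.new('>= 1.0') # any version from 1.0 upward

%w[1.17.3 2.1.4].each do |v|
  version = Gem::Version.new(v)
  puts "#{v}: ~> 1.0 => #{pessimistic.satisfied_by?(version)}, " \
       ">= 1.0 => #{open_ended.satisfied_by?(version)}"
end
```

A 2.x Bundler satisfies only the `>=` form, which is the practical effect of this hunk.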
@@ -58,17 +58,17 @@ class ZawgyiDetectorTest < Test::Unit::TestCase
  cases = []
  # STRINGS IDENTICAL IN UNICODE/ZAWGYI
  cases << {cp: "အသံကို အစားထိုးလိုပါသလား။", score: 0.995}
- cases << {cp: "နမူနာ", score: 0.90}
- cases << {cp: " ဦး", score: 0.995}
+ cases << {cp: "နမူနာ", score: 0.26}
+ cases << {cp: " ဦး", score: 0.35}
  
  # UNICODE STRINGS WITH HIGH ZAWGYI SCORES
  cases << {cp: "အစားထိုး အထူးအက္ခရာ", score: 0.995}
  cases << {cp: "ယခု မိုးရွာနေပါသလား။", score: 0.995}
- cases << {cp: "အခြား", score: 0.63}
+ cases << {cp: "အခြား", score: 0.74}
  
  # DIFFICULT STRINGS THAT DETECT CORRECTLY
  cases << {cp: "ကာမစာအုပ္မ်ား(ေစာက္ပတ္စာအုပ္မ်ား)", score: 1.0}
- cases << {cp: "ညႇပ္စရာမလို", score: 1.0}
+ cases << {cp: "ညႇပ္စရာမလို", score: 0.82}
  
  cases.each{|c|
  result = @detector.get_zawgyi_probability(c[:cp])
metadata CHANGED
@@ -1,27 +1,27 @@
  --- !ruby/object:Gem::Specification
  name: myanmar-tools
  version: !ruby/object:Gem::Version
- version: 0.1.1
+ version: 1.2.0
  platform: ruby
  authors:
  - Aung Kyaw Phyo
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2018-03-19 00:00:00.000000000 Z
+ date: 2020-07-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
  version: '1.0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ">="
  - !ruby/object:Gem::Version
  version: '1.0'
  - !ruby/object:Gem::Dependency
@@ -75,7 +75,7 @@ files:
  - lib/myanmar-tools/zawgyi_unicode_markov_model.rb
  - myanmar-tools.gemspec
  - test/zawgyidetector_test.rb
- homepage: https://github.com/googlei18n/myanmar-tools
+ homepage: https://github.com/google/myanmar-tools
  licenses:
  - Apache-2.0
  metadata: {}
@@ -95,7 +95,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.7.3
+ rubygems_version: 2.7.6.2
  signing_key:
  specification_version: 4
  summary: Tools for handling the Zawgyi font encoding in Myanmar.