gimchi 0.1.0 → 0.1.1

data/README.ko.rdoc CHANGED
@@ -1,5 +1,3 @@
1
- # encoding: UTF-8
2
-
3
1
  = gimchi
4
2
 
5
3
  == 개요
@@ -13,9 +11,14 @@ Gimchi는 한글 스트링을 다롭니다.
13
11
  - 한글을 초성, 중성, 종성으로 분리하고, 이를 다시 합치는 기능
14
12
  - 숫자 표기를 한글 표현으로 변환
15
13
 
14
+ == 설치
15
+ gem install gimchi
16
+
16
17
  == 사용법
17
18
 
18
19
  === Gimchi::Korean 인스턴스의 생성
20
+ require 'gimchi'
21
+
19
22
  ko = Gimchi::Korean.new
20
23
 
21
24
  === 한글 캐릭터 여부 판단
@@ -54,7 +57,6 @@ Gimchi는 한글 스트링을 다롭니다.
54
57
  === 숫자 읽기
55
58
  ko.read_number(1999) # "천 구백 구십 구"
56
59
  ko.read_number(- 100.123) # "마이너스 백점일이삼"
57
- ko.read_number("153,101,202,333.321")
58
60
  ko.read_number("153,191,100,678.3214")
59
61
  # "천 오백 삼십 일억 구천 백 십만 육백 칠십 팔점삼이일사"
60
62
 
@@ -87,7 +89,7 @@ Gimchi는 한글 스트링을 다롭니다.
87
89
  ko.romanize str, :as_pronounced => false
88
90
  # "Dwaet-eo dwaet-eo ije geureon gareuchim-eun dwaet-eo mae-il achim ilgop si samsip bunkkaji uril jogeuman gyosillo mol-aneogo"
89
91
  ko.romanize str, :number => false
90
- # "Dwaesseo dwaesseo ije geureon gareuchimeun dwaesseo mae-il achim 7 si 30 bunkkaji uril jogeuman gyosillo moraneoko"
92
+ # "Dwaesseo dwaesseo ije geureon gareuchimeun dwaesseo mae-il achim 7 si 30 bunkkaji uril jogeuman gyosillo moraneoko"
91
93
 
92
94
  == 구현의 한계
93
95
 
data/README.rdoc CHANGED
@@ -1,5 +1,3 @@
1
- # encoding: UTF-8
2
-
3
1
  = gimchi
4
2
 
5
3
  Gimchi is a simple Ruby gem which knows how to handle Korean strings. It knows
@@ -10,11 +8,16 @@ and how they're written in roman alphabet.
10
8
  Gimchi (only partially) implements the following rules dictated by
11
9
  The National Institute of The Korean Language (http://www.korean.go.kr)
12
10
  * Korean Standard Pronunciation
13
- * Korean romanization
11
+ * Korean Romanization
12
+
13
+ == Installation
14
+ gem install gimchi
14
15
 
15
16
  == Usage
16
17
 
17
18
  === Creating a Gimchi::Korean instance
19
+ require 'gimchi'
20
+
18
21
  ko = Gimchi::Korean.new
19
22
 
20
23
  === Checks if the given character is in the Korean alphabet
@@ -53,7 +56,6 @@ The National Institute of The Korean Language (http://www.korean.go.kr)
53
56
  === Reading numbers in Korean
54
57
  ko.read_number(1999) # "천 구백 구십 구"
55
58
  ko.read_number(- 100.123) # "마이너스 백점일이삼"
56
- ko.read_number("153,101,202,333.321")
57
59
  ko.read_number("153,191,100,678.3214")
58
60
  # "천 오백 삼십 일억 구천 백 십만 육백 칠십 팔점삼이일사"
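The examples above follow a regular pattern. Digits are read in groups of four (units of 만 and 억). A leading 일 is dropped before 십, 백 and 천 but kept in the ones place. The group unit is attached to the last word of its group, and digits after the decimal point are read one by one after 점. Below is a minimal standalone sketch of that pattern. It is not Gimchi's implementation, since Gimchi drives number reading from the `number` section of config/default.yml. `read_korean_number` and its helpers are hypothetical names. The sketch reproduces the three outputs shown above.

```ruby
# Hypothetical standalone sketch; it does not consult Gimchi's config/default.yml.
DIGITS = %w[영 일 이 삼 사 오 육 칠 팔 구].freeze
PLACES = ['', '십', '백', '천'].freeze  # positions inside a 4-digit group
GROUPS = ['', '만', '억', '조'].freeze  # 10^4, 10^8, 10^12 (nothing beyond 조)

# Words for a group value 1..9999, e.g. 1999 -> ["천", "구백", "구십", "구"].
def read_group(n)
  3.downto(0).each_with_object([]) do |pos, words|
    d = (n / 10**pos) % 10
    next if d.zero?
    # A leading 1 is dropped before 십/백/천 (천, not 일천) but kept in the ones place.
    digit = (d == 1 && pos > 0) ? '' : DIGITS[d]
    words << digit + PLACES[pos]
  end
end

def read_integer(n)
  return DIGITS[0] if n.zero?
  parts = []
  group = 0
  while n > 0
    n, g = n.divmod(10_000)
    if g > 0
      words = read_group(g)
      words[-1] += GROUPS[group]   # the unit sticks to the last word: 일억, 십만
      parts.unshift(words.join(' '))
    end
    group += 1
  end
  parts.join(' ')
end

# Accepts numerics or strings such as "153,191,100,678.3214".
def read_korean_number(input)
  str = input.to_s.delete(',').strip
  prefix = str.start_with?('-') ? '마이너스 ' : ''
  int_part, frac_part = str.delete('-').strip.split('.')
  out = prefix + read_integer(int_part.to_i)
  # Fractional digits are read individually and appended after 점.
  out += '점' + frac_part.each_char.map { |c| DIGITS[c.to_i] }.join if frac_part
  out
end
```

Limits of the sketch: it only knows group units up to 조, and it reads 10,000 as 일만, whereas everyday usage is usually plain 만. This diff does not show whether Gimchi does the same.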
59
61
 
@@ -91,13 +93,13 @@ The National Institute of The Korean Language (http://www.korean.go.kr)
91
93
  == Limitations of the implementation
92
94
 
93
95
  Unfortunately in order to implement the complete specification of Korean
94
- pronunciation and romanization, we need NLP, hugh Korean dictionaries and even
96
+ pronunciation and romanization, we need NLP, huge Korean dictionaries and even
95
97
  semantic analysis of the given string. And even with all that complex
96
98
  processing, we cannot guarantee 100% accuracy of the output. So yes, that is
97
99
  definitely not what this gem tries to do. Gimchi aims for "some"
98
100
  level of accuracy with relatively simple code.
99
101
 
100
- Currently, Gimchi code containts a lot of ad-hoc (possibly invalid) patches
102
+ Currently, Gimchi code contains a lot of ad-hoc (possibly invalid) patches
101
103
  that try to improve the quality of the output and that should be
102
104
  refactored soon.
103
105
 
data/config/default.yml CHANGED
@@ -56,7 +56,11 @@ pronouncer:
56
56
  ㅎ:
57
57
  transformation:
58
58
  # changing the order affects the quality of the transformation
59
- sequence:
59
+ sequence for 1:
60
+ - rule_5_1
61
+ - rule_5_3
62
+
63
+ sequence for 2:
60
64
  - rule_16
61
65
  - rule_17
62
66
  - rule_18
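In 0.1.1 the single `sequence` list is split in two. `Pronouncer#pronounce!` (further down in this diff) picks `sequence for 1` when `:pronounce_each_char` is true and `sequence for 2` otherwise, then subtracts `options[:except]`. The following is a small sketch of that selection. It uses a hypothetical YAML excerpt containing only the rule names visible in this hunk and a made-up `rule_sequence` helper.

```ruby
require 'yaml'

# Hypothetical excerpt shaped like the `pronouncer: transformation:` section;
# the real file lists more rules under "sequence for 2".
CONFIG = YAML.load(<<~YAML)
  transformation:
    sequence for 1:
      - rule_5_1
      - rule_5_3
    sequence for 2:
      - rule_16
      - rule_17
      - rule_18
YAML

# Picks the rule list by mode, then drops any rules the caller asked to skip.
def rule_sequence(config, each_char:, except: [])
  key = 'sequence for ' + (each_char ? '1' : '2')
  config.fetch('transformation').fetch(key) - except
end
```

Array subtraction keeps the configured order, which matters here because the config notes that changing the order affects the quality of the transformation.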
@@ -96,7 +100,6 @@ number:
96
100
  digits: ["", 한, 두, 세, 네, 다섯, 여섯, 일곱, 여덟, 아홉]
97
101
  post substitution:
98
102
  물살: 무살
99
- 물시: 무살
100
103
 
101
104
  romanization:
102
105
  chosung:
@@ -143,7 +146,7 @@ romanization:
143
146
  ㅢ: ui
144
147
  jongsung:
145
148
  ㄱ: k
146
- ㄴ: n
149
+ ㄴ: n-
147
150
  ㄷ: t
148
151
  ㄹ: l
149
152
  ㅁ: m
@@ -153,4 +156,14 @@ romanization:
153
156
  # 제2항 [붙임 2]‘ㄹ’은 모음 앞에서는 ‘r’로, 자음 앞이나 어말에서는
154
157
  # ‘l’로 적는다. 단, ‘ㄹㄹ’은 ‘ll’로 적는다.
155
158
  lr: ll
156
-
159
+ kkk: k-kk
160
+ ttt: t-tt
161
+ ppp: p-pp
162
+ "--": "-"
163
+ ? !ruby/regexp /n-([^gaeiou])/
164
+ : "n\\1"
165
+ ? !ruby/regexp /-(\s)/
166
+ : "\\1"
167
+ ? !ruby/regexp /-$/
168
+ : ""
169
+
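The new `romanization` entries are post-substitutions applied to the romanized output. The jongsung ㄴ now romanizes as `n-`, and the regex rules remove that marker dash again unless it precedes g or a vowel (compare "Ban-gimun" and "Bang-imun" in the test data). The `? !ruby/regexp ... : ...` pairs are YAML explicit keys that Psych loads as Regexp objects. On Ruby 3.1+ (Psych 4), plain `YAML.load` refuses that tag, so loading this config would need `YAML.unsafe_load` or `YAML.safe_load(..., permitted_classes: [Regexp])`. The sketch below sidesteps YAML and writes the visible entries as a Ruby Hash. The entries are applied in order, the way the `inject`/`gsub` loop in `Korean#romanize` does. `POST_SUBS` and `post_substitute` are hypothetical names, and the real table contains more entries than this hunk shows.

```ruby
# Hypothetical sketch of the post-substitution entries visible in this hunk.
POST_SUBS = {
  'lr'            => 'll',
  'kkk'           => 'k-kk',
  'ttt'           => 't-tt',
  'ppp'           => 'p-pp',
  '--'            => '-',
  /n-([^gaeiou])/ => 'n\1',  # drop the ㄴ marker dash unless g or a vowel follows
  /-(\s)/         => '\1',   # drop a dash before whitespace
  /-$/            => ''      # drop a trailing dash
}.freeze

# Applies every substitution in insertion order; String keys match literally.
def post_substitute(text)
  POST_SUBS.inject(text) { |out, (pattern, replacement)| out.gsub(pattern, replacement) }
end
```

Note that `post_substitute('kkk')` yields `"k-kk"`. This is presumably why the test added in this diff checks that non-Korean text such as "kkk" survives romanization.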
data/lib/gimchi/char.rb CHANGED
@@ -2,15 +2,20 @@
2
2
 
3
3
  module Gimchi
4
4
  class Korean
5
- # Class representing each Korean character.
6
- # chosung, jungsung and jongsung can be get and set.
5
+ # Class representing each Korean character. Its three components,
6
+ # `chosung', `jungsung' and `jongsung', can be read and set.
7
7
  #
8
- # to_s merges components into a String.
9
- # to_a returns the three components.
8
+ # `to_s' merges components into a String. `to_a' returns the three components.
10
9
  class Char
11
- attr_reader :org
12
- attr_reader :chosung, :jungsung, :jongsung
10
+ # @return [String] Chosung component of this character.
11
+ attr_reader :chosung
12
+ # @return [String] Jungsung component of this character.
13
+ attr_reader :jungsung
14
+ # @return [String] Jongsung component of this character.
15
+ attr_reader :jongsung
13
16
 
17
+ # @param [Gimchi::Korean] kor Gimchi::Korean instance
18
+ # @param [String] kchar Korean character string
14
19
  def initialize kor, kchar
15
20
  raise ArgumentError.new('Not a korean character') unless kor.korean_char? kchar
16
21
 
@@ -28,16 +33,17 @@ class Korean
28
33
  self.chosung = @kor.chosungs[n1]
29
34
  self.jungsung = @kor.jungsungs[n2]
30
35
  self.jongsung = ([nil] + @kor.jongsungs)[n3]
31
- elsif (@kor.chosungs + @kor.jongsungs).include? kchar
36
+ elsif @kor.chosungs.include? kchar
32
37
  self.chosung = kchar
33
38
  elsif @kor.jungsungs.include? kchar
34
39
  self.jungsung = kchar
40
+ elsif @kor.jongsungs.include? kchar
41
+ self.jongsung = kchar
35
42
  end
36
-
37
- @org = self.dup
38
43
  end
39
44
 
40
- # recombine components into korean character
45
+ # Recombines components into a korean character.
46
+ # @return [String] Combined korean character
41
47
  def to_s
42
48
  if chosung.nil? && jungsung.nil?
43
49
  ""
@@ -52,49 +58,61 @@ class Korean
52
58
  end
53
59
  end
54
60
 
61
+ # Sets the chosung component.
62
+ # @param [String] c Chosung component, or nil
55
63
  def chosung= c
56
64
  raise ArgumentError.new('Invalid chosung component') if
57
65
  c && @kor.chosungs.include?(c) == false
58
66
  @chosung = c && c.dup.extend(Component).tap { |e| e.kor = @kor }
59
67
  end
60
68
 
69
+ # Sets the jungsung component.
70
+ # @param [String] c Jungsung component, or nil
61
71
  def jungsung= c
62
72
  raise ArgumentError.new('Invalid jungsung component') if
63
73
  c && @kor.jungsungs.include?(c) == false
64
74
  @jungsung = c && c.dup.extend(Component).tap { |e| e.kor = @kor }
65
75
  end
66
76
 
77
+ # Sets the jongsung component.
78
+ #
79
+ # @param [String] c Jongsung component, or nil
67
80
  def jongsung= c
68
81
  raise ArgumentError.new('Invalid jongsung component') if
69
82
  c && @kor.jongsungs.include?(c) == false
70
83
  @jongsung = c && c.dup.extend(Component).tap { |e| e.kor = @kor }
71
84
  end
72
85
 
73
- # to_a returns the three components.
86
+ # Returns Array of three components.
87
+ #
88
+ # @return [Array] Array of three components
74
89
  def to_a
75
90
  [chosung, jungsung, jongsung]
76
91
  end
77
92
 
78
- # Check if this is a complete Korean character
93
+ # Checks if this is a complete Korean character.
79
94
  def complete?
80
95
  chosung.nil? == false && jungsung.nil? == false
81
96
  end
82
97
 
83
- # Check if this is a non-complete Korean character
98
+ # Checks if this is a non-complete Korean character.
84
99
  # e.g. ㅇ, ㅏ
85
100
  def partial?
86
101
  chosung.nil? || jungsung.nil?
87
102
  end
88
103
 
89
104
  private
90
- # nodoc #
105
+ # The three components of Korean::Char are extended to support the #vowel? and #consonant? methods.
91
106
  module Component
107
+ # @return [Korean] Hosting Korean instance
92
108
  attr_accessor :kor
93
109
 
110
+ # Is this component a vowel?
94
111
  def vowel?
95
112
  kor.jungsungs.include? self
96
113
  end
97
114
 
115
+ # Is this component a consonant?
98
116
  def consonant?
99
117
  self != 'ㅇ' && kor.chosungs.include?(self)
100
118
  end
@@ -102,3 +120,4 @@ class Korean
102
120
  end#Char
103
121
  end#Korean
104
122
  end#Gimchi
123
+
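`Char#initialize` looks up a precomposed syllable's components by three indices (`n1`, `n2`, `n3`) and uses `nil` for an empty jongsung. Precomposed Hangul syllables are laid out arithmetically in Unicode (U+AC00 to U+D7A3: 19 initials, 21 medials and 28 finals counting "no final"), so decomposition and recomposition reduce to division and multiplication. Here is a standalone sketch. It assumes the jamo lists are in Unicode order, because Gimchi takes its lists from `config['structure']`, which this diff does not show. `decompose` and `compose` are hypothetical names.

```ruby
# Standalone sketch of Unicode Hangul syllable arithmetic (not Gimchi's code).
CHOSUNGS  = %w[ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ].freeze
JUNGSUNGS = %w[ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ].freeze
# Index 0 means "no final consonant", like ([nil] + jongsungs) in Char#initialize.
JONGSUNGS = [nil, *%w[ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅂ ㅄ ㅅ ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ]].freeze
SYLLABLE_FIRST = 0xAC00
SYLLABLE_LAST  = 0xD7A3

# Returns [chosung, jungsung, jongsung_or_nil] for a complete syllable.
def decompose(ch)
  code = ch.ord
  unless code.between?(SYLLABLE_FIRST, SYLLABLE_LAST)
    raise ArgumentError, 'not a complete Hangul syllable'
  end
  offset = code - SYLLABLE_FIRST
  cho, rest = offset.divmod(JUNGSUNGS.size * JONGSUNGS.size)  # 21 * 28 = 588
  jung, jong = rest.divmod(JONGSUNGS.size)
  [CHOSUNGS[cho], JUNGSUNGS[jung], JONGSUNGS[jong]]
end

# Inverse of decompose; jong may be nil.
def compose(cho, jung, jong = nil)
  code = SYLLABLE_FIRST +
         (CHOSUNGS.index(cho) * JUNGSUNGS.size + JUNGSUNGS.index(jung)) * JONGSUNGS.size +
         JONGSUNGS.index(jong)
  [code].pack('U')
end
```

For example, `decompose('한')` gives `["ㅎ", "ㅏ", "ㄴ"]`, and `compose` turns those three jamo back into 한.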
data/lib/gimchi/korean.rb CHANGED
@@ -5,35 +5,41 @@ class Korean
5
5
  DEFAULT_CONFIG_FILE_PATH =
6
6
  File.dirname(__FILE__) + '/../../config/default.yml'
7
7
 
8
+ # Returns the configuration loaded from the YAML file given to #initialize.
9
+ # @return [Hash]
8
10
  attr_reader :config
9
- attr_accessor :pronouncer
10
11
 
11
12
  # Initialize Gimchi::Korean.
12
- # You can override many part of the implementation with customized config file.
13
+ # @param [String] config_file Path to a YAML config file; many parts of the implementation can be overridden by customizing it
13
14
  def initialize config_file = DEFAULT_CONFIG_FILE_PATH
14
15
  require 'yaml'
15
16
  @config = YAML.load(File.read config_file)
16
17
  @config.freeze
17
18
 
18
- @pronouncer = Korean::Pronouncer.new(self)
19
+ @pronouncer = Korean::Pronouncer.send :new, self
19
20
  end
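Here `Korean::Pronouncer.send :new, self` replaces a plain `.new` call, and `pronounce` later calls `@pronouncer.send :pronounce!`. In Ruby, `send` bypasses method visibility, which is what makes the private `pronounce!` reachable. However, a `private` keyword placed above `def initialize` (as in the Pronouncer file) does not make the class method `new` private. If the goal is to keep the class internal, `private_class_method :new` is the standard way to enforce it. Below is a minimal illustration with hypothetical names (`InternalHelper`, `work!`).

```ruby
# Hypothetical illustration of an internal class reached only through send.
class InternalHelper
  private_class_method :new   # InternalHelper.new now raises NoMethodError

  def initialize(owner)
    @owner = owner
  end

  private

  # Private instance method, comparable to Pronouncer#pronounce! in this diff.
  def work!(str)
    "#{@owner}:#{str.upcase}"
  end
end

# Object#send ignores visibility, so the owning code can still reach both.
helper = InternalHelper.send(:new, 'ko')
puts helper.send(:work!, 'abc')   # prints ko:ABC
```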
20
21
 
21
- # Array of chosung's
22
+ # Array of chosungs.
23
+ #
24
+ # @return [Array] Array of chosung strings
22
25
  def chosungs
23
26
  config['structure']['chosung']
24
27
  end
25
28
 
26
- # Array of jungsung's
29
+ # Array of jungsungs.
30
+ # @return [Array] Array of jungsung strings
27
31
  def jungsungs
28
32
  config['structure']['jungsung']
29
33
  end
30
34
 
31
- # Array of jongsung's
35
+ # Array of jongsungs.
36
+ # @return [Array] Array of jongsung strings
32
37
  def jongsungs
33
38
  config['structure']['jongsung']
34
39
  end
35
40
 
36
- # Checks if the given character is a korean character
41
+ # Checks if the given character is a korean character.
42
+ # @param [String] ch A string of size 1
37
43
  def korean_char? ch
38
44
  raise ArgumentError.new('Lengthy input') if ch.length > 1
39
45
 
@@ -43,6 +49,7 @@ class Korean
43
49
 
44
50
  # Checks if the given character is a "complete" korean character.
45
51
  # "Complete" Korean character must have chosung and jungsung, with optional jongsung.
52
+ # @param [String] ch A string of size 1
46
53
  def complete_korean_char? ch
47
54
  raise ArgumentError.new('Lengthy input') if ch.length > 1
48
55
 
@@ -50,14 +57,18 @@ class Korean
50
57
  ch.unpack('U').all? { | c | c >= 0xAC00 && c <= 0xD7A3 }
51
58
  end
52
59
 
53
- # Splits the given string into an array of Korean::Char's and strings.
60
+ # Splits the given string into an array of Korean::Char instances and Strings of length 1.
61
+ # @param [String] str Input string.
62
+ # @return [Array] Mixed array of Korean::Char instances and Strings of length 1 (for non-korean characters)
54
63
  def dissect str
55
64
  str.each_char.map { |c|
56
65
  korean_char?(c) ? Korean::Char.new(self, c) : c
57
66
  }
58
67
  end
59
68
 
60
- # Reads a string with numbers in Korean way.
69
+ # Reads numeric expressions the Korean way.
70
+ # @param [String, Number] str Numeric type or String containing numeric expressions
71
+ # @return [String] Output string
61
72
  def read_number str
62
73
  nconfig = config['number']
63
74
 
@@ -68,10 +79,13 @@ class Korean
68
79
 
69
80
  # Returns the pronunciation of the given string containing Korean characters.
70
81
  # Takes optional options hash.
71
- # - If :pronounce_each_char is true, each character of the string is pronounced respectively.
72
- # - If :slur is true, characters separated by whitespaces are treated as if they were contiguous.
73
- # - If :number is true, numberic parts of the string is also pronounced in Korean.
74
- # - :except array allows you to skip certain transformations.
82
+ #
83
+ # @param [String] str Input string
84
+ # @param [Boolean] options[:pronounce_each_char] Each character of the string is pronounced respectively.
85
+ # @param [Boolean] options[:slur] Strings separated by whitespace are processed again as if they were contiguous.
86
+ # @param [Boolean] options[:number] Numeric parts of the string are also pronounced in Korean.
87
+ # @param [Array] options[:except] Allows you to skip certain transformations.
88
+ # @return [String] Output string
75
89
  def pronounce str, options = {}
76
90
  options = {
77
91
  :pronounce_each_char => false,
@@ -82,46 +96,24 @@ class Korean
82
96
  }.merge options
83
97
 
84
98
  str = read_number(str) if options[:number]
85
- chars = dissect str
86
99
 
87
- transforms = []
88
- idx = -1
89
- while (idx += 1) < chars.length
90
- c = chars[idx]
91
-
92
- next if c.is_a?(Korean::Char) == false
93
-
94
- next_c = chars[idx + 1]
95
- next_kc = (options[:pronounce_each_char] == false &&
96
- next_c.is_a?(Korean::Char) &&
97
- next_c.complete?) ? next_c : nil
98
-
99
- transforms += @pronouncer.transform(c, next_kc, :except => options[:except])
100
-
101
- # Slur (TBD)
102
- if options[:slur] && options[:pronounce_each_char] == false && next_c =~ /\s/
103
- chars[(idx + 1)..-1].each_with_index do | nc, new_idx |
104
- next if nc =~ /\s/
105
-
106
- if nc.is_a?(Korean::Char) && nc.complete?
107
- transforms += @pronouncer.transform(c, nc, :except => options[:except])
108
- end
109
-
110
- idx = idx + 1 + new_idx - 1
111
- break
112
- end
113
- end
114
- end
100
+ result, transforms = @pronouncer.send :pronounce!, str, options
115
101
 
116
102
  if options[:debug]
117
- return chars.join, transforms
103
+ return result, transforms
118
104
  else
119
- chars.join
105
+ return result
120
106
  end
121
107
  end
122
108
 
123
109
  # Returns the romanization (alphabetical notation) of the given Korean string.
124
110
  # http://en.wikipedia.org/wiki/Korean_romanization
111
+ # @param [String] str Input Korean string
112
+ # @param [Boolean] options[:as_pronounced] If true, #pronounce is internally called before romanize
113
+ # @param [Boolean] options[:number] Whether to read numeric expressions in the string
114
+ # @param [Boolean] options[:slur] Same as :slur in #pronounce
115
+ # @return [String] Output string in Roman Alphabet
116
+ # @see Korean#pronounce
125
117
  def romanize str, options = {}
126
118
  options = {
127
119
  :as_pronounced => true,
@@ -142,23 +134,37 @@ class Korean
142
134
  :except => %w[rule_5_3]
143
135
  dash = rdata[0]["ㅇ"]
144
136
  romanization = ""
145
- (chars = str.each_char.to_a).each_with_index do | kc, cidx |
146
- if korean_char? kc
147
- Korean::Char.new(self, kc).to_a.each_with_index do | comp, idx |
137
+
138
+ romanize_chunk = lambda do | chunk |
139
+ dissect(chunk).each do | kc |
140
+ kc.to_a.each_with_index do | comp, idx |
148
141
  next if comp.nil?
149
142
  comp = rdata[idx][comp] || comp
150
143
  comp = comp[1..-1] if comp[0] == dash &&
151
144
  (romanization.empty? || romanization[-1] =~ /\s/ || comp[1] == 'w')
152
145
  romanization += comp
153
146
  end
154
- else
155
- romanization += kc
156
147
  end
148
+
149
+ return post_subs.keys.inject(romanization) { | output, pattern |
150
+ output.gsub(pattern, post_subs[pattern])
151
+ }
157
152
  end
158
153
 
159
- post_subs.keys.inject(romanization) { | output, pattern |
160
- output.gsub(pattern, post_subs[pattern])
161
- }.capitalize
154
+ k_chunk = ""
155
+ str.each_char do | c |
156
+ if korean_char? c
157
+ k_chunk += c
158
+ else
159
+ unless k_chunk.empty?
160
+ romanization = romanize_chunk.call k_chunk
161
+ k_chunk = ""
162
+ end
163
+ romanization += c
164
+ end
165
+ end
166
+ romanization = romanize_chunk.call k_chunk unless k_chunk.empty?
167
+ romanization
162
168
  end
163
169
 
164
170
  private
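The new `romanize` collects consecutive Korean characters into `k_chunk`, romanizes only those runs, and copies other characters through. The same idea can be sketched with `Enumerable#chunk`. This is a hypothetical helper, not Gimchi's code. It treats only precomposed syllables (U+AC00 to U+D7A3, the range `complete_korean_char?` checks) as Korean, whereas Gimchi's `korean_char?` appears to accept standalone jamo as well.

```ruby
# Precomposed Hangul syllables only (가..힣).
HANGUL_SYLLABLES = (0xAC00..0xD7A3).freeze

def hangul_syllable?(ch)
  HANGUL_SYLLABLES.cover?(ch.ord)
end

# Splits str into alternating runs of Hangul / non-Hangul characters, passes
# only the Hangul runs to the block, and copies everything else unchanged.
def transform_korean_runs(str)
  str.each_char
     .chunk { |c| hangul_syllable?(c) }
     .map { |is_korean, chars| is_korean ? yield(chars.join) : chars.join }
     .join
end
```

There is one difference from the code in this diff. As far as can be read here, each call to `romanize_chunk` runs the post-substitutions over the whole accumulated `romanization` string. As a result, non-Korean text that precedes a later Korean run (for example "kkk 똑") is also passed through them. Transforming each run in isolation, as above, avoids that.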
@@ -2,47 +2,98 @@
2
2
 
3
3
  module Gimchi
4
4
  class Korean
5
- private
5
+ # Private class.
6
6
  # Partial implementation of Korean pronunciation rules specified in
7
7
  # http://www.korean.go.kr/
8
8
  class Pronouncer
9
- attr_reader :applied
10
-
11
- def initialize(korean)
9
+ private
10
+ def initialize korean
12
11
  @korean = korean
13
12
  @pconfig = korean.config['pronouncer']
14
- @applied = []
15
13
  end
16
14
 
17
- def transform kc, next_kc, options = {}
18
- options = { :except => [] }.merge options
19
- @applied.clear
15
+ def pronounce! str, options = {}
16
+ @sequence = @pconfig['transformation']['sequence for ' +
17
+ (options[:pronounce_each_char] ? '1' : '2')] - options[:except]
20
18
 
21
- # Cannot properly pronounce
22
- return if kc.chosung.nil? && kc.jungsung.nil?
19
+ # Dissecting
20
+ @chars = @korean.dissect str
21
+ @orig_chars = @chars.dup
23
22
 
24
23
  # Padding
25
- kc.chosung = 'ㅇ' if kc.chosung.nil?
26
- kc.jungsung = 'ㅡ' if kc.jungsung.nil?
24
+ @chars.each { |c| pad c }
27
25
 
28
- if next_kc.nil?
29
- rule_single kc, :except => options[:except]
30
- else
31
- not_todo = []
32
- blocking_rule = @pconfig['transformation']['blocking rule']
33
- @pconfig['transformation']['sequence'].each do | rule |
34
- next if not_todo.include?(rule) || options[:except].include?(rule)
35
-
36
- if self.send(rule, kc, next_kc)
37
- @applied << rule
38
- not_todo += blocking_rule[rule] if blocking_rule.has_key?(rule)
39
- end
26
+ # Two-phase processing
27
+ # - For `slur'
28
+ applied = []
29
+ 2.times do | phase |
30
+ @chars = @chars.reject { |c| c =~ /\s/ } if phase == 1
31
+
32
+ # Deep-fried...no copied backup
33
+ @initial_chars = @chars.map { |c| c.dup }
34
+
35
+ # Transform one by one
36
+ applied += (0...@chars.length).inject([]) { | arr, i | arr + transform(i) }
37
+
38
+ # Post-processing (actually just for :pronounce_each_char option)
39
+ @chars.select { |c| c.is_a?(Korean::Char) && c.jongsung }.each do | c |
40
+ c.jongsung = @pconfig['jongsung sound'][c.jongsung]
40
41
  end
42
+
43
+ break unless options[:slur]
41
44
  end
42
- @applied
45
+
46
+ return @orig_chars.join, applied
43
47
  end
44
48
 
45
49
  private
50
+ def transform idx
51
+ @cursor = idx
52
+
53
+ # Not korean
54
+ return [] unless kc.is_a? Korean::Char
55
+
56
+ # Cannot properly pronounce
57
+ return [] if kc.chosung.nil? && kc.jungsung.nil? && kc.jongsung.nil?
58
+
59
+ applied = []
60
+ not_todo = []
61
+ blocking_rule = @pconfig['transformation']['blocking rule']
62
+ @sequence.each do | rule |
63
+ next if not_todo.include?(rule)
64
+
65
+ if self.send(rule)
66
+ applied << rule
67
+ not_todo += blocking_rule[rule] if blocking_rule.has_key?(rule)
68
+ end
69
+ end
70
+ applied
71
+ end
72
+
73
+ def pad c
74
+ return unless c.is_a? Korean::Char
75
+
76
+ c.chosung = 'ㅇ' if c.chosung.nil?
77
+ c.jungsung = 'ㅡ' if c.jungsung.nil?
78
+ end
79
+
80
+ def kc
81
+ @chars[@cursor]
82
+ end
83
+
84
+ def next_kc
85
+ nkc = @chars[@cursor + 1]
86
+ nkc.is_a?(Korean::Char) ? nkc : nil
87
+ end
88
+
89
+ def kc_org
90
+ @initial_chars[@cursor]
91
+ end
92
+
93
+ def next_kc_org
94
+ @initial_chars[@cursor + 1]
95
+ end
96
+
46
97
  # shortcut
47
98
  def fortis_map
48
99
  @korean.config['structure']['fortis map']
@@ -53,20 +104,10 @@ private
53
104
  @korean.config['structure']['double consonant map']
54
105
  end
55
106
 
56
- def rule_single kc, options = {}
57
- options = {:except => []}.merge options
58
- rule_5_1 kc, nil unless options[:except].include? 'rule_5_1'
59
- rule_5_3 kc, nil unless options[:except].include? 'rule_5_3'
60
-
61
- if kc.jongsung
62
- kc.jongsung = @pconfig['jongsung sound'][kc.jongsung]
63
- end
64
- end
65
-
66
107
  # 제5항: ‘ㅑ ㅒ ㅕ ㅖ ㅘ ㅙ ㅛ ㅝ ㅞ ㅠ ㅢ’는 이중 모음으로 발음한다.
67
108
  # 다만 1. 용언의 활용형에 나타나는 ‘져, 쪄, 쳐’는 [저, 쩌, 처]로 발음한다.
68
109
  # 다만 3. 자음을 첫소리로 가지고 있는 음절의 ‘ㅢ’는 [ㅣ]로 발음한다.
69
- def rule_5_1 kc, next_kc
110
+ def rule_5_1
70
111
  if %w[져 쪄 쳐].include? kc.to_s
71
112
  kc.jungsung = 'ㅓ'
72
113
 
@@ -74,8 +115,8 @@ private
74
115
  end
75
116
  end
76
117
 
77
- def rule_5_3 kc, next_kc
78
- if kc.jungsung == 'ㅢ' && kc.org.chosung.consonant?
118
+ def rule_5_3
119
+ if kc.jungsung == 'ㅢ' && kc_org.chosung.consonant?
79
120
  kc.jungsung = 'ㅣ'
80
121
 
81
122
  true
@@ -84,7 +125,7 @@ private
84
125
 
85
126
  # 제9항: 받침 ‘ㄲ, ㅋ’, ‘ㅅ, ㅆ, ㅈ, ㅊ, ㅌ’, ‘ㅍ’은 어말 또는 자음 앞에서
86
127
  # 각각 대표음 [ㄱ, ㄷ, ㅂ]으로 발음한다.
87
- def rule_9 kc, next_kc
128
+ def rule_9
88
129
  map = {
89
130
  %w[ㄲ ㅋ] => 'ㄱ',
90
131
  %w[ㅅ ㅆ ㅈ ㅊ ㅌ] => 'ㄷ',
@@ -99,7 +140,7 @@ private
99
140
 
100
141
  # 제10항: 겹받침 ‘ㄳ’, ‘ㄵ’, ‘ㄼ, ㄽ, ㄾ’, ‘ㅄ’은 어말 또는 자음 앞에서
101
142
  # 각각 [ㄱ, ㄴ, ㄹ, ㅂ]으로 발음한다.
102
- def rule_10 kc, next_kc
143
+ def rule_10
103
144
  map = {
104
145
  %w[ㄳ] => 'ㄱ',
105
146
  %w[ㄵ] => 'ㄴ',
@@ -110,7 +151,7 @@ private
110
151
  # Exceptions
111
152
  if next_kc && (
112
153
  (kc.to_s == '밟' && next_kc.chosung.consonant?) ||
113
- (kc.to_s == '넓' && next_kc && %w[적 죽 둥].include?(next_kc.org.to_s))) # PATCH
154
+ (kc.to_s == '넓' && next_kc && %w[적 죽 둥].include?(next_kc_org.to_s))) # PATCH
114
155
  kc.jongsung = 'ㅂ'
115
156
  else
116
157
  kc.jongsung = map[ map.keys.find { |e| e.include? kc.jongsung } ]
@@ -121,7 +162,7 @@ private
121
162
  end
122
163
 
123
164
  # 제11항: 겹받침 ‘ㄺ, ㄻ, ㄿ’은 어말 또는 자음 앞에서 각각 [ㄱ, ㅁ, ㅂ]으로 발음한다.
124
- def rule_11 kc, next_kc
165
+ def rule_11
125
166
  map = {
126
167
  'ㄺ' => 'ㄱ',
127
168
  'ㄻ' => 'ㅁ',
@@ -131,7 +172,7 @@ private
131
172
  # 다만, 용언의 어간 말음 ‘ㄺ’은 ‘ㄱ’ 앞에서 [ㄹ]로 발음한다.
132
173
  # - 용언 여부 판단은?: 중성으로 판단 (PATCH)
133
174
  if next_kc && kc.jongsung == 'ㄺ' &&
134
- next_kc.org.chosung == 'ㄱ' &&
175
+ next_kc_org.chosung == 'ㄱ' &&
135
176
  %w[맑 얽 섥 밝 늙 묽 넓].include?(kc.to_s) # PATCH
136
177
  kc.jongsung = 'ㄹ'
137
178
  else
@@ -155,7 +196,7 @@ private
155
196
  # [붙임]‘ㄶ, ㅀ’ 뒤에 ‘ㄴ’이 결합되는 경우에는, ‘ㅎ’을 발음하지 않는다.
156
197
  #
157
198
  # 4. ‘ㅎ(ㄶ, ㅀ)’ 뒤에 모음으로 시작된 어미나 접미사가 결합되는 경우에는, ‘ㅎ’을 발음하지 않는다.
158
- def rule_12 kc, next_kc
199
+ def rule_12
159
200
  return if next_kc.nil?
160
201
 
161
202
  map_12_1 = {
@@ -218,17 +259,18 @@ private
218
259
 
219
260
  # 제13항: 홑받침이나 쌍받침이 모음으로 시작된 조사나 어미, 접미사와
220
261
  # 결합되는 경우에는, 제 음가대로 뒤 음절 첫소리로 옮겨 발음한다.
221
- def rule_13 kc, next_kc
262
+ def rule_13
222
263
  return if kc.jongsung.nil? || kc.jongsung == 'ㅇ' || next_kc.nil? || next_kc.chosung != 'ㅇ'
223
264
  next_kc.chosung = kc.jongsung
224
265
  kc.jongsung = nil
225
266
 
226
267
  true
227
268
  end
269
+
228
270
  # 제14항: 겹받침이 모음으로 시작된 조사나 어미, 접미사와 결합되는 경우에는,
229
271
  # 뒤엣것만을 뒤 음절 첫소리로 옮겨 발음한다.(이 경우, ‘ㅅ’은 된소리로 발음함.)
230
272
  #
231
- def rule_14 kc, next_kc
273
+ def rule_14
232
274
  return if kc.jongsung.nil? || kc.jongsung == 'ㅇ' || next_kc.nil? || next_kc.chosung != 'ㅇ'
233
275
  if consonants = double_consonant_map[kc.jongsung]
234
276
  consonants[1] = 'ㅆ' if consonants[1] == 'ㅅ'
@@ -237,9 +279,10 @@ private
237
279
  true
238
280
  end
239
281
  end
282
+
240
283
  # 제15항: 받침 뒤에 모음 ‘ㅏ, ㅓ, ㅗ, ㅜ, ㅟ’들로 시작되는 __실질 형태소__가 연결되는
241
284
  # 경우에는, 대표음으로 바꾸어서 뒤 음절 첫소리로 옮겨 발음한다.
242
- def rule_15 kc, next_kc
285
+ def rule_15
243
286
  return if kc.jongsung.nil? || kc.jongsung == 'ㅇ' || next_kc.nil? || next_kc.chosung != 'ㅇ'
244
287
 
245
288
  if false && %w[ㅏ ㅓ ㅗ ㅜ ㅟ].include?(next_kc.jungsung) &&
@@ -253,7 +296,7 @@ private
253
296
 
254
297
  # 제16항: 한글 자모의 이름은 그 받침소리를 연음하되, ‘ㄷ, ㅈ, ㅊ, ㅋ, ㅌ,
255
298
  # ㅍ, ㅎ’의 경우에는 특별히 다음과 같이 발음한다.
256
- def rule_16 kc, next_kc
299
+ def rule_16
257
300
  return if next_kc.nil?
258
301
 
259
302
  map = {'디귿' => '디긋',
@@ -278,7 +321,7 @@ private
278
321
  # [ㅈ, ㅊ]으로 바꾸어서 뒤 음절 첫소리로 옮겨 발음한다.
279
322
  #
280
323
  # [붙임] ‘ㄷ’ 뒤에 접미사 ‘히’가 결합되어 ‘티’를 이루는 것은 [치]로 발음한다.
281
- def rule_17 kc, next_kc
324
+ def rule_17
282
325
  return if next_kc.nil? || %w[ㄷ ㅌ ㄾ].include?(kc.jongsung) == false
283
326
 
284
327
  if next_kc.to_s == '이'
@@ -296,7 +339,7 @@ private
296
339
 
297
340
  # 제18항: 받침 ‘ㄱ(ㄲ, ㅋ, ㄳ, ㄺ), ㄷ(ㅅ, ㅆ, ㅈ, ㅊ, ㅌ, ㅎ), ㅂ(ㅍ, ㄼ,
298
341
  # ㄿ, ㅄ)’은 ‘ㄴ, ㅁ’ 앞에서 [ㅇ, ㄴ, ㅁ]으로 발음한다.
299
- def rule_18 kc, next_kc
342
+ def rule_18
300
343
  map = {
301
344
  %w[ㄱ ㄲ ㅋ ㄳ ㄺ] => 'ㅇ',
302
345
  %w[ㄷ ㅅ ㅆ ㅈ ㅊ ㅌ ㅎ] => 'ㄴ',
@@ -311,7 +354,7 @@ private
311
354
 
312
355
  # 제19항: 받침 ‘ㅁ, ㅇ’ 뒤에 연결되는 ‘ㄹ’은 [ㄴ]으로 발음한다.
313
356
  # [붙임]받침 ‘ㄱ, ㅂ’ 뒤에 연결되는 ‘ㄹ’도 [ㄴ]으로 발음한다.
314
- def rule_19 kc, next_kc
357
+ def rule_19
315
358
  if next_kc && next_kc.chosung == 'ㄹ' && %w[ㅁ ㅇ ㄱ ㅂ].include?(kc.jongsung)
316
359
  next_kc.chosung = 'ㄴ'
317
360
 
@@ -325,11 +368,11 @@ private
325
368
  end
326
369
 
327
370
  # 제20항: ‘ㄴ’은 ‘ㄹ’의 앞이나 뒤에서 [ㄹ]로 발음한다.
328
- def rule_20 kc, next_kc
371
+ def rule_20
329
372
  return if next_kc.nil?
330
373
 
331
374
  to = if %w[견란 진란 산량 단력 권력 원령 견례
332
- 문로 단로 원론 원료 근류].include?(kc.org.to_s + next_kc.org.to_s)
375
+ 문로 단로 원론 원료 근류].include?(kc_org.to_s + next_kc_org.to_s)
333
376
  'ㄴ'
334
377
  else
335
378
  'ㄹ'
@@ -348,7 +391,7 @@ private
348
391
 
349
392
  # 제23항: 받침 ‘ㄱ(ㄲ, ㅋ, ㄳ, ㄺ), ㄷ(ㅅ, ㅆ, ㅈ, ㅊ, ㅌ), ㅂ(ㅍ, ㄼ, ㄿ,ㅄ)’
350
393
  # 뒤에 연결되는 ‘ㄱ, ㄷ, ㅂ, ㅅ, ㅈ’은 된소리로 발음한다.
351
- def rule_23 kc, next_kc
394
+ def rule_23
352
395
  return if next_kc.nil?
353
396
  if fortis_map.keys.include?(next_kc.chosung) &&
354
397
  %w[ㄱ ㄲ ㅋ ㄳ ㄺ ㄷ ㅅ ㅆ ㅈ ㅊ ㅌ ㅂ ㅍ ㄼ ㄿ ㅄ].include?(kc.jongsung)
@@ -361,7 +404,7 @@ private
361
404
  # 제24항: 어간 받침 ‘ㄴ(ㄵ), ㅁ(ㄻ)’ 뒤에 결합되는 어미의 첫소리 ‘ㄱ, ㄷ, ㅅ, ㅈ’은 된소리로 발음한다.
362
405
  # 다만, 피동, 사동의 접미사 ‘-기-’는 된소리로 발음하지 않는다.
363
406
  # Applies only to verb and adjective (용언) stems.
364
- def rule_24 kc, next_kc
407
+ def rule_24
365
408
  return if next_kc.nil? ||
366
409
  next_kc.to_s == '기' # FIXME: cannot determine whether it is a passive/causative suffix, e.g. 줄넘기
367
410
 
@@ -385,7 +428,7 @@ private
385
428
 
386
429
  # 제25항: 어간 받침 ‘ㄼ, ㄾ’ 뒤에 결합되는 어미의 첫소리 ‘ㄱ, ㄷ, ㅅ, ㅈ’은
387
430
  # 된소리로 발음한다.
388
- def rule_25 kc, next_kc
431
+ def rule_25
389
432
  return if next_kc.nil?
390
433
 
391
434
  if %w[ㄱ ㄷ ㅅ ㅈ].include?(next_kc.chosung) &&
@@ -397,13 +440,13 @@ private
397
440
  end
398
441
 
399
442
  # 제26항: 한자어에서, ‘ㄹ’ 받침 뒤에 연결되는 ‘ㄷ, ㅅ, ㅈ’은 된소리로 발음한다.
400
- def rule_26 kc, next_kc
443
+ def rule_26
401
444
  # TODO
402
445
  end
403
446
 
404
447
  # 제27항: __관형사형__ ‘-(으)ㄹ’ 뒤에 연결되는 ‘ㄱ, ㄷ, ㅂ, ㅅ, ㅈ’은 된소리로 발음한다.
405
448
  # - ‘-(으)ㄹ’로 시작되는 어미의 경우에도 이에 준한다.
406
- def rule_27 kc, next_kc
449
+ def rule_27
407
450
  # FIXME: NOT PROPERLY IMPLEMENTED
408
451
  return if next_kc.nil?
409
452
 
@@ -419,14 +462,14 @@ private
419
462
  # 제28항: 표기상으로는 사이시옷이 없더라도, 관형격 기능을 지니는 사이시옷이
420
463
  # 있어야 할(휴지가 성립되는) 합성어의 경우에는, 뒤 단어의 첫소리 ‘ㄱ, ㄷ,
421
464
  # ㅂ, ㅅ, ㅈ’을 된소리로 발음한다.
422
- def rule_26_28 kc, next_kc
465
+ def rule_26_28
423
466
  # TODO
424
467
  end
425
468
 
426
469
  # 제29항: 합성어 및 파생어에서, 앞 단어나 접두사의 끝이 자음이고 뒤 단어나
427
470
  # 접미사의 첫음절이 ‘이, 야, 여, 요, 유’인 경우에는, ‘ㄴ’ 음을 첨가하여
428
471
  # [니, 냐, 녀, 뇨, 뉴]로 발음한다.
429
- def rule_29 kc, next_kc
472
+ def rule_29
430
473
  # TODO
431
474
  end
432
475
 
@@ -436,7 +479,7 @@ private
436
479
  # 발음하는 것도 허용한다.
437
480
  # 2. 사이시옷 뒤에 ‘ㄴ, ㅁ’이 결합되는 경우에는 [ㄴ]으로 발음한다.
438
481
  # 3. 사이시옷 뒤에 ‘이’ 음이 결합되는 경우에는 [ㄴㄴ]으로 발음한다.
439
- def rule_30 kc, next_kc
482
+ def rule_30
440
483
  return if next_kc.nil? || kc.jongsung != 'ㅅ'
441
484
 
442
485
  if %w[ㄱ ㄷ ㅂ ㅅ ㅈ].include? next_kc.chosung
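The refactored Pronouncer no longer passes `kc, next_kc` into each rule. Instead, `pronounce!` dissects the whole string and `transform` moves a cursor over it. Each rule reads `kc` / `next_kc` (and the backed-up `kc_org` / `next_kc_org`) through accessors, and a rule that fires can block later rules via the `blocking rule` table. A self-contained sketch of that control flow follows. `Syllable`, `RulePipeline`, `liaison` and `l_to_n` are hypothetical names, and the two toy rules only loosely follow rules 13 and 19. The sketch leaves out the `*_org` backups and the two-phase `:slur` pass.

```ruby
# Hypothetical sketch of a cursor-based rule pipeline; Syllable stands in for Korean::Char.
Syllable = Struct.new(:cho, :jung, :jong)

class RulePipeline
  # sequence: rule method names, tried in order at every position
  # blocking: rule name => rule names to skip at that position once it fires
  def initialize(sequence, blocking = {})
    @sequence = sequence
    @blocking = blocking
  end

  # Returns [transformed copies of the syllables, names of the rules that fired].
  def run(syllables)
    @chars = syllables.map(&:dup)   # never mutate the caller's syllables
    applied = []
    @chars.each_index do |i|
      @cursor = i
      skipped = []
      @sequence.each do |rule|
        next if skipped.include?(rule)
        next unless send(rule)        # rules are private; send reaches them
        applied << rule
        skipped.concat(@blocking.fetch(rule, []))
      end
    end
    [@chars, applied]
  end

  private

  # Syllable under the cursor.
  def kc
    @chars[@cursor]
  end

  # Following syllable, or nil at the end.
  def next_kc
    @chars[@cursor + 1]
  end

  # Toy liaison rule in the spirit of rule 13: a final consonant moves onto a
  # following syllable that starts with a silent ㅇ.
  def liaison
    return false if kc.jong.nil? || kc.jong == 'ㅇ' || next_kc.nil? || next_kc.cho != 'ㅇ'
    next_kc.cho = kc.jong
    kc.jong = nil
    true
  end

  # Toy rule in the spirit of rule 19: ㄹ after a final ㅁ, ㅇ, ㄱ or ㅂ is read as ㄴ.
  def l_to_n
    return false unless next_kc && next_kc.cho == 'ㄹ' && %w[ㅁ ㅇ ㄱ ㅂ].include?(kc.jong)
    next_kc.cho = 'ㄴ'
    true
  end
end
```

With `RulePipeline.new(%w[liaison l_to_n])`, 음악 (ㅇㅡㅁ + ㅇㅏㄱ) comes out as the components of 으막, and 심리 as those of 심니. These match the sound changes the two rules describe.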
@@ -1,4 +1,6 @@
1
1
  ---
2
+ "반기문": "Ban-gimun"
3
+ "방이문": "Bang-imun"
2
4
  "구미": "Gumi"
3
5
  "영동": "Yeongdong"
4
6
  "백암": "Baegam"
data/test/test_gimchi.rb CHANGED
@@ -97,18 +97,30 @@ class TestGimchi < Test::Unit::TestCase
97
97
  test_set.each do | k, v |
98
98
  cnt += 1
99
99
  k = k.gsub(/[-]/, '')
100
- t, tfs = ko.pronounce(k, :pronounce_each_char => false, :slur => k.include?(' '), :debug => true)
101
- if v.include? t.gsub(/\s/, '')
100
+
101
+ t1, tfs1 = ko.pronounce(k, :pronounce_each_char => false, :slur => true, :debug => true)
102
+ t2, tfs2 = ko.pronounce(k, :pronounce_each_char => false, :slur => false, :debug => true)
103
+
104
+ path = ""
105
+ if (with_slur = v.include?(t1.gsub(/\s/, ''))) || v.include?(t2.gsub(/\s/, ''))
102
106
  r = ANSI::Code::BLUE + ANSI::Code::BOLD + v.join(' / ') + ANSI::Code::RESET if v.length > 1
107
+ path = (with_slur ? tfs1 : tfs2).map { |e| e.sub 'rule_', '' }.join(' > ')
108
+ t = with_slur ? t1 : t2
103
109
  s += 1
104
110
  else
105
111
  r = ANSI::Code::RED + ANSI::Code::BOLD + v.join(' / ') + ANSI::Code::RESET
112
+ t = [t1, t2].join ' | '
106
113
  end
107
- puts "#{k} => #{t} (#{ko.romanize t}) [#{tfs.join(' > ')}] #{r}"
114
+ puts "#{k} => #{t} (#{ko.romanize t, :as_pronounced => false}) [#{path}] #{r}"
108
115
  end
109
116
  puts "#{s} / #{cnt}"
110
117
  # FIXME
111
- assert s >= 410
118
+ assert s >= 411
119
+ end
120
+
121
+ def test_romanize_preserve_non_korean
122
+ ko = Gimchi::Korean.new
123
+ assert_equal 'ttok-kkateun kkk', ko.romanize('똑같은 kkk')
112
124
  end
113
125
 
114
126
  def test_romanize
@@ -130,6 +142,6 @@ class TestGimchi < Test::Unit::TestCase
130
142
  end
131
143
  puts "#{s} / #{cnt}"
132
144
  # FIXME
133
- assert s >= 55
145
+ assert s >= 57
134
146
  end
135
147
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: gimchi
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.1.1
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,12 +9,11 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2011-04-05 00:00:00.000000000 +09:00
13
- default_executable:
12
+ date: 2011-04-07 00:00:00.000000000Z
14
13
  dependencies:
15
14
  - !ruby/object:Gem::Dependency
16
15
  name: bundler
17
- requirement: &2156236300 !ruby/object:Gem::Requirement
16
+ requirement: &2153047260 !ruby/object:Gem::Requirement
18
17
  none: false
19
18
  requirements:
20
19
  - - ~>
@@ -22,10 +21,10 @@ dependencies:
22
21
  version: 1.0.0
23
22
  type: :development
24
23
  prerelease: false
25
- version_requirements: *2156236300
24
+ version_requirements: *2153047260
26
25
  - !ruby/object:Gem::Dependency
27
26
  name: jeweler
28
- requirement: &2156235820 !ruby/object:Gem::Requirement
27
+ requirement: &2153046780 !ruby/object:Gem::Requirement
29
28
  none: false
30
29
  requirements:
31
30
  - - ~>
@@ -33,10 +32,10 @@ dependencies:
33
32
  version: 1.5.2
34
33
  type: :development
35
34
  prerelease: false
36
- version_requirements: *2156235820
35
+ version_requirements: *2153046780
37
36
  - !ruby/object:Gem::Dependency
38
37
  name: rcov
39
- requirement: &2156235340 !ruby/object:Gem::Requirement
38
+ requirement: &2153046300 !ruby/object:Gem::Requirement
40
39
  none: false
41
40
  requirements:
42
41
  - - ! '>='
@@ -44,10 +43,10 @@ dependencies:
44
43
  version: '0'
45
44
  type: :development
46
45
  prerelease: false
47
- version_requirements: *2156235340
46
+ version_requirements: *2153046300
48
47
  - !ruby/object:Gem::Dependency
49
48
  name: ansi
50
- requirement: &2156234860 !ruby/object:Gem::Requirement
49
+ requirement: &2153045820 !ruby/object:Gem::Requirement
51
50
  none: false
52
51
  requirements:
53
52
  - - ! '>='
@@ -55,8 +54,8 @@ dependencies:
55
54
  version: 1.2.2
56
55
  type: :development
57
56
  prerelease: false
58
- version_requirements: *2156234860
59
- description: Gimchi knows how to pronounce Korean string and how to write them in
57
+ version_requirements: *2153045820
58
+ description: Gimchi knows how to pronounce Korean strings and how to write them in
60
59
  roman alphabet.
61
60
  email: junegunn.c@gmail.com
62
61
  executables: []
@@ -78,7 +77,6 @@ files:
78
77
  - test/pronunciation.yml
79
78
  - test/romanization.yml
80
79
  - test/test_gimchi.rb
81
- has_rdoc: true
82
80
  homepage: http://github.com/junegunn/gimchi
83
81
  licenses:
84
82
  - MIT
@@ -100,7 +98,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
100
98
  version: '0'
101
99
  requirements: []
102
100
  rubyforge_project:
103
- rubygems_version: 1.6.2
101
+ rubygems_version: 1.7.2
104
102
  signing_key:
105
103
  specification_version: 3
106
104
  summary: Gimchi reads Korean.
@@ -109,3 +107,4 @@ test_files:
109
107
  - test/pronunciation.yml
110
108
  - test/romanization.yml
111
109
  - test/test_gimchi.rb
110
+ has_rdoc: