estem 0.2.4 → 0.2.5

data/ChangeLog CHANGED
@@ -1,3 +1,30 @@
1
+ Version 0.2.5
2
+
3
+ 2012-09-02 MaG <maguilamo.c@gmail.com>
4
+ *
5
+ - bin/ directory, removed.
6
+
7
+ * README.rdoc:
8
+ - cleanups
9
+ - Thanks section, removed.
10
+
11
+ * examples/usage.rb:
12
+ - cleanups
13
+
14
+ * estem.rb:
15
+ - (es_stem): rewritten.
16
+ - (safe_es_stem): deprecated Iconv, removed.
17
+
18
+ * bin/es_stem.rb:
19
+ - removed.
20
+
21
+ * test/:
22
+ - new test file added.
23
+ - rename file diffs.txt.
24
+
25
+ * test/test_estem.rb:
26
+ - one more test added.
27
+
1
28
  Version 0.2.4
2
29
 
3
30
  2012-06-25 MaG <maguilamo.c@gmail.com>
@@ -9,19 +36,19 @@ Version 0.2.4
9
36
  - examples/usage.rb: new file
10
37
 
11
38
  * README.rdoc:
12
- - max 80 cols per line.
13
- - recomendation about using safe_es_stem().
14
- - Fix Spanish typos.
39
+ - max 80 cols per line.
40
+ - recomendation about using safe_es_stem().
41
+ - Fix Spanish typos.
15
42
 
16
43
  * estem.gemspec:
17
- - cleanups.
18
- - (required_ruby_version): Ruby 1.9.1.
44
+ - cleanups.
45
+ - (required_ruby_version): Ruby 1.9.1.
19
46
 
20
47
  * bin/es_stem.rb:
21
- - chmod a+x .
22
- - (es_stem.rb:80): fix case sensitive comparation.
23
- - (es_stem.rb:25): removed .rb ext.
24
- - (es_stem.rb:29): new version.
48
+ - chmod a+x .
49
+ - (es_stem.rb:80): fix case sensitive comparation.
50
+ - (es_stem.rb:25): removed .rb ext.
51
+ - (es_stem.rb:29): new version.
25
52
 
26
53
  * estem.rb:
27
- - (safe_es_stem): new method.
54
+ - (safe_es_stem): new method.
data/README.rdoc CHANGED
@@ -1,7 +1,7 @@
1
1
  = Spanish Stem Gem
2
2
 
3
3
  == Description
4
- This gem is for reducing Spanish words to their roots. It uses an algorithm
4
+ This gem reduces Spanish words to their respective roots. It uses an algorithm
5
5
  based on Martin Porter's specifications.
6
6
 
7
7
  For more information, visit:
@@ -21,15 +21,14 @@ or
21
21
  $ gem install estem
22
22
 
23
23
  == Usage
24
- As a reminder, take in consideration that the Spanish language have several non
24
+ As a reminder, take in consideration that the Spanish language has several non
25
25
  US-ASCII characters, and because of that, the same data may varied from one
26
26
  codeset to another.
27
27
 
28
28
  Please remember to use a UTF-8 compatible encoding while using EStem. Please do
29
- not use String#force_encoding() to convert from one codeset to another, you
30
- might try using String#encode() but this later is more likely to fail, consider
31
- using String#safe_es_stem() when handling incompatibles codesets or the codeset
32
- type is unknown.
29
+ not use String#force_encoding to convert from one codeset to another, you may
30
+ try using String#encode alone but, instead, consider using String#safe_es_stem
31
+ when handling incompatibles codesets or the codeset type varies.
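The advice in this paragraph rests on the difference between relabeling bytes and transcoding them. The following is a minimal sketch of that difference using only core String methods; the sample word and variable names are illustrative and not taken from the gem.

```ruby
# encoding: UTF-8

# Simulate input that arrives as ISO-8859-1 (Latin-1) bytes.
latin1 = "habitación".encode('ISO-8859-1')

# force_encoding only changes the label; the bytes stay the same,
# so the single Latin-1 byte for "ó" is not valid UTF-8.
relabeled = latin1.dup.force_encoding('UTF-8')

# encode transcodes the bytes, producing a real UTF-8 string.
converted = latin1.encode('UTF-8')

puts relabeled.valid_encoding?   # false
puts converted.valid_encoding?   # true
puts converted == "habitación"   # true
```

Because the relabeled string is not valid UTF-8, regular expressions that contain accented characters are likely to raise on it, which is presumably why the README steers readers away from force_encoding.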
33
32
 
34
33
  require 'estem'
35
34
 
@@ -41,19 +40,6 @@ type is unknown.
41
40
  puts "HaBiTaCiOnEs".es_stem # ==> "HaBiT"
42
41
  puts "Hacinamiento".es_stem # ==> "Hacin"
43
42
 
44
- You can use <tt>EStem</tt> as a command line tool:
45
- $ es_stem --in-enc ISO-8859-1 -f input_file.txt
46
-
47
- for more information type
48
- $ es_stem --help
49
-
50
- The <tt>es_stem</tt> program do his best trying to tokenized the lines from
51
- the file, you might consider finding an Spanish tokenizer, either way this
52
- program do what it is suppose to do, stem Spanish words.
53
-
54
- NOTE: For excellent results, consider replacing one word per line on the files
55
- the program handles.
56
-
57
43
  == Uso
58
44
  Como recordatorio, ten en cosideración que el Castellano posee muchos
59
45
  carácteres que están fuera del código ASCII, y por esta razón, los datos pueden
@@ -62,9 +48,8 @@ variar de un conjunto de codificación a otro.
62
48
  Por favor recuerda utilizar sistemas de condificación compatibles con UTF-8
63
49
  cuando se trabaje con EStem. Por favor no use String#force_encoding para
64
50
  convertir de un conjunto de codificación a otro, podría utilizar String#encode
65
- pero este último es más probable que falle en el intento, considere utilizar
66
- String#safe_es_stem() si está manejando conjuntos de codificación incompatibles
67
- o se desconoce el tipo.
51
+ solo, pero en su lugar, considere utilizar String#safe_es_stem() si está
52
+ manejando conjuntos de codificación incompatibles o se desconoce el tipo.
68
53
 
69
54
  require 'estem'
70
55
 
@@ -75,17 +60,6 @@ o se desconoce el tipo.
75
60
  puts "ALbeRGues".es_stem # ==> "ALbeRG"
76
61
  puts "HaBiTaCiOnEs".es_stem # ==> "HaBiT"
77
62
  puts "Hacinamiento".es_stem # ==> "Hacin"
78
-
79
- Para más información ejecuta:
80
- $ es_stem --help
81
-
82
- El programa <tt>es_stem</tt> hará lo posible para separar las palabras de cada
83
- línea del fichero. Sería sensato utilizar otro programa más especializado para
84
- este propósito, de todas maneras, es_stem hace lo que se supone debe hacer,
85
- optener las raíces de las palabras.
86
-
87
- NOTA: Para resultados excelentes, considere poner una palabra por línea en los
88
- ficheros que pasará el programa.
89
63
 
90
64
  == Test
91
65
 
@@ -101,11 +75,6 @@ Incluye 28390 palabras de prueba con sus resultado esperados. Para realizar
101
75
  la prueba, ejecuta:
102
76
  rake test
103
77
 
104
- == Thanks -- Agradecimientos
105
-
106
- Ray Pereda https://github.com/raypereda/stemmify/ I used his gem as a guide to
107
- package mine. http://guides.rubygems.org/make-your-own-gem/ as well.
108
-
109
78
  == License -- Licencia
110
79
 
111
80
  Copyright (c) 2012 Manuel A. Güílamo
data/examples/usage.rb CHANGED
@@ -1,7 +1,5 @@
1
1
  require 'estem'
2
2
 
3
- hsh = Hash.new
4
-
5
3
  words = ['albergues','habitaciones','Albergues','ALbeRGues','HaBiTaCiOnEs',
6
4
  'Hacinamiento','mujeres','muchedumbre','ocasionalmente']
7
5
 
@@ -0,0 +1,11 @@
1
+ require 'estem'
2
+
3
+ hsh = Hash.new
4
+
5
+ words = ['albergues','habitaciones','Albergues','ALbeRGues','HaBiTaCiOnEs',
6
+ 'Hacinamiento','mujeres','muchedumbre','ocasionalmente']
7
+
8
+ words.each do|w|
9
+ stem = w.es_stem
10
+ puts "Word: #{w}\nStem: #{stem}\n\n"
11
+ end
data/lib/estem.rb CHANGED
@@ -22,8 +22,6 @@
22
22
  # * Manuel A. Güílamo maguilamo.c@gmail.com
23
23
  #
24
24
 
25
- require 'iconv'
26
-
27
25
  module EStem
28
26
  ##
29
27
  # For more information, please refer to <b>String#es_stem</b> method, also <b>EStem</b>.
@@ -38,61 +36,59 @@ module EStem
38
36
  # "HaBiTaCiOnEs".es_stem # ==> "HaBiT"
39
37
  # "Hacinamiento".es_stem # ==> "Hacin"
40
38
  #
41
- #If you are not aware of the codeset the data has, then use
39
+ #If you are not aware of the codeset the data have, try using
42
40
  #String#safe_es_stem instead.
43
41
  #
44
42
  #:call-seq:
45
43
  # str.es_stem => "new_str"
46
44
  def es_stem
47
45
  str = self.dup
48
- return remove_accent(str) if str.length == 1
49
- tmp = step0(str)
50
- str = tmp ? tmp : str
51
-
52
- unless tmp = step1(str)
53
- unless tmp = step2a(str)
54
- tmp = step2b(str)
55
- str = tmp ? tmp : str
56
- else
57
- str = tmp
58
- end
46
+ case str.length
47
+ when 0
48
+ return str
49
+ when 1
50
+ return remove_accent(str)
59
51
  end
60
- tmp = step3(str)
61
- str = tmp.nil? ? str : tmp
52
+
53
+ step0(str)
54
+ unless step1(str)
55
+ step2b(str) unless step2a(str)
56
+ end
57
+
58
+ step3(str)
62
59
  remove_accent(str)
63
60
  end
64
61
 
65
62
  ##
66
63
  #Use this method in case you are not aware of the codeset the data being
67
- #handle has. This method returns a new string with the same codeset as
68
- #the original. Be aware that this method is slower than String#es_stem()
64
+ #handle have. This method returns a new string with the same codeset as
65
+ #the original. Be aware that this method is a bit slower than String#es_stem
69
66
  #:call-seq:
70
67
  # str.safe_es_stem => "new_str"
71
68
  def safe_es_stem
72
- return self.es_stem if self.encoding == Encoding::UTF_8
73
-
74
- default_enc = self.encoding.name
75
-
76
- str = self.dup.force_encoding('UTF-8')
77
-
78
- if str.valid_encoding?
79
- begin
80
- tmp = str.es_stem
81
- return tmp.force_encoding(default_enc)
82
- rescue
83
- end
69
+ if self.encoding == Encoding::UTF_8
70
+ # remove invalid characters
71
+ return self.chars.select{|c| c.valid_encoding? }.join.es_stem
84
72
  end
85
73
 
86
- if enc = Encoding.compatible?(self, VOWEL)
87
- begin
88
- return self.encode(enc).es_stem
89
- rescue
74
+ unless self.valid_encoding?
75
+ tmp = self.dup
76
+ if tmp.force_encoding('UTF-8').valid_encoding?
77
+ begin
78
+ return tmp.es_stem
79
+ rescue
80
+ end
90
81
  end
91
82
  end
92
83
 
84
+ default_enc = self.encoding.name
85
+ str = self.chars.select{|c| c.valid_encoding? }.join
86
+
87
+ return nil if str.empty?
88
+
93
89
  begin
94
- tmp = Iconv.conv('UTF-8', self.encoding.name, self).es_stem
95
- return Iconv.conv(default_enc, 'UTF-8', tmp);
90
+ tmp = str.encode('UTF-8', str.encoding.name).es_stem
91
+ return tmp.encode(default_enc, 'UTF-8');
96
92
  rescue
97
93
  return nil
98
94
  end
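The new safe_es_stem body replaces Iconv with a String#encode round trip: drop invalid characters, transcode to UTF-8, stem, then transcode back. The sketch below shows that round trip in isolation. The helper name with_utf8_round_trip and the block-based interface are assumptions made for illustration; they are not part of estem, and the block stands in for the stemming call.

```ruby
# encoding: UTF-8

# Hypothetical helper (not part of estem): yields a UTF-8 copy of +str+
# to the block and returns the block's result in the original encoding.
# Returns nil when nothing valid remains or a conversion fails.
def with_utf8_round_trip(str)
  original = str.encoding
  # Keep only characters that are valid in the string's own encoding.
  clean = str.chars.select { |c| c.valid_encoding? }.join
  return nil if clean.empty?

  result = yield clean.encode('UTF-8')
  result.encode(original)
rescue EncodingError
  # Covers undefined conversions and invalid byte sequences.
  nil
end

latin1 = "canción".encode('ISO-8859-1')
seen = nil
out = with_utf8_round_trip(latin1) do |utf8|
  seen = utf8.encoding
  utf8.tr('áéíóú', 'aeiou')  # stand-in for the stemming step
end

puts seen          # UTF-8
puts out.encoding  # ISO-8859-1
```

Rescuing EncodingError rather than using a bare rescue is a design choice of this sketch; the diff above uses a bare rescue, which also swallows unrelated StandardError failures.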
@@ -145,8 +141,9 @@ module EStem
145
141
  [r1,r2]
146
142
  end
147
143
 
144
+ #=> true or false
148
145
  def step0(str)
149
- return nil unless str =~ /(se(l[ao]s?)?|l([aeo]s?)|me|nos)$/i
146
+ return false unless str =~ /(se(l[ao]s?)?|l([aeo]s?)|me|nos)$/i
150
147
 
151
148
  suffix = $&
152
149
  rv_text = str[rv(str)..-1]
@@ -154,21 +151,21 @@ module EStem
154
151
  case rv_text
155
152
  when %r{((?<=i[éÉ]ndo|[áÁ]ndo|[áéíÁÉÍ]r)#{suffix})$}ui
156
153
  str[%r{#$&$}]=''
157
- str = remove_accent(str)
158
- return str
154
+ str.replace(remove_accent(str))
155
+ return true
159
156
  when %r{((?<=iendo|ando|[aei]r)#{suffix})$}i
160
157
  str[%r{#$&$}]=''
161
- return str
158
+ return true
162
159
  end
163
160
 
164
161
  if rv_text =~ /yendo/i and str =~ /uyendo/i
165
162
  str[suffix]=''
166
- return str
163
+ return true
167
164
  end
168
- nil
165
+ false
169
166
  end
170
167
 
171
- #=> new_str or nil
168
+ #=> true or false
172
169
  def step1(str)
173
170
  r1,r2 = r12(str)
174
171
  r1_text = str[r1..-1]
@@ -177,46 +174,46 @@ module EStem
177
174
  case r2_text
178
175
  when /(anzas?|ic[oa]s?|ismos?|[ai]bles?|istas?|os[oa]s?|[ai]mientos?)$/i
179
176
  str[%r{#$&$}]=''
180
- return str
177
+ return true
181
178
  when /(ic)?(ador([ae]s?)?|aci[óÓ]n|aciones|antes?|ancias?)$/ui
182
179
  str[%r{#$&$}]=''
183
- return str
180
+ return true
184
181
  when /log[íÍ]as?/ui
185
182
  str[%r{#$&$}]='log'
186
- return str
183
+ return true
187
184
  when /(uci([óÓ]n|ones))$/ui
188
185
  str[%r{#$&$}]='u'
189
- return str
186
+ return true
190
187
  when /(encias?)$/i
191
188
  str[%r{#$&$}]='ente'
192
- return str
189
+ return true
193
190
  end
194
191
 
195
192
  if r2_text =~ /(ativ|iv|os|ic|ad)amente$/i or r1_text =~ /amente$/i
196
193
  str[%r{#$&$}]=''
197
- return str
194
+ return true
198
195
  end
199
196
 
200
197
  case r2_text
201
198
  when /((ante|[ai]ble)?mente)$/i, /((abil|i[cv])?idad(es)?)$/i, /((at)?iv[ao]s?)$/i
202
199
  str[%r{#$&$}]=''
203
- return str
200
+ return true
204
201
  end
205
- nil
202
+ false
206
203
  end
207
204
 
208
- #=> nil or new_str
205
+ #=> true or false
209
206
  def step2a(str)
210
207
  rv_pos = rv(str)
211
208
  idx = str[rv_pos..-1] =~ /(y[oóÓ]|ye(ron|ndo)|y[ae][ns]?|ya(is|mos))$/ui
212
209
 
213
- return nil unless idx
210
+ return false unless idx
214
211
 
215
212
  if 'u' == str[rv_pos+idx-1].downcase
216
213
  str[%r{#$&$}] = ''
217
- return str
214
+ return true
218
215
  end
219
- nil
216
+ false
220
217
  end
221
218
 
222
219
  STEP2B_REGEXP = /(
@@ -229,6 +226,7 @@ module EStem
229
226
  en|es|[éÉ]is|emos
230
227
  )$/xiu
231
228
 
229
+ #=> true or false
232
230
  def step2b(str)
233
231
  rv_pos = rv(str)
234
232
 
@@ -240,27 +238,28 @@ module EStem
240
238
  else
241
239
  str[%r{#{suffix}$}]=''
242
240
  end
243
- return str
241
+ return true
244
242
  end
245
- nil
243
+ false
246
244
  end
247
245
 
246
+ #=> true or false
248
247
  def step3(str)
249
248
  rv_pos = rv(str)
250
249
  rv_text = str[rv_pos..-1]
251
250
 
252
251
  if rv_text =~ /(os|[aoáíóÁÍÓ])$/ui
253
252
  str[%r{#$&$}]=''
254
- return str
253
+ return true
255
254
  elsif idx = rv_text =~ /(u?[eéÉ])$/i
256
255
  if $&[0].downcase == 'u' and str[rv_pos+idx-1].downcase == 'g'
257
256
  str[%r{#$&$}]=''
258
257
  else
259
258
  str.chop!
260
259
  end
261
- return str
260
+ return true
262
261
  end
263
- nil
262
+ false
264
263
  end
265
264
 
266
265
  VOWEL = 'aeiouáéíóúüAEIOUÁÉÍÓÚÜ'
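The hunks above switch every step helper from "return the new string or nil" to "mutate the argument in place and return true or false", which is what lets es_stem chain the steps with plain `unless`. A toy sketch of that convention follows; strip_plural!, strip_final_vowel! and toy_stem are hypothetical names and deliberately much simpler than the real Spanish rules.

```ruby
# Hypothetical helper (not part of estem): removes a trailing "s" in place.
# Returns true when the string was changed, false otherwise.
def strip_plural!(str)
  return false unless str =~ /s\z/i
  str[/s\z/i] = ''   # String#[]= with a Regexp mutates the receiver
  true
end

# Hypothetical helper: removes one trailing a/e/o in place.
def strip_final_vowel!(str)
  return false unless str =~ /[aeo]\z/i
  str.chop!
  true
end

def toy_stem(word)
  str = word.dup   # work on a copy so the caller's string is untouched
  # The fallback step runs only when the first step changed nothing.
  strip_final_vowel!(str) unless strip_plural!(str)
  str
end

puts toy_stem('casas')  # casa
puts toy_stem('casa')   # cas
puts toy_stem('sol')    # sol
```

With boolean results the caller no longer needs temporaries such as `tmp ? tmp : str`, at the cost of the helpers having side effects on their argument.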
data/lib/estem.rb~ ADDED
@@ -0,0 +1,271 @@
1
+ # encoding: UTF-8
2
+ #
3
+ # :title: Spanish Stemming
4
+ # = Description
5
+ # This gem is for reducing Spanish words to their roots. It uses an algorithm
6
+ # based on Martin Porter's specifications.
7
+ #
8
+ # For more information, visit:
9
+ # http://snowball.tartarus.org/algorithms/spanish/stemmer.html
10
+ #
11
+ # = Descripción
12
+ # Esta gema está para reducir las palabras del Español en sus respectivas raíces,
13
+ # para ello ultiliza un algoritmo basado en las especificaciones de Martin Porter
14
+ #
15
+ # Para más información, visite:
16
+ # http://snowball.tartarus.org/algorithms/spanish/stemmer.html
17
+ #
18
+ # = License -- Licencia
19
+ # This code is provided under the terms of the {MIT License.}[http://www.opensource.org/licenses/mit-license.php]
20
+ #
21
+ # = Authors
22
+ # * Manuel A. Güílamo maguilamo.c@gmail.com
23
+ #
24
+
25
+ module EStem
26
+ ##
27
+ # For more information, please refer to <b>String#es_stem</b> method, also <b>EStem</b>.
28
+ # :method: estem
29
+
30
+ ##
31
+ #This method stem Spanish words.
32
+ #
33
+ # "albergues".es_stem # ==> "alberg"
34
+ # "habitaciones".es_stem # ==> "habit"
35
+ # "ALbeRGues".es_stem # ==> "ALbeRG"
36
+ # "HaBiTaCiOnEs".es_stem # ==> "HaBiT"
37
+ # "Hacinamiento".es_stem # ==> "Hacin"
38
+ #
39
+ #If you are not aware of the codeset the data have, try using
40
+ #String#safe_es_stem instead.
41
+ #
42
+ #:call-seq:
43
+ # str.es_stem => "new_str"
44
+ def es_stem
45
+ str = self.dup
46
+ case str.length
47
+ when 0
48
+ return str
49
+ when 1
50
+ return remove_accent(str)
51
+ end
52
+
53
+ step0(str)
54
+ unless step1(str)
55
+ step2b(str) unless step2a(str)
56
+ end
57
+
58
+ step3(str)
59
+ remove_accent(str)
60
+ end
61
+
62
+ ##
63
+ #Use this method in case you are not aware of the codeset the data being
64
+ #handle have. This method returns a new string with the same codeset as
65
+ #the original. Be aware that this method is a bit slower than String#es_stem
66
+ #:call-seq:
67
+ # str.safe_es_stem => "new_str"
68
+ def safe_es_stem
69
+ if str.encoding == Encoding::UTF_8
70
+ # remove invalid characters
71
+ return self.chars.select{|c| c.valid_encoding? }.join.es_stem
72
+ end
73
+
74
+ unless self.valid_encoding?
75
+ tmp = self.dup
76
+ if tmp.force_encoding('UTF-8').valid_encoding?
77
+ begin
78
+ return tmp.es_stem
79
+ rescue
80
+ end
81
+ end
82
+ end
83
+
84
+ default_enc = self.encoding.name
85
+ str = self.chars.select{|c| c.valid_encoding? }.join
86
+
87
+ return nil if str.empty?
88
+
89
+ begin
90
+ tmp = str.encode('UTF-8', str.encoding.name).es_stem
91
+ return tmp.encode(default_enc, 'UTF-8');
92
+ rescue
93
+ return nil
94
+ end
95
+ end
96
+
97
+ # :stopdoc:
98
+
99
+ private
100
+
101
+ def vowel?(c)
102
+ VOWEL.include?(c)
103
+ end
104
+
105
+ def consonant?(c)
106
+ CONSONANT.include?(c)
107
+ end
108
+
109
+ def remove_accent(str)
110
+ str.tr('áéíóúÁÉÍÓÚ','aeiouAEIOU')
111
+ end
112
+
113
+ def rv(str)
114
+ if consonant? str[1]
115
+ i=2
116
+ i+=1 while str[i] and consonant? str[i]
117
+ return str.nil? ? str.length-1 : i+1
118
+ end
119
+
120
+ if vowel? str[0] and vowel? str[1]
121
+ i=2
122
+ i+=1 while str[i] and vowel? str[i]
123
+ return str.nil? ? str.length-1 : i+1
124
+ end
125
+
126
+ return 3 if consonant? str[0] and vowel? str[1]
127
+
128
+ str.length - 1
129
+ end
130
+
131
+ def r(str, i=0)
132
+ i+=1 while str[i] and consonant?(str[i])
133
+ i+=1
134
+ i+=1 while str[i] and vowel? str[i]
135
+ str[i].nil? ? str.length : i+1
136
+ end
137
+
138
+ def r12(str)
139
+ r1 = r(str)
140
+ r2 = r(str,r1)
141
+ [r1,r2]
142
+ end
143
+
144
+ #=> true or false
145
+ def step0(str)
146
+ return false unless str =~ /(se(l[ao]s?)?|l([aeo]s?)|me|nos)$/i
147
+
148
+ suffix = $&
149
+ rv_text = str[rv(str)..-1]
150
+
151
+ case rv_text
152
+ when %r{((?<=i[éÉ]ndo|[áÁ]ndo|[áéíÁÉÍ]r)#{suffix})$}ui
153
+ str[%r{#$&$}]=''
154
+ str.replace(remove_accent(str))
155
+ return true
156
+ when %r{((?<=iendo|ando|[aei]r)#{suffix})$}i
157
+ str[%r{#$&$}]=''
158
+ return true
159
+ end
160
+
161
+ if rv_text =~ /yendo/i and str =~ /uyendo/i
162
+ str[suffix]=''
163
+ return true
164
+ end
165
+ false
166
+ end
167
+
168
+ #=> true or false
169
+ def step1(str)
170
+ r1,r2 = r12(str)
171
+ r1_text = str[r1..-1]
172
+ r2_text = str[r2..-1]
173
+
174
+ case r2_text
175
+ when /(anzas?|ic[oa]s?|ismos?|[ai]bles?|istas?|os[oa]s?|[ai]mientos?)$/i
176
+ str[%r{#$&$}]=''
177
+ return true
178
+ when /(ic)?(ador([ae]s?)?|aci[óÓ]n|aciones|antes?|ancias?)$/ui
179
+ str[%r{#$&$}]=''
180
+ return true
181
+ when /log[íÍ]as?/ui
182
+ str[%r{#$&$}]='log'
183
+ return true
184
+ when /(uci([óÓ]n|ones))$/ui
185
+ str[%r{#$&$}]='u'
186
+ return true
187
+ when /(encias?)$/i
188
+ str[%r{#$&$}]='ente'
189
+ return true
190
+ end
191
+
192
+ if r2_text =~ /(ativ|iv|os|ic|ad)amente$/i or r1_text =~ /amente$/i
193
+ str[%r{#$&$}]=''
194
+ return true
195
+ end
196
+
197
+ case r2_text
198
+ when /((ante|[ai]ble)?mente)$/i, /((abil|i[cv])?idad(es)?)$/i, /((at)?iv[ao]s?)$/i
199
+ str[%r{#$&$}]=''
200
+ return true
201
+ end
202
+ false
203
+ end
204
+
205
+ #=> true or false
206
+ def step2a(str)
207
+ rv_pos = rv(str)
208
+ idx = str[rv_pos..-1] =~ /(y[oóÓ]|ye(ron|ndo)|y[ae][ns]?|ya(is|mos))$/ui
209
+
210
+ return false unless idx
211
+
212
+ if 'u' == str[rv_pos+idx-1].downcase
213
+ str[%r{#$&$}] = ''
214
+ return true
215
+ end
216
+ false
217
+ end
218
+
219
+ STEP2B_REGEXP = /(
220
+ ar([áÁ][ns]?|a(n|s|is)?|on)? | ar([éÉ]is|emos|é|É) | ar[íÍ]a(n|s|is|mos)? |
221
+ er([áÁ][sn]?|[éÉ](is)?|emos|[íÍ]a(n|s|is|mos)?)? |
222
+ ir([íÍ]a(s|n|is|mos)?|[áÁ][ns]?|emos|[éÉ]|éis)? | aba(s|n|is)? |
223
+ ad([ao]s?)? | ed | id(a|as|o|os)? | [íÍ]a(n|s|is|mos)? | [íÍ]s |
224
+ as(e[ns]?|te|eis|teis)? | [áÁ](is|bamos|semos|ramos) | a(n|ndo|mos) |
225
+ ie(ra|se|ran|sen|ron|ndo|ras|ses|rais|seis) | i(ste|steis|[óÓ]|mos|[éÉ]ramos|[éÉ]semos) |
226
+ en|es|[éÉ]is|emos
227
+ )$/xiu
228
+
229
+ #=> true or false
230
+ def step2b(str)
231
+ rv_pos = rv(str)
232
+
233
+ if idx = str[rv_pos..-1] =~ STEP2B_REGEXP
234
+ suffix = $&
235
+ if suffix =~ /^(en|es|[éÉ]is|emos)$/ui
236
+ str[%r{#{suffix}$}]=''
237
+ str[rv_pos+idx-1]='' if str[rv_pos+idx-2] =~ /g/i and str[rv_pos+idx-1] =~ /u/i
238
+ else
239
+ str[%r{#{suffix}$}]=''
240
+ end
241
+ return true
242
+ end
243
+ false
244
+ end
245
+
246
+ #=> true or false
247
+ def step3(str)
248
+ rv_pos = rv(str)
249
+ rv_text = str[rv_pos..-1]
250
+
251
+ if rv_text =~ /(os|[aoáíóÁÍÓ])$/ui
252
+ str[%r{#$&$}]=''
253
+ return true
254
+ elsif idx = rv_text =~ /(u?[eéÉ])$/i
255
+ if $&[0].downcase == 'u' and str[rv_pos+idx-1].downcase == 'g'
256
+ str[%r{#$&$}]=''
257
+ else
258
+ str.chop!
259
+ end
260
+ return true
261
+ end
262
+ false
263
+ end
264
+
265
+ VOWEL = 'aeiouáéíóúüAEIOUÁÉÍÓÚÜ'
266
+ CONSONANT = "bcdfghjklmnñpqrstvwxyzABCDEFGHIJKLMNÑOPQRSTUVWXYZ"
267
+ end
268
+
269
+ class String
270
+ include EStem
271
+ end
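Several steps in the file above only look for suffixes inside the RV region. As a reading aid, here is an independent sketch of RV as defined in the Snowball Spanish stemmer description that the header links to. It is not the gem's private rv method; SKETCH_VOWELS, sketch_vowel?, rv_start and rv_region are hypothetical names, and any character that is not a listed vowel is treated as a consonant.

```ruby
# encoding: UTF-8

SKETCH_VOWELS = 'aeiouáéíóúüAEIOUÁÉÍÓÚÜ'

def sketch_vowel?(ch)
  !ch.nil? && SKETCH_VOWELS.include?(ch)
end

# Index at which the RV region starts (word.length means "empty region").
def rv_start(word)
  return word.length if word.length < 2

  if !sketch_vowel?(word[1])
    # Second letter is a consonant: RV starts after the next vowel.
    i = 2
    i += 1 while i < word.length && !sketch_vowel?(word[i])
    i < word.length ? i + 1 : word.length
  elsif sketch_vowel?(word[0])
    # First two letters are vowels: RV starts after the next consonant.
    i = 2
    i += 1 while i < word.length && sketch_vowel?(word[i])
    i < word.length ? i + 1 : word.length
  else
    # Consonant followed by vowel: RV starts after the third letter.
    [3, word.length].min
  end
end

def rv_region(word)
  word[rv_start(word)..-1]
end

%w[macho oliva trabajo áureo].each do |w|
  puts "#{w}: #{rv_region(w)}"
end
```

The four sample words are the ones used in the Snowball description, where the regions are "ho", "va", "bajo" and "eo" respectively.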