estem 0.2.4 → 0.2.5

data/ChangeLog CHANGED
@@ -1,3 +1,30 @@
+ Version 0.2.5
+
+ 2012-09-02 MaG <maguilamo.c@gmail.com>
+ *
+ - bin/ directory, removed.
+
+ * README.rdoc:
+ - cleanups
+ - Thanks section, removed.
+
+ * examples/usage.rb:
+ - cleanups
+
+ * estem.rb:
+ - (es_stem): rewritten.
+ - (safe_es_stem): deprecated Iconv, removed.
+
+ * bin/es_stem.rb:
+ - removed.
+
+ * test/:
+ - new test file added.
+ - rename file diffs.txt.
+
+ * test/test_estem.rb:
+ - one more test added.
+
  Version 0.2.4
 
  2012-06-25 MaG <maguilamo.c@gmail.com>
@@ -9,19 +36,19 @@ Version 0.2.4
  - examples/usage.rb: new file
 
  * README.rdoc:
- - max 80 cols per line.
- - recomendation about using safe_es_stem().
- - Fix Spanish typos.
+ - max 80 cols per line.
+ - recomendation about using safe_es_stem().
+ - Fix Spanish typos.
 
  * estem.gemspec:
- - cleanups.
- - (required_ruby_version): Ruby 1.9.1.
+ - cleanups.
+ - (required_ruby_version): Ruby 1.9.1.
 
  * bin/es_stem.rb:
- - chmod a+x .
- - (es_stem.rb:80): fix case sensitive comparation.
- - (es_stem.rb:25): removed .rb ext.
- - (es_stem.rb:29): new version.
+ - chmod a+x .
+ - (es_stem.rb:80): fix case sensitive comparation.
+ - (es_stem.rb:25): removed .rb ext.
+ - (es_stem.rb:29): new version.
 
  * estem.rb:
- - (safe_es_stem): new method.
+ - (safe_es_stem): new method.
data/README.rdoc CHANGED
@@ -1,7 +1,7 @@
  = Spanish Stem Gem
 
  == Description
- This gem is for reducing Spanish words to their roots. It uses an algorithm
+ This gem reduces Spanish words to their respective roots. It uses an algorithm
  based on Martin Porter's specifications.
 
  For more information, visit:
@@ -21,15 +21,14 @@ or
  $ gem install estem
 
  == Usage
- As a reminder, take in consideration that the Spanish language have several non
+ As a reminder, take in consideration that the Spanish language has several non
  US-ASCII characters, and because of that, the same data may varied from one
  codeset to another.
 
  Please remember to use a UTF-8 compatible encoding while using EStem. Please do
- not use String#force_encoding() to convert from one codeset to another, you
- might try using String#encode() but this later is more likely to fail, consider
- using String#safe_es_stem() when handling incompatibles codesets or the codeset
- type is unknown.
+ not use String#force_encoding to convert from one codeset to another, you may
+ try using String#encode alone but, instead, consider using String#safe_es_stem
+ when handling incompatibles codesets or the codeset type varies.
 
  require 'estem'
 
@@ -41,19 +40,6 @@ type is unknown.
  puts "HaBiTaCiOnEs".es_stem # ==> "HaBiT"
  puts "Hacinamiento".es_stem # ==> "Hacin"
 
- You can use <tt>EStem</tt> as a command line tool:
- $ es_stem --in-enc ISO-8859-1 -f input_file.txt
-
- for more information type
- $ es_stem --help
-
- The <tt>es_stem</tt> program do his best trying to tokenized the lines from
- the file, you might consider finding an Spanish tokenizer, either way this
- program do what it is suppose to do, stem Spanish words.
-
- NOTE: For excellent results, consider replacing one word per line on the files
- the program handles.
-
  == Uso
  Como recordatorio, ten en cosideración que el Castellano posee muchos
  carácteres que están fuera del código ASCII, y por esta razón, los datos pueden
@@ -62,9 +48,8 @@ variar de un conjunto de codificación a otro.
  Por favor recuerda utilizar sistemas de condificación compatibles con UTF-8
  cuando se trabaje con EStem. Por favor no use String#force_encoding para
  convertir de un conjunto de codificación a otro, podría utilizar String#encode
- pero este último es más probable que falle en el intento, considere utilizar
- String#safe_es_stem() si está manejando conjuntos de codificación incompatibles
- o se desconoce el tipo.
+ solo, pero en su lugar, considere utilizar String#safe_es_stem() si está
+ manejando conjuntos de codificación incompatibles o se desconoce el tipo.
 
  require 'estem'
 
@@ -75,17 +60,6 @@ o se desconoce el tipo.
  puts "ALbeRGues".es_stem # ==> "ALbeRG"
  puts "HaBiTaCiOnEs".es_stem # ==> "HaBiT"
  puts "Hacinamiento".es_stem # ==> "Hacin"
-
- Para más información ejecuta:
- $ es_stem --help
-
- El programa <tt>es_stem</tt> hará lo posible para separar las palabras de cada
- línea del fichero. Sería sensato utilizar otro programa más especializado para
- este propósito, de todas maneras, es_stem hace lo que se supone debe hacer,
- optener las raíces de las palabras.
-
- NOTA: Para resultados excelentes, considere poner una palabra por línea en los
- ficheros que pasará el programa.
 
  == Test
 
@@ -101,11 +75,6 @@ Incluye 28390 palabras de prueba con sus resultado esperados. Para realizar
  la prueba, ejecuta:
  rake test
 
- == Thanks -- Agradecimientos
-
- Ray Pereda https://github.com/raypereda/stemmify/ I used his gem as a guide to
- package mine. http://guides.rubygems.org/make-your-own-gem/ as well.
-
  == License -- Licencia
 
  Copyright (c) 2012 Manuel A. Güílamo
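Review note on the README's encoding advice: String#force_encoding only relabels the existing bytes, while String#encode transcodes them. The following is a minimal sketch using only core Ruby; the sample word and the variable names are illustrative assumptions and are not taken from the gem.

```ruby
# "año" in ISO-8859-1: the letter ñ is the single byte 0xF1.
latin1 = [0x61, 0xF1, 0x6F].pack('C*').force_encoding(Encoding::ISO_8859_1)

# force_encoding keeps the bytes and only changes the label.
# A lone 0xF1 byte is not valid UTF-8, so the relabeled string is broken.
relabeled = latin1.dup.force_encoding(Encoding::UTF_8)
puts relabeled.valid_encoding?    # false

# encode rewrites the bytes for the target codeset.
transcoded = latin1.encode(Encoding::UTF_8)
puts transcoded.valid_encoding?   # true
puts transcoded.bytes.inspect     # [97, 195, 177, 111]
```

This is why relabeling is not a conversion: the byte count changes only when the text is actually transcoded.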
data/examples/usage.rb CHANGED
@@ -1,7 +1,5 @@
  require 'estem'
 
- hsh = Hash.new
-
  words = ['albergues','habitaciones','Albergues','ALbeRGues','HaBiTaCiOnEs',
  'Hacinamiento','mujeres','muchedumbre','ocasionalmente']
 
@@ -0,0 +1,11 @@
+ require 'estem'
+
+ hsh = Hash.new
+
+ words = ['albergues','habitaciones','Albergues','ALbeRGues','HaBiTaCiOnEs',
+ 'Hacinamiento','mujeres','muchedumbre','ocasionalmente']
+
+ words.each do|w|
+ stem = w.es_stem
+ puts "Word: #{w}\nStem: #{stem}\n\n"
+ end
data/lib/estem.rb CHANGED
@@ -22,8 +22,6 @@
  # * Manuel A. Güílamo maguilamo.c@gmail.com
  #
 
- require 'iconv'
-
  module EStem
  ##
  # For more information, please refer to <b>String#es_stem</b> method, also <b>EStem</b>.
@@ -38,61 +36,59 @@ module EStem
  # "HaBiTaCiOnEs".es_stem # ==> "HaBiT"
  # "Hacinamiento".es_stem # ==> "Hacin"
  #
- #If you are not aware of the codeset the data has, then use
+ #If you are not aware of the codeset the data have, try using
  #String#safe_es_stem instead.
  #
  #:call-seq:
  # str.es_stem => "new_str"
  def es_stem
  str = self.dup
- return remove_accent(str) if str.length == 1
- tmp = step0(str)
- str = tmp ? tmp : str
-
- unless tmp = step1(str)
- unless tmp = step2a(str)
- tmp = step2b(str)
- str = tmp ? tmp : str
- else
- str = tmp
- end
+ case str.length
+ when 0
+ return str
+ when 1
+ return remove_accent(str)
  end
- tmp = step3(str)
- str = tmp.nil? ? str : tmp
+
+ step0(str)
+ unless step1(str)
+ step2b(str) unless step2a(str)
+ end
+
+ step3(str)
  remove_accent(str)
  end
 
  ##
  #Use this method in case you are not aware of the codeset the data being
- #handle has. This method returns a new string with the same codeset as
- #the original. Be aware that this method is slower than String#es_stem()
+ #handle have. This method returns a new string with the same codeset as
+ #the original. Be aware that this method is a bit slower than String#es_stem
  #:call-seq:
  # str.safe_es_stem => "new_str"
  def safe_es_stem
- return self.es_stem if self.encoding == Encoding::UTF_8
-
- default_enc = self.encoding.name
-
- str = self.dup.force_encoding('UTF-8')
-
- if str.valid_encoding?
- begin
- tmp = str.es_stem
- return tmp.force_encoding(default_enc)
- rescue
- end
+ if self.encoding == Encoding::UTF_8
+ # remove invalid characters
+ return self.chars.select{|c| c.valid_encoding? }.join.es_stem
  end
 
- if enc = Encoding.compatible?(self, VOWEL)
- begin
- return self.encode(enc).es_stem
- rescue
+ unless self.valid_encoding?
+ tmp = self.dup
+ if tmp.force_encoding('UTF-8').valid_encoding?
+ begin
+ return tmp.es_stem
+ rescue
+ end
  end
  end
 
+ default_enc = self.encoding.name
+ str = self.chars.select{|c| c.valid_encoding? }.join
+
+ return nil if str.empty?
+
  begin
- tmp = Iconv.conv('UTF-8', self.encoding.name, self).es_stem
- return Iconv.conv(default_enc, 'UTF-8', tmp);
+ tmp = str.encode('UTF-8', str.encoding.name).es_stem
+ return tmp.encode(default_enc, 'UTF-8');
  rescue
  return nil
  end
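Review note: the hunk above replaces Iconv.conv with String#encode and drops invalid characters with chars.select before stemming. A minimal sketch of both ideas in isolation follows, using only core Ruby. Here chop is merely a stand-in for the stemming call, and the sample bytes are assumptions chosen for illustration.

```ruby
# A UTF-8 labelled string with one invalid byte (0xFF) in the middle.
dirty = [0x61, 0xFF, 0x62].pack('C*').force_encoding(Encoding::UTF_8)

# Keep only the characters that are valid in the string's own encoding.
clean = dirty.chars.select { |c| c.valid_encoding? }.join
puts clean                        # ab

# Round trip without Iconv: transcode to UTF-8, process, transcode back.
latin1 = [0x61, 0xF1, 0x6F].pack('C*').force_encoding(Encoding::ISO_8859_1)
original_enc = latin1.encoding.name
processed = latin1.encode('UTF-8', original_enc).chop  # stand-in for es_stem
result = processed.encode(original_enc, 'UTF-8')

puts result.encoding              # ISO-8859-1
puts result.bytes.inspect         # [97, 241]
```

The result keeps the caller's original codeset, which matches the behaviour the method's comment describes.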
@@ -145,8 +141,9 @@ module EStem
  [r1,r2]
  end
 
+ #=> true or false
  def step0(str)
- return nil unless str =~ /(se(l[ao]s?)?|l([aeo]s?)|me|nos)$/i
+ return false unless str =~ /(se(l[ao]s?)?|l([aeo]s?)|me|nos)$/i
 
  suffix = $&
  rv_text = str[rv(str)..-1]
@@ -154,21 +151,21 @@ module EStem
  case rv_text
  when %r{((?<=i[éÉ]ndo|[áÁ]ndo|[áéíÁÉÍ]r)#{suffix})$}ui
  str[%r{#$&$}]=''
- str = remove_accent(str)
- return str
+ str.replace(remove_accent(str))
+ return true
  when %r{((?<=iendo|ando|[aei]r)#{suffix})$}i
  str[%r{#$&$}]=''
- return str
+ return true
  end
 
  if rv_text =~ /yendo/i and str =~ /uyendo/i
  str[suffix]=''
- return str
+ return true
  end
- nil
+ false
  end
 
- #=> new_str or nil
+ #=> true or false
  def step1(str)
  r1,r2 = r12(str)
  r1_text = str[r1..-1]
@@ -177,46 +174,46 @@ module EStem
  case r2_text
  when /(anzas?|ic[oa]s?|ismos?|[ai]bles?|istas?|os[oa]s?|[ai]mientos?)$/i
  str[%r{#$&$}]=''
- return str
+ return true
  when /(ic)?(ador([ae]s?)?|aci[óÓ]n|aciones|antes?|ancias?)$/ui
  str[%r{#$&$}]=''
- return str
+ return true
  when /log[íÍ]as?/ui
  str[%r{#$&$}]='log'
- return str
+ return true
  when /(uci([óÓ]n|ones))$/ui
  str[%r{#$&$}]='u'
- return str
+ return true
  when /(encias?)$/i
  str[%r{#$&$}]='ente'
- return str
+ return true
  end
 
  if r2_text =~ /(ativ|iv|os|ic|ad)amente$/i or r1_text =~ /amente$/i
  str[%r{#$&$}]=''
- return str
+ return true
  end
 
  case r2_text
  when /((ante|[ai]ble)?mente)$/i, /((abil|i[cv])?idad(es)?)$/i, /((at)?iv[ao]s?)$/i
  str[%r{#$&$}]=''
- return str
+ return true
  end
- nil
+ false
  end
 
- #=> nil or new_str
+ #=> true or false
  def step2a(str)
  rv_pos = rv(str)
  idx = str[rv_pos..-1] =~ /(y[oóÓ]|ye(ron|ndo)|y[ae][ns]?|ya(is|mos))$/ui
 
- return nil unless idx
+ return false unless idx
 
  if 'u' == str[rv_pos+idx-1].downcase
  str[%r{#$&$}] = ''
- return str
+ return true
  end
- nil
+ false
  end
 
  STEP2B_REGEXP = /(
@@ -229,6 +226,7 @@ module EStem
  en|es|[éÉ]is|emos
  )$/xiu
 
+ #=> true or false
  def step2b(str)
  rv_pos = rv(str)
 
@@ -240,27 +238,28 @@ module EStem
  else
  str[%r{#{suffix}$}]=''
  end
- return str
+ return true
  end
- nil
+ false
  end
 
+ #=> true or false
  def step3(str)
  rv_pos = rv(str)
  rv_text = str[rv_pos..-1]
 
  if rv_text =~ /(os|[aoáíóÁÍÓ])$/ui
  str[%r{#$&$}]=''
- return str
+ return true
  elsif idx = rv_text =~ /(u?[eéÉ])$/i
  if $&[0].downcase == 'u' and str[rv_pos+idx-1].downcase == 'g'
  str[%r{#$&$}]=''
  else
  str.chop!
  end
- return str
+ return true
  end
- nil
+ false
  end
 
  VOWEL = 'aeiouáéíóúüAEIOUÁÉÍÓÚÜ'
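Review note: in this version the step methods edit the string in place and return true or false, which lets es_stem chain them with plain unless guards instead of juggling a tmp variable. A toy sketch of that control flow follows. Note that drop_suffix! and toy_stem are hypothetical helpers with made-up suffix rules; they are not the gem's algorithm.

```ruby
# Hypothetical helper: removes suffix from str in place.
# Returns true when something was removed, false otherwise.
def drop_suffix!(str, suffix)
  return false unless str.end_with?(suffix)
  str[-suffix.length..-1] = ''
  true
end

# Hypothetical driver that mirrors the shape of the rewritten es_stem:
# work on a copy, try the steps in order, fall through when one fails.
def toy_stem(word)
  str = word.dup
  unless drop_suffix!(str, 'aciones')
    drop_suffix!(str, 'es') unless drop_suffix!(str, 'ando')
  end
  drop_suffix!(str, 'o')
  str
end

puts toy_stem('habitaciones')  # habit
puts toy_stem('mujeres')       # mujer
puts toy_stem('cantando')      # cant
```

Because every step mutates the same copy, the caller's string stays untouched and no step needs to hand a new string back.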
data/lib/estem.rb~ ADDED
@@ -0,0 +1,271 @@
+ # encoding: UTF-8
+ #
+ # :title: Spanish Stemming
+ # = Description
+ # This gem is for reducing Spanish words to their roots. It uses an algorithm
+ # based on Martin Porter's specifications.
+ #
+ # For more information, visit:
+ # http://snowball.tartarus.org/algorithms/spanish/stemmer.html
+ #
+ # = Descripción
+ # Esta gema está para reducir las palabras del Español en sus respectivas raíces,
+ # para ello ultiliza un algoritmo basado en las especificaciones de Martin Porter
+ #
+ # Para más información, visite:
+ # http://snowball.tartarus.org/algorithms/spanish/stemmer.html
+ #
+ # = License -- Licencia
+ # This code is provided under the terms of the {MIT License.}[http://www.opensource.org/licenses/mit-license.php]
+ #
+ # = Authors
+ # * Manuel A. Güílamo maguilamo.c@gmail.com
+ #
+
+ module EStem
+ ##
+ # For more information, please refer to <b>String#es_stem</b> method, also <b>EStem</b>.
+ # :method: estem
+
+ ##
+ #This method stem Spanish words.
+ #
+ # "albergues".es_stem # ==> "alberg"
+ # "habitaciones".es_stem # ==> "habit"
+ # "ALbeRGues".es_stem # ==> "ALbeRG"
+ # "HaBiTaCiOnEs".es_stem # ==> "HaBiT"
+ # "Hacinamiento".es_stem # ==> "Hacin"
+ #
+ #If you are not aware of the codeset the data have, try using
+ #String#safe_es_stem instead.
+ #
+ #:call-seq:
+ # str.es_stem => "new_str"
+ def es_stem
+ str = self.dup
+ case str.length
+ when 0
+ return str
+ when 1
+ return remove_accent(str)
+ end
+
+ step0(str)
+ unless step1(str)
+ step2b(str) unless step2a(str)
+ end
+
+ step3(str)
+ remove_accent(str)
+ end
+
+ ##
+ #Use this method in case you are not aware of the codeset the data being
+ #handle have. This method returns a new string with the same codeset as
+ #the original. Be aware that this method is a bit slower than String#es_stem
+ #:call-seq:
+ # str.safe_es_stem => "new_str"
+ def safe_es_stem
+ if str.encoding == Encoding::UTF_8
+ # remove invalid characters
+ return self.chars.select{|c| c.valid_encoding? }.join.es_stem
+ end
+
+ unless self.valid_encoding?
+ tmp = self.dup
+ if tmp.force_encoding('UTF-8').valid_encoding?
+ begin
+ return tmp.es_stem
+ rescue
+ end
+ end
+ end
+
+ default_enc = self.encoding.name
+ str = self.chars.select{|c| c.valid_encoding? }.join
+
+ return nil if str.empty?
+
+ begin
+ tmp = str.encode('UTF-8', str.encoding.name).es_stem
+ return tmp.encode(default_enc, 'UTF-8');
+ rescue
+ return nil
+ end
+ end
+
+ # :stopdoc:
+
+ private
+
+ def vowel?(c)
+ VOWEL.include?(c)
+ end
+
+ def consonant?(c)
+ CONSONANT.include?(c)
+ end
+
+ def remove_accent(str)
+ str.tr('áéíóúÁÉÍÓÚ','aeiouAEIOU')
+ end
+
+ def rv(str)
+ if consonant? str[1]
+ i=2
+ i+=1 while str[i] and consonant? str[i]
+ return str.nil? ? str.length-1 : i+1
+ end
+
+ if vowel? str[0] and vowel? str[1]
+ i=2
+ i+=1 while str[i] and vowel? str[i]
+ return str.nil? ? str.length-1 : i+1
+ end
+
+ return 3 if consonant? str[0] and vowel? str[1]
+
+ str.length - 1
+ end
+
+ def r(str, i=0)
+ i+=1 while str[i] and consonant?(str[i])
+ i+=1
+ i+=1 while str[i] and vowel? str[i]
+ str[i].nil? ? str.length : i+1
+ end
+
+ def r12(str)
+ r1 = r(str)
+ r2 = r(str,r1)
+ [r1,r2]
+ end
+
+ #=> true or false
+ def step0(str)
+ return false unless str =~ /(se(l[ao]s?)?|l([aeo]s?)|me|nos)$/i
+
+ suffix = $&
+ rv_text = str[rv(str)..-1]
+
+ case rv_text
+ when %r{((?<=i[éÉ]ndo|[áÁ]ndo|[áéíÁÉÍ]r)#{suffix})$}ui
+ str[%r{#$&$}]=''
+ str.replace(remove_accent(str))
+ return true
+ when %r{((?<=iendo|ando|[aei]r)#{suffix})$}i
+ str[%r{#$&$}]=''
+ return true
+ end
+
+ if rv_text =~ /yendo/i and str =~ /uyendo/i
+ str[suffix]=''
+ return true
+ end
+ false
+ end
+
+ #=> true or false
+ def step1(str)
+ r1,r2 = r12(str)
+ r1_text = str[r1..-1]
+ r2_text = str[r2..-1]
+
+ case r2_text
+ when /(anzas?|ic[oa]s?|ismos?|[ai]bles?|istas?|os[oa]s?|[ai]mientos?)$/i
+ str[%r{#$&$}]=''
+ return true
+ when /(ic)?(ador([ae]s?)?|aci[óÓ]n|aciones|antes?|ancias?)$/ui
+ str[%r{#$&$}]=''
+ return true
+ when /log[íÍ]as?/ui
+ str[%r{#$&$}]='log'
+ return true
+ when /(uci([óÓ]n|ones))$/ui
+ str[%r{#$&$}]='u'
+ return true
+ when /(encias?)$/i
+ str[%r{#$&$}]='ente'
+ return true
+ end
+
+ if r2_text =~ /(ativ|iv|os|ic|ad)amente$/i or r1_text =~ /amente$/i
+ str[%r{#$&$}]=''
+ return true
+ end
+
+ case r2_text
+ when /((ante|[ai]ble)?mente)$/i, /((abil|i[cv])?idad(es)?)$/i, /((at)?iv[ao]s?)$/i
+ str[%r{#$&$}]=''
+ return true
+ end
+ false
+ end
+
+ #=> true or false
+ def step2a(str)
+ rv_pos = rv(str)
+ idx = str[rv_pos..-1] =~ /(y[oóÓ]|ye(ron|ndo)|y[ae][ns]?|ya(is|mos))$/ui
+
+ return false unless idx
+
+ if 'u' == str[rv_pos+idx-1].downcase
+ str[%r{#$&$}] = ''
+ return true
+ end
+ false
+ end
+
+ STEP2B_REGEXP = /(
+ ar([áÁ][ns]?|a(n|s|is)?|on)? | ar([éÉ]is|emos|é|É) | ar[íÍ]a(n|s|is|mos)? |
+ er([áÁ][sn]?|[éÉ](is)?|emos|[íÍ]a(n|s|is|mos)?)? |
+ ir([íÍ]a(s|n|is|mos)?|[áÁ][ns]?|emos|[éÉ]|éis)? | aba(s|n|is)? |
+ ad([ao]s?)? | ed | id(a|as|o|os)? | [íÍ]a(n|s|is|mos)? | [íÍ]s |
+ as(e[ns]?|te|eis|teis)? | [áÁ](is|bamos|semos|ramos) | a(n|ndo|mos) |
+ ie(ra|se|ran|sen|ron|ndo|ras|ses|rais|seis) | i(ste|steis|[óÓ]|mos|[éÉ]ramos|[éÉ]semos) |
+ en|es|[éÉ]is|emos
+ )$/xiu
+
+ #=> true or false
+ def step2b(str)
+ rv_pos = rv(str)
+
+ if idx = str[rv_pos..-1] =~ STEP2B_REGEXP
+ suffix = $&
+ if suffix =~ /^(en|es|[éÉ]is|emos)$/ui
+ str[%r{#{suffix}$}]=''
+ str[rv_pos+idx-1]='' if str[rv_pos+idx-2] =~ /g/i and str[rv_pos+idx-1] =~ /u/i
+ else
+ str[%r{#{suffix}$}]=''
+ end
+ return true
+ end
+ false
+ end
+
+ #=> true or false
+ def step3(str)
+ rv_pos = rv(str)
+ rv_text = str[rv_pos..-1]
+
+ if rv_text =~ /(os|[aoáíóÁÍÓ])$/ui
+ str[%r{#$&$}]=''
+ return true
+ elsif idx = rv_text =~ /(u?[eéÉ])$/i
+ if $&[0].downcase == 'u' and str[rv_pos+idx-1].downcase == 'g'
+ str[%r{#$&$}]=''
+ else
+ str.chop!
+ end
+ return true
+ end
+ false
+ end
+
+ VOWEL = 'aeiouáéíóúüAEIOUÁÉÍÓÚÜ'
+ CONSONANT = "bcdfghjklmnñpqrstvwxyzABCDEFGHIJKLMNÑOPQRSTUVWXYZ"
+ end
+
+ class String
+ include EStem
+ end