estem 0.2.3 → 0.2.4
- data/COPYRIGHT +19 -0
- data/ChangeLog +27 -0
- data/README.rdoc +130 -0
- data/bin/es_stem.rb +4 -3
- data/examples/usage.rb +11 -0
- data/lib/estem.rb +54 -15
- metadata +7 -4
- data/lib/estem.rb~ +0 -233
data/COPYRIGHT
ADDED
@@ -0,0 +1,19 @@
+Copyright (c) 2012 Manuel A. Güílamo
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
data/ChangeLog
ADDED
@@ -0,0 +1,27 @@
+Version 0.2.4
+
+2012-06-25 MaG <maguilamo.c@gmail.com>
+
+*
+- ChangeLog new file.
+- fix README.rdoc added.
+- fix COPYRIGHT added.
+- examples/usage.rb: new file
+
+* README.rdoc:
+- max 80 cols per line.
+- recomendation about using safe_es_stem().
+- Fix Spanish typos.
+
+* estem.gemspec:
+- cleanups.
+- (required_ruby_version): Ruby 1.9.1.
+
+* bin/es_stem.rb:
+- chmod a+x .
+- (es_stem.rb:80): fix case sensitive comparation.
+- (es_stem.rb:25): removed .rb ext.
+- (es_stem.rb:29): new version.
+
+* estem.rb:
+- (safe_es_stem): new method.
data/README.rdoc
ADDED
@@ -0,0 +1,130 @@
+= Spanish Stem Gem
+
+== Description
+This gem is for reducing Spanish words to their roots. It uses an algorithm
+based on Martin Porter's specifications.
+
+For more information, visit:
+http://snowball.tartarus.org/algorithms/spanish/stemmer.html
+
+== Descripción
+Esta gema está para reducir las palabras del Español en sus respectivas raíces,
+para ello ultiliza un algoritmo basado en las especificaciones de Martin Porter
+
+Para más información, visite:
+http://snowball.tartarus.org/algorithms/spanish/stemmer.html
+
+== Install -- Instalar
+
+  $ sudo gem install estem
+or
+  $ gem install estem
+
+== Usage
+As a reminder, take in consideration that the Spanish language have several non
+US-ASCII characters, and because of that, the same data may varied from one
+codeset to another.
+
+Please remember to use a UTF-8 compatible encoding while using EStem. Please do
+not use String#force_encoding() to convert from one codeset to another, you
+might try using String#encode() but this later is more likely to fail, consider
+using String#safe_es_stem() when handling incompatibles codesets or the codeset
+type is unknown.
+
+  require 'estem'
+
+  puts "albergues".es_stem # ==> "alberg"
+  puts "habitaciones".es_stem # ==> "habit"
+
+  # EStem will never make unnecessary changes to your input data.
+  puts "ALbeRGues".es_stem # ==> "ALbeRG"
+  puts "HaBiTaCiOnEs".es_stem # ==> "HaBiT"
+  puts "Hacinamiento".es_stem # ==> "Hacin"
+
+You can use <tt>EStem</tt> as a command line tool:
+  $ es_stem --in-enc ISO-8859-1 -f input_file.txt
+
+for more information type
+  $ es_stem --help
+
+The <tt>es_stem</tt> program do his best trying to tokenized the lines from
+the file, you might consider finding an Spanish tokenizer, either way this
+program do what it is suppose to do, stem Spanish words.
+
+NOTE: For excellent results, consider replacing one word per line on the files
+the program handles.
+
+== Uso
+Como recordatorio, ten en cosideración que el Castellano posee muchos
+carácteres que están fuera del código ASCII, y por esta razón, los datos pueden
+variar de un conjunto de codificación a otro.
+
+Por favor recuerda utilizar sistemas de condificación compatibles con UTF-8
+cuando se trabaje con EStem. Por favor no use String#force_encoding para
+convertir de un conjunto de codificación a otro, podría utilizar String#encode
+pero este último es más probable que falle en el intento, considere utilizar
+String#safe_es_stem() si está manejando conjuntos de codificación incompatibles
+o se desconoce el tipo.
+
+  require 'estem'
+
+  puts "albergues".es_stem # ==> "alberg"
+  puts "habitaciones".es_stem # ==> "habit"
+
+  # EStem nunca hará cambios innecesarios a tus datos.
+  puts "ALbeRGues".es_stem # ==> "ALbeRG"
+  puts "HaBiTaCiOnEs".es_stem # ==> "HaBiT"
+  puts "Hacinamiento".es_stem # ==> "Hacin"
+
+Para más información ejecuta:
+  $ es_stem --help
+
+El programa <tt>es_stem</tt> hará lo posible para separar las palabras de cada
+línea del fichero. Sería sensato utilizar otro programa más especializado para
+este propósito, de todas maneras, es_stem hace lo que se supone debe hacer,
+optener las raíces de las palabras.
+
+NOTA: Para resultados excelentes, considere poner una palabra por línea en los
+ficheros que pasará el programa.
+
+== Test
+
+This test is based on the sample input and output text from Martin Porter
+website. It includes 28390 test words and their expected stem results.
+To run the test, just type:
+  rake test
+
+== Pruebas
+
+Esta prueba está basada en un archivo de prueba provisto por Martin Porter.
+Incluye 28390 palabras de prueba con sus resultado esperados. Para realizar
+la prueba, ejecuta:
+  rake test
+
+== Thanks -- Agradecimientos
+
+Ray Pereda https://github.com/raypereda/stemmify/ I used his gem as a guide to
+package mine. http://guides.rubygems.org/make-your-own-gem/ as well.
+
+== License -- Licencia
+
+Copyright (c) 2012 Manuel A. Güílamo
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
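The README's Usage section advises against String#force_encoding for converting between codesets. A minimal sketch of why, using only Ruby core String methods; the sample word and variable names are illustrative and not part of estem:

```ruby
# The word "niño" as ISO-8859-1 bytes: "ñ" is the single byte 0xF1.
# Here force_encoding only labels raw bytes with their true encoding.
latin1 = [0x6E, 0x69, 0xF1, 0x6F].pack('C*').force_encoding(Encoding::ISO_8859_1)

# String#encode transcodes: the bytes change, the characters stay the same.
utf8 = latin1.encode(Encoding::UTF_8)

# String#force_encoding only relabels the existing bytes, so the same
# four bytes are no longer a valid UTF-8 sequence.
relabeled = latin1.dup.force_encoding(Encoding::UTF_8)

puts utf8.valid_encoding?       # prints "true"
puts relabeled.valid_encoding?  # prints "false"
```

The relabeled string keeps the ISO-8859-1 byte 0xF1, which a UTF-8 aware stemmer cannot interpret as "ñ".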
data/bin/es_stem.rb
CHANGED
@@ -1,5 +1,6 @@
 #!/usr/bin/env ruby
 # encoding: UTF-8
+# :stopdoc:
 
 # Copyright (c) 2012 Manuel A. Güílamo
 #
@@ -21,11 +22,11 @@
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE.
 
-require 'estem
+require 'estem'
 require 'getoptlong'
 require 'iconv'
 
-$version = "0.1.
+$version = "0.1.10"
 
 def usage(error=false)
   out = error ? $stderr : $stdout
@@ -76,7 +77,7 @@ end
 
 if filename
   begin
-    if ienc and ienc!='UTF-8'
+    if ienc and ienc.upcase !='UTF-8'
       file = File.open(filename, "r:#{ienc}:UTF-8")
     else
       file = File.open(filename, 'r:UTF-8')
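The last hunk above makes the input-encoding comparison case-insensitive with `upcase`. As an alternative sketch, and only an assumption about what might suit the script, Ruby's `Encoding.find` resolves encoding names regardless of letter case and raises on unknown names. `utf8_name?` is a hypothetical helper, not part of es_stem:

```ruby
# Hypothetical helper: true when +name+ refers to UTF-8 in any letter case.
def utf8_name?(name)
  Encoding.find(name) == Encoding::UTF_8
rescue ArgumentError
  # Encoding.find raises ArgumentError for names Ruby does not know.
  false
end

puts utf8_name?('utf-8')       # prints "true"
puts utf8_name?('ISO-8859-1')  # prints "false"
```

Comparing Encoding objects rather than strings also catches misspelled encoding names before the file is opened.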
data/examples/usage.rb
ADDED
@@ -0,0 +1,11 @@
+require 'estem'
+
+hsh = Hash.new
+
+words = ['albergues','habitaciones','Albergues','ALbeRGues','HaBiTaCiOnEs',
+         'Hacinamiento','mujeres','muchedumbre','ocasionalmente']
+
+words.each do|w|
+  stem = w.es_stem
+  puts "Word: #{w}\nStem: #{stem}\n\n"
+end
data/lib/estem.rb
CHANGED
@@ -19,27 +19,30 @@
 # This code is provided under the terms of the {MIT License.}[http://www.opensource.org/licenses/mit-license.php]
 #
 # = Authors
-# * Manuel A. Güílamo
+# * Manuel A. Güílamo maguilamo.c@gmail.com
 #
 
+require 'iconv'
+
 module EStem
   ##
+  # For more information, please refer to <b>String#es_stem</b> method, also <b>EStem</b>.
   # :method: estem
-  # For more information, please see <b>String#es_stem</b> method, also <b>EStem</b>.
-
-
-  ##
-  #This method stem Spanish words.
-  #
-  # "albergues".es_stem # ==> "alberg"
-  # "habitaciones".es_stem # ==> "habit"
-  # "ALbeRGues".es_stem # ==> "ALbeRG"
-  # "HaBiTaCiOnEs".es_stem # ==> "HaBiT"
-  # "Hacinamiento".es_stem # ==> "Hacin"
-  #
-  #:call-seq:
-  # str.es_stem => "new_str"
 
+  ##
+  #This method stem Spanish words.
+  #
+  # "albergues".es_stem # ==> "alberg"
+  # "habitaciones".es_stem # ==> "habit"
+  # "ALbeRGues".es_stem # ==> "ALbeRG"
+  # "HaBiTaCiOnEs".es_stem # ==> "HaBiT"
+  # "Hacinamiento".es_stem # ==> "Hacin"
+  #
+  #If you are not aware of the codeset the data has, then use
+  #String#safe_es_stem instead.
+  #
+  #:call-seq:
+  # str.es_stem => "new_str"
   def es_stem
     str = self.dup
     return remove_accent(str) if str.length == 1
@@ -59,6 +62,42 @@ module EStem
     remove_accent(str)
   end
 
+  ##
+  #Use this method in case you are not aware of the codeset the data being
+  #handle has. This method returns a new string with the same codeset as
+  #the original. Be aware that this method is slower than String#es_stem()
+  #:call-seq:
+  # str.safe_es_stem => "new_str"
+  def safe_es_stem
+    return self.es_stem if self.encoding == Encoding::UTF_8
+
+    default_enc = self.encoding.name
+
+    str = self.dup.force_encoding('UTF-8')
+
+    if str.valid_encoding?
+      begin
+        tmp = str.es_stem
+        return tmp.force_encoding(default_enc)
+      rescue
+      end
+    end
+
+    if enc = Encoding.compatible?(self, VOWEL)
+      begin
+        return self.encode(enc).es_stem
+      rescue
+      end
+    end
+
+    begin
+      tmp = Iconv.conv('UTF-8', self.encoding.name, self).es_stem
+      return Iconv.conv(default_enc, 'UTF-8', tmp);
+    rescue
+      return nil
+    end
+  end
+
   # :stopdoc:
 
   private
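The new `safe_es_stem` falls back to `Iconv`, which was removed from Ruby's standard library in Ruby 2.0. A minimal sketch of the same round trip using only `String#encode` follows. `stem_preserving_encoding` is a hypothetical helper and the block stands in for `es_stem`, so this is an illustration under those assumptions, not the gem's implementation:

```ruby
# Hypothetical helper: transcode to UTF-8, run the block, transcode back.
# Returns nil when either conversion fails, mirroring safe_es_stem's nil.
def stem_preserving_encoding(str)
  original = str.encoding
  utf8 = str.encode(Encoding::UTF_8)
  yield(utf8).encode(original)
rescue EncodingError
  nil
end

# "años" as ISO-8859-1 bytes; the block is a stand-in for es_stem.
latin1 = [0x61, 0xF1, 0x6F, 0x73].pack('C*').force_encoding(Encoding::ISO_8859_1)
result = stem_preserving_encoding(latin1) { |s| s.sub(/s\z/, '') }

puts result.encoding       # prints "ISO-8859-1"
puts result.bytes.inspect  # prints "[97, 241, 111]"
```

Rescuing `EncodingError` covers both undefined conversions and invalid byte sequences, so callers only need to check for nil.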
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: estem
 version: !ruby/object:Gem::Version
-  version: 0.2.
+  version: 0.2.4
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-
+date: 2012-06-25 00:00:00.000000000 Z
 dependencies: []
 description: Spanish stemming. Based on Martin Porter's specifications. See README
   file for more information.
@@ -21,7 +21,10 @@ files:
 - Rakefile
 - bin/es_stem.rb
 - lib/estem.rb
--
+- examples/usage.rb
+- COPYRIGHT
+- README.rdoc
+- ChangeLog
 - test/diffs.txt
 - test/test_estem.rb
 homepage: https://github.com/MaG21/estem
@@ -35,7 +38,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ! '>='
     - !ruby/object:Gem::Version
-      version: 1.9.
+      version: 1.9.1
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
data/lib/estem.rb~
DELETED
@@ -1,233 +0,0 @@
-# encoding: UTF-8
-#
-# :title: Spanish Stemming
-# = Description
-# This gem is for reducing Spanish words to their roots. It uses an algorithm
-# based on Martin Porter's specifications.
-#
-# For more information, visit:
-# http://snowball.tartarus.org/algorithms/spanish/stemmer.html
-#
-# = Descripción
-# Esta gema está para reducir las palabras del Español en sus respectivas raíces,
-# para ello ultiliza un algoritmo basado en las especificaciones de Martin Porter
-#
-# Para más información, visite:
-# http://snowball.tartarus.org/algorithms/spanish/stemmer.html
-#
-# = License -- Licencia
-# This code is provided under the terms of the {MIT License.}[http://www.opensource.org/licenses/mit-license.php]
-#
-# = Authors
-# * Manuel A. Güílamo
-#
-
-module EStem
-  ##
-  # :method: estem
-  # For more information, please see <b>String#es_stem</b> method, also <b>EStem</b>.
-
-
-  ##
-  #This method reduces Spanish words to their root.
-  #
-  # "albergues".es_stem # ==> "alberg"
-  # "habitaciones".es_stem # ==> "habit"
-  # "ALbeRGues".es_stem # ==> "ALbeRG"
-  # "HaBiTaCiOnEs".es_stem # ==> "HaBiT"
-  # "Hacinamiento".es_stem # ==> "Hacin"
-  #
-  #:call-seq:
-  # str.es_stem => "new_str"
-
-  def es_stem
-    str = self.dup
-    return remove_accent(str) if str.length == 1
-    tmp = step0(str)
-    str = tmp ? tmp : str
-
-    unless tmp = step1(str)
-      unless tmp = step2a(str)
-        tmp = step2b(str)
-        str = tmp ? tmp : str
-      else
-        str = tmp
-      end
-    end
-    tmp = step3(str)
-    str = tmp.nil? ? str : tmp
-    remove_accent(str)
-  end
-
-  # :stopdoc:
-
-  private
-
-  def vowel?(c)
-    VOWEL.include?(c)
-  end
-
-  def consonant?(c)
-    CONSONANT.include?(c)
-  end
-
-  def remove_accent(str)
-    str.tr('áéíóúÁÉÍÓÚ','aeiouAEIOU')
-  end
-
-  def rv(str)
-    if consonant? str[1]
-      i=2
-      i+=1 while str[i] and consonant? str[i]
-      return str.nil? ? str.length-1 : i+1
-    end
-
-    if vowel? str[0] and vowel? str[1]
-      i=2
-      i+=1 while str[i] and vowel? str[i]
-      return str.nil? ? str.length-1 : i+1
-    end
-
-    return 3 if consonant? str[0] and vowel? str[1]
-
-    str.length - 1
-  end
-
-  def r(str, i=0)
-    i+=1 while str[i] and consonant?(str[i])
-    i+=1
-    i+=1 while str[i] and vowel? str[i]
-    str[i].nil? ? str.length : i+1
-  end
-
-  def r12(str)
-    r1 = r(str)
-    r2 = r(str,r1)
-    [r1,r2]
-  end
-
-  def step0(str)
-    return nil unless str =~ /(se(l[ao]s?)?|l([aeo]s?)|me|nos)$/i
-
-    suffix = $&
-    rv_text = str[rv(str)..-1]
-
-    case rv_text
-    when %r{((?<=i[éÉ]ndo|[áÁ]ndo|[áéíÁÉÍ]r)#{suffix})$}ui
-      str[%r{#$&$}]=''
-      str = remove_accent(str)
-      return str
-    when %r{((?<=iendo|ando|[aei]r)#{suffix})$}i
-      str[%r{#$&$}]=''
-      return str
-    end
-
-    if rv_text =~ /yendo/i and str =~ /uyendo/i
-      str[suffix]=''
-      return str
-    end
-    nil
-  end
-
-  #=> new_str or nil
-  def step1(str)
-    r1,r2 = r12(str)
-    r1_text = str[r1..-1]
-    r2_text = str[r2..-1]
-
-    case r2_text
-    when /(anzas?|ic[oa]s?|ismos?|[ai]bles?|istas?|os[oa]s?|[ai]mientos?)$/i
-      str[%r{#$&$}]=''
-      return str
-    when /(ic)?(ador([ae]s?)?|aci[óÓ]n|aciones|antes?|ancias?)$/ui
-      str[%r{#$&$}]=''
-      return str
-    when /log[íÍ]as?/ui
-      str[%r{#$&$}]='log'
-      return str
-    when /(uci([óÓ]n|ones))$/ui
-      str[%r{#$&$}]='u'
-      return str
-    when /(encias?)$/i
-      str[%r{#$&$}]='ente'
-      return str
-    end
-
-    if r2_text =~ /(ativ|iv|os|ic|ad)amente$/i or r1_text =~ /amente$/i
-      str[%r{#$&$}]=''
-      return str
-    end
-
-    case r2_text
-    when /((ante|[ai]ble)?mente)$/i, /((abil|i[cv])?idad(es)?)$/i, /((at)?iv[ao]s?)$/i
-      str[%r{#$&$}]=''
-      return str
-    end
-    nil
-  end
-
-  #=> nil or new_str
-  def step2a(str)
-    rv_pos = rv(str)
-    idx = str[rv_pos..-1] =~ /(y[oóÓ]|ye(ron|ndo)|y[ae][ns]?|ya(is|mos))$/ui
-
-    return nil unless idx
-
-    if 'u' == str[rv_pos+idx-1].downcase
-      str[%r{#$&$}] = ''
-      return str
-    end
-    nil
-  end
-
-  STEP2B_REGEXP = /(
-    ar([áÁ][ns]?|a(n|s|is)?|on)? | ar([éÉ]is|emos|é|É) | ar[íÍ]a(n|s|is|mos)? |
-    er([áÁ][sn]?|[éÉ](is)?|emos|[íÍ]a(n|s|is|mos)?)? |
-    ir([íÍ]a(s|n|is|mos)?|[áÁ][ns]?|emos|[éÉ]|éis)? | aba(s|n|is)? |
-    ad([ao]s?)? | ed | id(a|as|o|os)? | [íÍ]a(n|s|is|mos)? | [íÍ]s |
-    as(e[ns]?|te|eis|teis)? | [áÁ](is|bamos|semos|ramos) | a(n|ndo|mos) |
-    ie(ra|se|ran|sen|ron|ndo|ras|ses|rais|seis) | i(ste|steis|[óÓ]|mos|[éÉ]ramos|[éÉ]semos) |
-    en|es|[éÉ]is|emos
-  )$/xiu
-
-  def step2b(str)
-    rv_pos = rv(str)
-
-    if idx = str[rv_pos..-1] =~ STEP2B_REGEXP
-      suffix = $&
-      if suffix =~ /^(en|es|[éÉ]is|emos)$/ui
-        str[%r{#{suffix}$}]=''
-        str[rv_pos+idx-1]='' if str[rv_pos+idx-2] =~ /g/i and str[rv_pos+idx-1] =~ /u/i
-      else
-        str[%r{#{suffix}$}]=''
-      end
-      return str
-    end
-    nil
-  end
-
-  def step3(str)
-    rv_pos = rv(str)
-    rv_text = str[rv_pos..-1]
-
-    if rv_text =~ /(os|[aoáíóÁÍÓ])$/ui
-      str[%r{#$&$}]=''
-      return str
-    elsif idx = rv_text =~ /(u?[eéÉ])$/i
-      if $&[0].downcase == 'u' and str[rv_pos+idx-1].downcase == 'g'
-        str[%r{#$&$}]=''
-      else
-        str.chop!
-      end
-      return str
-    end
-    nil
-  end
-
-  VOWEL = 'aeiouáéíóúüAEIOUÁÉÍÓÚÜ'
-  CONSONANT = "bcdfghjklmnñpqrstvwxyzABCDEFGHIJKLMNÑOPQRSTUVWXYZ"
-end
-
-class String
-  include EStem
-end
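The `r` and `r12` helpers in the deleted backup locate the R1 and R2 regions used by the Snowball Spanish stemmer: R1 starts after the first non-vowel that follows a vowel, and R2 is the same rule applied inside R1. A small sketch of that rule, assuming it matches the intent of `r`; `region_start` and `VOWELS` are illustrative names, not the gem's API:

```ruby
VOWELS = 'aeiouáéíóúü'

# Index where the region begins: one past the first non-vowel that
# follows a vowel, searching from +from+. Returns the word length
# (an empty region) when no such position exists.
def region_start(word, from = 0)
  chars = word.downcase.chars
  ((from + 1)...chars.length).each do |i|
    return i + 1 if VOWELS.include?(chars[i - 1]) && !VOWELS.include?(chars[i])
  end
  chars.length
end

word = 'habitaciones'
r1 = region_start(word)
r2 = region_start(word, r1)
puts word[r1..-1]  # prints "itaciones"
puts word[r2..-1]  # prints "aciones"
```

With R2 equal to "aciones", the `aciones` alternative in `step1` applies and leaves "habit", which agrees with the README example.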