russian_metaphone 0.0.1
- data/.gitignore +18 -0
- data/Gemfile +4 -0
- data/LICENSE.txt +22 -0
- data/README.md +84 -0
- data/Rakefile +1 -0
- data/lib/russian_metaphone/filter/breath_consonants.rb +46 -0
- data/lib/russian_metaphone/filter/duplicates_removal.rb +21 -0
- data/lib/russian_metaphone/filter/lastname_ending.rb +32 -0
- data/lib/russian_metaphone/filter/normalization.rb +16 -0
- data/lib/russian_metaphone/filter/replacement.rb +36 -0
- data/lib/russian_metaphone/filter.rb +6 -0
- data/lib/russian_metaphone/version.rb +4 -0
- data/lib/russian_metaphone.rb +21 -0
- data/russian_metaphone.gemspec +22 -0
- data/spec/breath_consonants_filter_spec.rb +23 -0
- data/spec/duplicates_removal_filter_spec.rb +16 -0
- data/spec/lastname_ending_filter_spec.rb +46 -0
- data/spec/normalization_filter_spec.rb +24 -0
- data/spec/replacement_filter_spec.rb +33 -0
- data/spec/spec_helper.rb +2 -0
- metadata +104 -0
data/.gitignore
ADDED
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,22 @@
|
|
1
|
+
Copyright (c) 2013 Pavlo V. Lysov
|
2
|
+
|
3
|
+
MIT License
|
4
|
+
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
6
|
+
a copy of this software and associated documentation files (the
|
7
|
+
"Software"), to deal in the Software without restriction, including
|
8
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
9
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
10
|
+
permit persons to whom the Software is furnished to do so, subject to
|
11
|
+
the following conditions:
|
12
|
+
|
13
|
+
The above copyright notice and this permission notice shall be
|
14
|
+
included in all copies or substantial portions of the Software.
|
15
|
+
|
16
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
18
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
19
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
20
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
21
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
22
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.md
ADDED
@@ -0,0 +1,84 @@
|
|
1
|
+
# RussianMetaphone
|
2
|
+
|
3
|
+
This gem provides an implementation of the 'Metaphone' phonetic algorithm adapted for the Russian language. Check [this Wikipedia article](http://rubydoc.info/gems/carrierwave/frames) for an introduction to Metaphone.
|
4
|
+
|
5
|
+
## Installation
|
6
|
+
|
7
|
+
Add this line to your application's Gemfile:
|
8
|
+
|
9
|
+
gem 'russian_metaphone'
|
10
|
+
|
11
|
+
And then execute:
|
12
|
+
|
13
|
+
$ bundle
|
14
|
+
|
15
|
+
Or install it yourself as:
|
16
|
+
|
17
|
+
$ gem install russian_metaphone
|
18
|
+
|
19
|
+
## Usage
|
20
|
+
|
21
|
+
The Usage section below was originally written in Russian. Drop me a line if anything in it is unclear.
|
22
|
+
|
23
|
+
The Russian "Metaphone" algorithm was proposed by Petr Kankowski more than 10 years ago, and this implementation is based on it. The original article is no longer online, but it can be viewed in the [archive](http://web.archive.org/web/20071107145942/http://kankowski.narod.ru/dev/metaphoneru.htm).
|
24
|
+
|
25
|
+
The RussianMetaphone implementation does not aim for high performance. The emphasis is on modularity, which makes the algorithm easy to change, configure, test, and adapt to the needs of a specific project. I think optimizing a well-tuned algorithm will not be a hard task; tuning the algorithm itself is much harder.
|
26
|
+
|
27
|
+
### How to use
|
28
|
+
|
29
|
+
```ruby
|
30
|
+
puts RussianMetaphone::process("Ахматова") # => ахмат%5
|
31
|
+
puts RussianMetaphone::process("Бродский") # => працк%9
|
32
|
+
puts RussianMetaphone::process("Мальденштам") # => малдинштам
|
33
|
+
```
|
34
|
+
|
35
|
+
### How it works
|
36
|
+
|
37
|
+
The input passes through a set of filters, and each filter modifies the string in its own way. The string returned by the last filter in the chain is the final result.
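
The chain idea can be sketched in a few lines of plain Ruby. This is a minimal illustration rather than the gem's own code: the `run_chain` helper and the two toy lambda filters are hypothetical names introduced here, and Ruby 2.4+ is assumed so that `String#downcase` handles Cyrillic.

```ruby
# Hypothetical helper: passes the input through each filter in order.
# A filter here is any object that responds to #call and returns a string.
def run_chain(source, filters)
  filters.reduce(source.dup) { |result, f| f.call(result) }
end

# Two toy filters, for illustration only.
downcase_filter = ->(s) { s.downcase }
strip_signs     = ->(s) { s.delete('ъь') }

run_chain('Мальденштам', [downcase_filter, strip_signs]) # => "малденштам"
```

Because every filter takes a string and returns a string, the order of the array is the only thing that defines the pipeline.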
|
38
|
+
|
39
|
+
### Filters
|
40
|
+
|
41
|
+
A filter is a Ruby module or a class instance that implements the *filter* method:
|
42
|
+
|
43
|
+
```ruby
|
44
|
+
def filter(string, options = {})
|
45
|
+
```
|
46
|
+
|
47
|
+
The result of a filter must be a string.
|
48
|
+
|
49
|
+
RussianMetaphone ships with a ready-made set of filters for working with first and last names; these filters are listed below. You can add your own filters to the chain if the algorithm does not handle your task precisely enough. See below for adding custom filters to the chain.
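
A custom filter that follows this contract might look like the sketch below. `YoFolding` is a hypothetical module, not part of the gem, and the example only calls it directly because wiring a filter into the chain is not demonstrated here.

```ruby
# Hypothetical custom filter: folds "ё" into "е".
# It follows the filter contract: a `filter` method that takes
# a string plus an options hash and returns a string.
module YoFolding
  def filter(string, options = {})
    string.tr('ёЁ', 'еЕ')
  end
  module_function :filter
end

YoFolding.filter('Королёв') # => "Королев"
```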
|
50
|
+
|
51
|
+
#### RussianMetaphone::Filter::Normalization
|
52
|
+
|
53
|
+
Normalizes the string: removes everything that is not Cyrillic, converts the rest to lowercase, and strips the hard and soft signs ('Ъ' and 'Ь').
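
The normalization step can be approximated with core `String` methods alone. This is a sketch under one assumption: Ruby 2.4 or later, where `String#downcase` handles Cyrillic, so the `unicode` gem is not needed. The `normalize` name is hypothetical.

```ruby
# Sketch of the normalization step using only core String methods.
def normalize(string)
  string.gsub(/\P{Cyrillic}+/, '') # drop everything that is not Cyrillic
        .downcase                  # lowercase the remaining letters
        .delete('ъь')              # drop hard and soft signs
end

normalize('Привет мир! Hello World!') # => "приветмир"
```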
|
54
|
+
|
55
|
+
#### RussianMetaphone::Filter::DuplicatesRemoval
|
56
|
+
|
57
|
+
Removes repeated characters (many people would write Метревели as Метревелли).
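
One compact way to get the same effect is Ruby's built-in `String#squeeze`, which collapses runs of identical characters. Note that this is a swapped-in technique for illustration; the gem implements the filter with its own loop. The `remove_duplicates` name is hypothetical.

```ruby
# String#squeeze collapses each run of the same character into one.
def remove_duplicates(string)
  string.squeeze
end

remove_duplicates('Метревелли') # => "Метревели"
```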
|
58
|
+
|
59
|
+
#### RussianMetaphone::Filter::LastnameEnding
|
60
|
+
|
61
|
+
When working with last names, it is often useful to replace common last-name endings with something shorter. This filter replaces the ending *овский* with *%1*, *евский* with *%2*, and so on. See the source for the remaining replacements.
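
The ending replacement can be sketched as a small lookup table. The example below covers only four endings, with the markers taken from the gem's source; the `ENDINGS` constant and the `shorten_ending` helper are hypothetical names, and the sketch stops at the first match instead of running every pattern.

```ruby
# A subset of the ending table, for illustration only.
ENDINGS = {
  /овский\z/ => '%1',
  /евский\z/ => '%2',
  /овская\z/ => '%3',
  /евская\z/ => '%4'
}.freeze

# Replaces the first matching ending; returns the input unchanged otherwise.
def shorten_ending(lastname)
  ENDINGS.each do |pattern, marker|
    return lastname.sub(pattern, marker) if lastname.match?(pattern)
  end
  lastname
end

shorten_ending('Дубровский') # => "Дубр%1"
```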
|
62
|
+
|
63
|
+
#### RussianMetaphone::Filter::Replacement
|
64
|
+
|
65
|
+
Replaces characters as follows:
|
66
|
+
|
67
|
+
* ТС, ДС are replaced with Ц
|
68
|
+
* ЙО, ИО, ЙЕ, ИЕ are replaced with И
|
69
|
+
* О, Ы, А, Я are replaced with А
|
70
|
+
* Ю, У are replaced with У
|
71
|
+
* Е, Ё, Э are replaced with И
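
The rules in this list can be sketched as an ordered table of pattern and replacement pairs. `RULES` and `replace_sounds` are hypothetical names; the sketch assumes lowercase input and uses the Cyrillic letter у as the replacement for ю, as the list states.

```ruby
# Rules from the list, applied in order. Multi-letter combinations
# go first so they are matched before the single vowels are changed.
RULES = [
  [/тс|дс/,       'ц'],
  [/йо|ио|йе|ие/, 'и'],
  [/[оыя]/,       'а'],
  [/ю/,           'у'],
  [/[еёэ]/,       'и']
].freeze

def replace_sounds(string)
  RULES.reduce(string) { |result, (pattern, char)| result.gsub(pattern, char) }
end

replace_sounds('детсад') # => "дицад"
```

The order matters: if the single-vowel rules ran first, a pair such as ЙО would already have become ЙА and the combination rule would never match.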
|
72
|
+
|
73
|
+
#### RussianMetaphone::Filter::BreathConsonants
|
74
|
+
|
75
|
+
Devoices consonants in weak positions. See the source, where the details are described.
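
A simplified, non-mutating sketch of devoicing is shown below. It covers two weak positions only: a voiced consonant at the end of the word, and a voiced consonant directly before a voiceless one. The names `PAIRS`, `VOICELESS`, and `devoice` are hypothetical, the pair table uses the standard Russian pairs (including ж => ш, which is an assumption and may differ from the gem's own table), and sonorants are never treated as triggers, so its results can differ from the gem's defaults.

```ruby
# Voiced => voiceless pairs (assumed standard Russian pairs).
PAIRS = { 'б' => 'п', 'в' => 'ф', 'г' => 'к', 'д' => 'т', 'ж' => 'ш', 'з' => 'с' }.freeze
VOICELESS = PAIRS.values.freeze

# Devoices a voiced consonant when it is the last character
# or when the next character is a voiceless consonant from PAIRS.
def devoice(word)
  chars = word.chars
  chars.each_with_index.map do |char, i|
    nxt = chars[i + 1]
    weak = nxt.nil? || VOICELESS.include?(nxt)
    weak && PAIRS.key?(char) ? PAIRS[char] : char
  end.join
end

devoice('пробка') # => "пропка"
devoice('город')  # => "горот"
```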
|
76
|
+
|
77
|
+
|
78
|
+
## Contributing
|
79
|
+
|
80
|
+
1. Fork it
|
81
|
+
2. Create your feature branch (`git checkout -b my-new-feature`)
|
82
|
+
3. Commit your changes (`git commit -am 'Add some feature'`)
|
83
|
+
4. Push to the branch (`git push origin my-new-feature`)
|
84
|
+
5. Create new Pull Request
|
data/Rakefile
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
require "bundler/gem_tasks"
|
data/lib/russian_metaphone/filter/breath_consonants.rb
ADDED
@@ -0,0 +1,46 @@
|
|
1
|
+
# encoding: UTF-8
|
2
|
+
module RussianMetaphone
|
3
|
+
module Filter
|
4
|
+
# Devoicing of consonants in a weak position
|
5
|
+
module BreathConsonants
|
6
|
+
|
7
|
+
REZONANTS = %w(л м н р) # Sonorant consonants (the voiced ones that have no voiceless pair?)
|
8
|
+
VOICED_VS_VOICELESS = { 'б' => 'п', 'в' => 'ф', 'г' => 'к', 'д' => 'т', 'ж' => 'ж', 'з' => 'с' }
|
9
|
+
VOICED = VOICED_VS_VOICELESS.keys
|
10
|
+
VOICELESS = VOICED_VS_VOICELESS.values
|
11
|
+
|
12
|
+
def filter(string, options = {})
|
13
|
+
options[:skip_if_before_rezonant] = false if !options.has_key?(:skip_if_before_rezonant)
|
14
|
+
previous_char = nil
|
15
|
+
|
16
|
+
string.each_char.each_with_index do |current_char, ind|
|
17
|
+
if VOICED.include?(previous_char) &&
|
18
|
+
(
|
19
|
+
VOICELESS.include?(current_char) ||
|
20
|
+
(!options[:skip_if_before_rezonant] && REZONANTS.include?(current_char))
|
21
|
+
)
|
22
|
+
string[ind-1] = VOICED_VS_VOICELESS[previous_char]
|
23
|
+
end
|
24
|
+
previous_char = current_char
|
25
|
+
end
|
26
|
+
|
27
|
+
# Voiced consonants become voiceless at the end of a word
|
28
|
+
string[string.length-1] = VOICED_VS_VOICELESS[previous_char] if VOICED.include?(previous_char)
|
29
|
+
|
30
|
+
string
|
31
|
+
end
|
32
|
+
module_function :filter
|
33
|
+
|
34
|
+
end
|
35
|
+
|
36
|
+
end
|
37
|
+
end
|
38
|
+
|
39
|
+
# A weak position (place in a word) of a sound is one
|
40
|
+
# in which it is heard unclearly, indistinctly.
|
41
|
+
#
|
42
|
+
# For consonants, such positions are:
|
43
|
+
# 1) a consonant at the end of a word: дуб [дуп], верблюд [вирблют];
|
44
|
+
# 2) a consonant before another consonant (other than a sonorant),
|
45
|
+
# in a so-called consonant cluster, when a word contains several of them:
|
46
|
+
# пробка [пропка], скобка [скопка];
|
data/lib/russian_metaphone/filter/duplicates_removal.rb
ADDED
@@ -0,0 +1,21 @@
|
|
1
|
+
# encoding: UTF-8
|
2
|
+
module RussianMetaphone
|
3
|
+
module Filter
|
4
|
+
# Removal of repeated characters
|
5
|
+
module DuplicatesRemoval
|
6
|
+
|
7
|
+
def filter(string, options = {})
|
8
|
+
previous_char = nil
|
9
|
+
|
10
|
+
string.each_char.each_with_index do |current_char, ind|
|
11
|
+
string.slice!(ind-1) if previous_char == current_char
|
12
|
+
previous_char = current_char
|
13
|
+
end
|
14
|
+
|
15
|
+
string
|
16
|
+
end
|
17
|
+
module_function :filter
|
18
|
+
|
19
|
+
end
|
20
|
+
end
|
21
|
+
end
|
data/lib/russian_metaphone/filter/lastname_ending.rb
ADDED
@@ -0,0 +1,32 @@
|
|
1
|
+
# encoding: UTF-8
|
2
|
+
module RussianMetaphone
|
3
|
+
module Filter
|
4
|
+
module LastnameEnding
|
5
|
+
|
6
|
+
REPLACEMENTS = {
|
7
|
+
/овский$/ => '%1',
|
8
|
+
/евский$/ => '%2',
|
9
|
+
/овская$/ => '%3',
|
10
|
+
/евская$/ => '%4',
|
11
|
+
/иева$|еева$|ова$|ева$/ => '%5',
|
12
|
+
/иев$|еев$|ов$|ев$/ => '%6',
|
13
|
+
/нко$/ => '%7',
|
14
|
+
/ая$/ => '%8',
|
15
|
+
/ий$|ый$/ => '%9',
|
16
|
+
/ых$|их$/ => '%10',
|
17
|
+
/ин$/ => '%11',
|
18
|
+
/ик$|ек$/ => '%12',
|
19
|
+
/ук$|юк$/ => '%13'
|
20
|
+
}
|
21
|
+
|
22
|
+
def filter(string, options = {})
|
23
|
+
REPLACEMENTS.each_pair do |regexp, substitution|
|
24
|
+
string.gsub!(regexp, substitution)
|
25
|
+
end
|
26
|
+
string
|
27
|
+
end
|
28
|
+
|
29
|
+
module_function :filter
|
30
|
+
end
|
31
|
+
end
|
32
|
+
end
|
data/lib/russian_metaphone/filter/normalization.rb
ADDED
@@ -0,0 +1,16 @@
|
|
1
|
+
# encoding: UTF-8
|
2
|
+
module RussianMetaphone
|
3
|
+
module Filter
|
4
|
+
module Normalization
|
5
|
+
|
6
|
+
STRIP_REGEXP = /[ъь]/
|
7
|
+
|
8
|
+
def filter(string, options = {})
|
9
|
+
string = Unicode.downcase(string.gsub(/\P{Cyrillic}+/, ''))
|
10
|
+
string.gsub(STRIP_REGEXP, '')
|
11
|
+
end
|
12
|
+
|
13
|
+
module_function :filter
|
14
|
+
end
|
15
|
+
end
|
16
|
+
end
|
data/lib/russian_metaphone/filter/replacement.rb
ADDED
@@ -0,0 +1,36 @@
|
|
1
|
+
# encoding: UTF-8
|
2
|
+
|
3
|
+
# Source characters | Resulting character
|
4
|
+
# |
|
5
|
+
# О, Ы, А, Я | А
|
6
|
+
# Ю, У | У
|
7
|
+
# Е, Ё, Э, И | И
|
8
|
+
#
|
9
|
+
# ЙО, ИО, ЙЕ, ИЕ are replaced with И
|
10
|
+
# ТС, ДС are replaced with Ц
|
11
|
+
#
|
12
|
+
module RussianMetaphone
|
13
|
+
module Filter
|
14
|
+
module Replacement
|
15
|
+
|
16
|
+
REPLACEMENTS = {
|
17
|
+
/тс|дс/ => 'ц' ,
|
18
|
+
/йо|ио|йе|ие/ => 'и',
|
19
|
+
/[оыя]/ => 'а',
|
20
|
+
/[ю]/ => 'y',
|
21
|
+
/[еёэ]/ => 'и'
|
22
|
+
}
|
23
|
+
|
24
|
+
def filter(string, options = {})
|
25
|
+
result = String.new(string)
|
26
|
+
REPLACEMENTS.each_pair do |reg, char|
|
27
|
+
result.gsub!(reg, char)
|
28
|
+
end
|
29
|
+
result
|
30
|
+
end
|
31
|
+
module_function :filter
|
32
|
+
|
33
|
+
end
|
34
|
+
end
|
35
|
+
|
36
|
+
end
|
data/lib/russian_metaphone/filter.rb
ADDED
@@ -0,0 +1,6 @@
|
|
1
|
+
# encoding: UTF-8
|
2
|
+
require 'russian_metaphone/filter/normalization'
|
3
|
+
require 'russian_metaphone/filter/replacement'
|
4
|
+
require 'russian_metaphone/filter/breath_consonants'
|
5
|
+
require 'russian_metaphone/filter/duplicates_removal'
|
6
|
+
require 'russian_metaphone/filter/lastname_ending'
|
data/lib/russian_metaphone.rb
ADDED
@@ -0,0 +1,21 @@
|
|
1
|
+
# encoding: UTF-8
|
2
|
+
require "unicode"
|
3
|
+
require "russian_metaphone/version"
|
4
|
+
require "russian_metaphone/filter"
|
5
|
+
|
6
|
+
module RussianMetaphone
|
7
|
+
def process(source)
|
8
|
+
filters = [
|
9
|
+
RussianMetaphone::Filter::Normalization,
|
10
|
+
RussianMetaphone::Filter::DuplicatesRemoval,
|
11
|
+
RussianMetaphone::Filter::LastnameEnding,
|
12
|
+
RussianMetaphone::Filter::Replacement,
|
13
|
+
RussianMetaphone::Filter::BreathConsonants,
|
14
|
+
RussianMetaphone::Filter::DuplicatesRemoval
|
15
|
+
]
|
16
|
+
result = String.new(source)
|
17
|
+
filters.each { |f| result = f.send(:filter, result) }
|
18
|
+
result
|
19
|
+
end
|
20
|
+
module_function :process
|
21
|
+
end
|
data/russian_metaphone.gemspec
ADDED
@@ -0,0 +1,22 @@
|
|
1
|
+
# -*- encoding: utf-8 -*-
|
2
|
+
lib = File.expand_path('../lib', __FILE__)
|
3
|
+
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
|
4
|
+
require 'russian_metaphone/version'
|
5
|
+
|
6
|
+
Gem::Specification.new do |gem|
|
7
|
+
gem.name = "russian_metaphone"
|
8
|
+
gem.version = RussianMetaphone::VERSION
|
9
|
+
gem.authors = ["Pavlo V. Lysov"]
|
10
|
+
gem.email = ["pavlo@cleverua.com"]
|
11
|
+
gem.description = %q{Implements 'Metaphone' phonetic algorithm adapted for Russian language}
|
12
|
+
gem.summary = %q{Implements 'Metaphone' phonetic algorithm adapted for Russian language, allows easy extending and algorithm tuning.}
|
13
|
+
gem.homepage = "https://github.com/cleverua/russian_metaphone"
|
14
|
+
|
15
|
+
gem.files = `git ls-files`.split($/)
|
16
|
+
gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
|
17
|
+
gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
|
18
|
+
gem.require_paths = ["lib"]
|
19
|
+
|
20
|
+
gem.add_dependency("unicode", ">= 0.4.4")
|
21
|
+
gem.add_development_dependency "rspec"
|
22
|
+
end
|
data/spec/breath_consonants_filter_spec.rb
ADDED
@@ -0,0 +1,23 @@
|
|
1
|
+
# encoding: UTF-8
|
2
|
+
require 'spec_helper'
|
3
|
+
|
4
|
+
describe "Breath Consonants Filter" do
|
5
|
+
|
6
|
+
it "should replace voiced with voiceless counterpart if voiced goes right before a voiceless" do
|
7
|
+
RussianMetaphone::Filter::BreathConsonants.filter('мавка').should == 'мафка'
|
8
|
+
RussianMetaphone::Filter::BreathConsonants.filter('обнимать').should == 'опнимать'
|
9
|
+
RussianMetaphone::Filter::BreathConsonants.filter('втгжбсаук').should == 'фткжпсаук'
|
10
|
+
end
|
11
|
+
|
12
|
+
it "skip_if_before_rezonant option is TRUE" do
|
13
|
+
# if 'Б' comes before the sonorant 'Н', it is not replaced when skip_if_before_rezonant is passed
|
14
|
+
RussianMetaphone::Filter::BreathConsonants.filter('обнимать', :skip_if_before_rezonant => true).should == 'обнимать'
|
15
|
+
end
|
16
|
+
|
17
|
+
it "should replace voiced consonant with voiceless at the end of the word" do
|
18
|
+
RussianMetaphone::Filter::BreathConsonants.filter('город').should == 'горот'
|
19
|
+
RussianMetaphone::Filter::BreathConsonants.filter('мороз').should == 'морос'
|
20
|
+
end
|
21
|
+
|
22
|
+
end
|
23
|
+
|
data/spec/duplicates_removal_filter_spec.rb
ADDED
@@ -0,0 +1,16 @@
|
|
1
|
+
# encoding: UTF-8
|
2
|
+
require 'spec_helper'
|
3
|
+
|
4
|
+
describe "Duplicates Removal Filter" do
|
5
|
+
|
6
|
+
it "should remove all duplicates" do
|
7
|
+
RussianMetaphone::Filter::DuplicatesRemoval.filter('Метревелли').should == 'Метревели'
|
8
|
+
RussianMetaphone::Filter::DuplicatesRemoval.filter('Длиннющий').should == 'Длинющий'
|
9
|
+
end
|
10
|
+
|
11
|
+
it "should not do anything if no duplicates available" do
|
12
|
+
RussianMetaphone::Filter::DuplicatesRemoval.filter('новый год').should == 'новый год'
|
13
|
+
end
|
14
|
+
|
15
|
+
end
|
16
|
+
|
data/spec/lastname_ending_filter_spec.rb
ADDED
@@ -0,0 +1,46 @@
|
|
1
|
+
# encoding: UTF-8
|
2
|
+
require 'spec_helper'
|
3
|
+
|
4
|
+
describe "Lastname Ending Filter" do
|
5
|
+
|
6
|
+
it "should collapse like this:" do
|
7
|
+
assert_filter('Дубровский', 'Дубр%1')
|
8
|
+
assert_filter('Раевский', 'Ра%2')
|
9
|
+
assert_filter('Покровская', 'Покр%3')
|
10
|
+
assert_filter('Раневская', 'Ран%4')
|
11
|
+
|
12
|
+
assert_filter('Палиева', 'Пал%5')
|
13
|
+
assert_filter('Авдеева', 'Авд%5')
|
14
|
+
assert_filter('Семенова', 'Семен%5')
|
15
|
+
assert_filter('Терентьева', 'Теренть%5')
|
16
|
+
|
17
|
+
assert_filter('Палиев', 'Пал%6')
|
18
|
+
assert_filter('Авдеев', 'Авд%6')
|
19
|
+
assert_filter('Семенов', 'Семен%6')
|
20
|
+
assert_filter('Терентьев', 'Теренть%6')
|
21
|
+
|
22
|
+
assert_filter('Кононенко', 'Кононе%7')
|
23
|
+
|
24
|
+
assert_filter('Яровая', 'Яров%8')
|
25
|
+
|
26
|
+
assert_filter('Чернявский', 'Чернявск%9')
|
27
|
+
assert_filter('Буденый', 'Буден%9')
|
28
|
+
|
29
|
+
assert_filter('Боровских', 'Боровск%10')
|
30
|
+
assert_filter('Черных', 'Черн%10')
|
31
|
+
|
32
|
+
assert_filter('Литвин', 'Литв%11')
|
33
|
+
|
34
|
+
assert_filter('Кулик', 'Кул%12')
|
35
|
+
assert_filter('Гашек', 'Гаш%12')
|
36
|
+
|
37
|
+
assert_filter('Гайдук', 'Гайд%13')
|
38
|
+
assert_filter('Мазнюк', 'Мазн%13')
|
39
|
+
end
|
40
|
+
|
41
|
+
end
|
42
|
+
|
43
|
+
def assert_filter(source, expected)
|
44
|
+
RussianMetaphone::Filter::LastnameEnding.filter(source).should == expected
|
45
|
+
end
|
46
|
+
|
data/spec/normalization_filter_spec.rb
ADDED
@@ -0,0 +1,24 @@
|
|
1
|
+
# encoding: UTF-8
|
2
|
+
require 'spec_helper'
|
3
|
+
|
4
|
+
describe "Normalization Filter" do
|
5
|
+
|
6
|
+
it "should join multiple words into one" do
|
7
|
+
RussianMetaphone::Filter::Normalization.filter("привет от старых штиблет").should == 'приветотстарыхштиблет'
|
8
|
+
end
|
9
|
+
|
10
|
+
it "should not allow anything but cyrillic chars" do
|
11
|
+
RussianMetaphone::Filter::Normalization.filter("Привет мир! Hello World!").should == 'приветмир'
|
12
|
+
end
|
13
|
+
|
14
|
+
it "should downcase things properly" do
|
15
|
+
source = "МИРу МИр"
|
16
|
+
RussianMetaphone::Filter::Normalization.filter("МИРу МИр").should == 'мирумир'
|
17
|
+
end
|
18
|
+
|
19
|
+
it "should strip Ь and Ъ chars" do
|
20
|
+
RussianMetaphone::Filter::Normalization.filter("масянька").should == 'масянка'
|
21
|
+
RussianMetaphone::Filter::Normalization.filter("гундосъев").should == 'гундосев'
|
22
|
+
end
|
23
|
+
|
24
|
+
end
|
data/spec/replacement_filter_spec.rb
ADDED
@@ -0,0 +1,33 @@
|
|
1
|
+
# encoding: UTF-8
|
2
|
+
require 'spec_helper'
|
3
|
+
|
4
|
+
#
|
5
|
+
# Source characters | Resulting character
|
6
|
+
# |
|
7
|
+
# О, Ы, А, Я | А
|
8
|
+
# Ю, У | У
|
9
|
+
# Е, Ё, Э, И | И
|
10
|
+
#
|
11
|
+
# ЙО, ИО, ЙЕ, ИЕ are replaced with И
|
12
|
+
# ТС, ДС are replaced with Ц
|
13
|
+
|
14
|
+
describe "Replacements Filter" do
|
15
|
+
|
16
|
+
it "should not take into account the accent of a word" do
|
17
|
+
RussianMetaphone::Filter::Replacement.filter('боян').should == 'баан'
|
18
|
+
RussianMetaphone::Filter::Replacement.filter('малюкиёлка').should == 'малyкиилка'
|
19
|
+
end
|
20
|
+
|
21
|
+
it "should replace ЙО, ИО, ЙЕ, ИЕ = И" do
|
22
|
+
RussianMetaphone::Filter::Replacement.filter('майонез').should == 'маиниз'
|
23
|
+
RussianMetaphone::Filter::Replacement.filter('физиотерапия').should == 'физитирапиа'
|
24
|
+
RussianMetaphone::Filter::Replacement.filter('йемен').should == 'имин'
|
25
|
+
RussianMetaphone::Filter::Replacement.filter('приключение').should == 'приклyчини'
|
26
|
+
end
|
27
|
+
|
28
|
+
it "should replace ТС, ДС = Ц" do
|
29
|
+
RussianMetaphone::Filter::Replacement.filter('безрассудство').should == 'бизрассуцтва'
|
30
|
+
RussianMetaphone::Filter::Replacement.filter('детсад').should == 'дицад'
|
31
|
+
end
|
32
|
+
end
|
33
|
+
|
data/spec/spec_helper.rb
ADDED
metadata
ADDED
@@ -0,0 +1,104 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: russian_metaphone
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 0.0.1
|
5
|
+
prerelease:
|
6
|
+
platform: ruby
|
7
|
+
authors:
|
8
|
+
- Pavlo V. Lysov
|
9
|
+
autorequire:
|
10
|
+
bindir: bin
|
11
|
+
cert_chain: []
|
12
|
+
date: 2013-07-27 00:00:00.000000000 Z
|
13
|
+
dependencies:
|
14
|
+
- !ruby/object:Gem::Dependency
|
15
|
+
name: unicode
|
16
|
+
requirement: !ruby/object:Gem::Requirement
|
17
|
+
none: false
|
18
|
+
requirements:
|
19
|
+
- - ! '>='
|
20
|
+
- !ruby/object:Gem::Version
|
21
|
+
version: 0.4.4
|
22
|
+
type: :runtime
|
23
|
+
prerelease: false
|
24
|
+
version_requirements: !ruby/object:Gem::Requirement
|
25
|
+
none: false
|
26
|
+
requirements:
|
27
|
+
- - ! '>='
|
28
|
+
- !ruby/object:Gem::Version
|
29
|
+
version: 0.4.4
|
30
|
+
- !ruby/object:Gem::Dependency
|
31
|
+
name: rspec
|
32
|
+
requirement: !ruby/object:Gem::Requirement
|
33
|
+
none: false
|
34
|
+
requirements:
|
35
|
+
- - ! '>='
|
36
|
+
- !ruby/object:Gem::Version
|
37
|
+
version: '0'
|
38
|
+
type: :development
|
39
|
+
prerelease: false
|
40
|
+
version_requirements: !ruby/object:Gem::Requirement
|
41
|
+
none: false
|
42
|
+
requirements:
|
43
|
+
- - ! '>='
|
44
|
+
- !ruby/object:Gem::Version
|
45
|
+
version: '0'
|
46
|
+
description: Implements 'Metaphone' phonetic algorithm adapted for Russian language
|
47
|
+
email:
|
48
|
+
- pavlo@cleverua.com
|
49
|
+
executables: []
|
50
|
+
extensions: []
|
51
|
+
extra_rdoc_files: []
|
52
|
+
files:
|
53
|
+
- .gitignore
|
54
|
+
- Gemfile
|
55
|
+
- LICENSE.txt
|
56
|
+
- README.md
|
57
|
+
- Rakefile
|
58
|
+
- lib/russian_metaphone.rb
|
59
|
+
- lib/russian_metaphone/filter.rb
|
60
|
+
- lib/russian_metaphone/filter/breath_consonants.rb
|
61
|
+
- lib/russian_metaphone/filter/duplicates_removal.rb
|
62
|
+
- lib/russian_metaphone/filter/lastname_ending.rb
|
63
|
+
- lib/russian_metaphone/filter/normalization.rb
|
64
|
+
- lib/russian_metaphone/filter/replacement.rb
|
65
|
+
- lib/russian_metaphone/version.rb
|
66
|
+
- russian_metaphone.gemspec
|
67
|
+
- spec/breath_consonants_filter_spec.rb
|
68
|
+
- spec/duplicates_removal_filter_spec.rb
|
69
|
+
- spec/lastname_ending_filter_spec.rb
|
70
|
+
- spec/normalization_filter_spec.rb
|
71
|
+
- spec/replacement_filter_spec.rb
|
72
|
+
- spec/spec_helper.rb
|
73
|
+
homepage: https://github.com/cleverua/russian_metaphone
|
74
|
+
licenses: []
|
75
|
+
post_install_message:
|
76
|
+
rdoc_options: []
|
77
|
+
require_paths:
|
78
|
+
- lib
|
79
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
80
|
+
none: false
|
81
|
+
requirements:
|
82
|
+
- - ! '>='
|
83
|
+
- !ruby/object:Gem::Version
|
84
|
+
version: '0'
|
85
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
86
|
+
none: false
|
87
|
+
requirements:
|
88
|
+
- - ! '>='
|
89
|
+
- !ruby/object:Gem::Version
|
90
|
+
version: '0'
|
91
|
+
requirements: []
|
92
|
+
rubyforge_project:
|
93
|
+
rubygems_version: 1.8.24
|
94
|
+
signing_key:
|
95
|
+
specification_version: 3
|
96
|
+
summary: Implements 'Metaphone' phonetic algorithm adapted for Russian language, allows
|
97
|
+
easy extending and algorithm tuning.
|
98
|
+
test_files:
|
99
|
+
- spec/breath_consonants_filter_spec.rb
|
100
|
+
- spec/duplicates_removal_filter_spec.rb
|
101
|
+
- spec/lastname_ending_filter_spec.rb
|
102
|
+
- spec/normalization_filter_spec.rb
|
103
|
+
- spec/replacement_filter_spec.rb
|
104
|
+
- spec/spec_helper.rb
|