cologne_phonetics 1.0.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 34b708dd44c7f1aea872b47dfa5db5f9d3ad1b91
+   data.tar.gz: 939eaec4a50c1c3e9fff629ee1b52b2f0ca1470d
+ SHA512:
+   metadata.gz: '0948c7a91ec2a0764ea8c06d7a8c9a2c9fab2acebdc5cae1cafdf07283dbd8813114513d94b1743e2c7f0c5a85ca416e7e73234d953707ba30bbada050ed5c52'
+   data.tar.gz: 741a913e92789e62e6d1335f941954f915e2a5fab9155801585d661b68426274492de36af677ca2c3fe7b4a25e64a4174e220f8388b873c48099486c186cdea3
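The values in `checksums.yaml` are hex digests of the `metadata.gz` and `data.tar.gz` archives packed inside the `.gem` file. As a rough sketch of how one of them could be checked by hand with Ruby's standard `Digest` library: the helper name `checksum_matches?` is hypothetical, and the path would point at an archive you have extracted yourself.

```ruby
require 'digest'

# Hypothetical helper (not part of RubyGems or this gem): compares the
# SHA-512 hex digest of the file at `path` with an expected hex string,
# such as one copied from checksums.yaml.
def checksum_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex
end
```

In normal use RubyGems performs this verification itself when it unpacks a gem; the sketch only shows what the comparison amounts to.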
data/CHANGELOG.md ADDED
@@ -0,0 +1,9 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
+
+ ## [1.0.0] – 2018-03-06
+ - Initial release.
+
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2017 Stefan Daschek
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,67 @@
+ # ColognePhonetics
+
+ The [“Cologne phonetics (Kölner Phonetik)”](https://en.wikipedia.org/wiki/Cologne_phonetics) algorithm encodes words in a way that makes it possible to search for similar-sounding words. It’s related to the [“Soundex”](https://en.wikipedia.org/wiki/Soundex) algorithm, but better suited to the German language.
+
+ This implementation closely follows the algorithm as described on its Wikipedia page. Support for umlauts (Ä, Ö, Ü) and ß has been added as suggested there.
+
+ Note that *other accented characters are not handled*. If your data may contain such characters, you need to preprocess it (for example by using [`I18n.transliterate`](http://www.rubydoc.info/gems/i18n/I18n/Base#transliterate-instance_method)).
+
+ ## Status
+
+ I consider this gem to be stable and (more or less) finished.
+
+ ## Usage
+
+ Example usage:
+
+ ```ruby
+ ColognePhonetics.encode('Wikipedia') # => "3412"
+
+ # Only basic characters and äöüß are handled, everything else gets ignored:
+ ColognePhonetics.encode('Åè1%-') # => ""
+
+ # If a string contains words separated by spaces, each word is encoded separately:
+ ColognePhonetics.encode('Heinz Classen') # => "068 4586"
+
+ # Use `encode_word` if you want to ignore spaces (note that this usually gives
+ # different results than using `encode` and removing spaces afterwards; see the
+ # Wikipedia article for details):
+ ColognePhonetics.encode_word('Heinz Classen') # => "068586"
+ ```
+
+ You can set `ColognePhonetics.debug = true` to get warnings printed to `$stderr` about characters that cannot be encoded:
+
+ ```ruby
+ ColognePhonetics.debug = true
+ ColognePhonetics.encode('Olé')
+ # Cologne Phonetics: No rule for 'é' (prev: 'l', next: '')
+ # => "05"
+ ```
+
+ ## Installation
+
+ Add this line to your application's Gemfile:
+
+ ```ruby
+ gem 'cologne_phonetics'
+ ```
+
+ And then execute:
+
+     $ bundle
+
+ Or install it yourself as:
+
+     $ gem install cologne_phonetics
+
+ ## Development
+
+ After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/noniq/cologne_phonetics. Please make sure to include tests, and check that running `bin/rubocop` does not show any warnings.
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
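The README says that accented characters other than the umlauts are ignored and recommends preprocessing the input. Where adding `i18n` is not wanted, one possible alternative is to decompose such characters and drop their combining marks using Ruby's built-in `String#unicode_normalize`. The sketch below assumes UTF-8 input; the helper name `fold_accents` is hypothetical and not part of the gem.

```ruby
# frozen_string_literal: true

# Hypothetical preprocessing helper (not part of cologne_phonetics).
# Strips diacritics from every character except the ones the gem already
# handles (a-z, äöü, ß), so that 'é' becomes 'e' while 'ü' stays 'ü'.
def fold_accents(text)
  # Compose first, so that a decomposed "u" + diaeresis is recognised as "ü".
  text.unicode_normalize(:nfc).gsub(/[^a-zA-ZäöüÄÖÜß\s]/) do |char|
    # Decompose the character and remove its combining marks.
    char.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
  end
end

fold_accents('Olé')    # => "Ole"
fold_accents('Müller') # => "Müller"
```

Characters without a canonical decomposition (for example `ø`) pass through unchanged and would still be ignored by the encoder.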
data/lib/cologne_phonetics/rules.rb ADDED
@@ -0,0 +1,53 @@
+ # frozen_string_literal: true
+
+ module ColognePhonetics
+   # @api private
+   module Rules
+     def self.define(&block)
+       @rules = DSL.new(&block).rules
+     end
+
+     def self.apply_to(string)
+       string = string.downcase.tr('ÄÖÜ', 'äöü') # Ruby < 2.4 downcases ASCII characters only
+       chars = [nil] + string.chars + [nil]
+       chars.each_cons(3).map{ |prev_char, char, next_char|
+         code_for(prev_char, char, next_char)
+       }.join
+     end
+
+     def self.code_for(prev_char, char, next_char)
+       @rules.each do |matcher, code|
+         return code if matcher.call(prev_char, char, next_char)
+       end
+       debug_info "Cologne Phonetics: No rule for '#{char}' (prev: '#{prev_char}', next: '#{next_char}')"
+       nil
+     end
+
+     def self.debug_info(message)
+       return unless ColognePhonetics.debug
+       $stderr.puts message # rubocop:disable StderrPuts
+     end
+
+     class DSL
+       attr_reader :rules
+
+       def initialize(&block)
+         @rules = []
+         instance_exec(&block)
+       end
+
+       def change(chars, to:, before: nil, not_before: nil, after: nil, not_after: nil, initial: nil)
+         matcher = ->(prev_char, char, next_char){
+           return unless chars.include?(char)
+           return if initial && prev_char
+           return if before && (!next_char || !before.include?(next_char))
+           return if not_before && next_char && not_before.include?(next_char)
+           return if after && (!prev_char || !after.include?(prev_char))
+           return if not_after && prev_char && not_after.include?(prev_char)
+           true
+         }
+         @rules << [matcher, to]
+       end
+     end
+   end
+ end
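`Rules.apply_to` gives every rule access to a character's neighbours by padding the character list with `nil` on both sides and sliding a window of three over it. A minimal standalone sketch of that windowing step, with the hypothetical helper name `char_windows` standing in for the inline code:

```ruby
# Hypothetical helper that isolates the windowing used in Rules.apply_to.
# Each window is [previous character, current character, next character];
# nil marks the start or the end of the word.
def char_windows(word)
  padded = [nil] + word.chars + [nil]
  padded.each_cons(3).to_a
end

char_windows('abc')
# => [[nil, "a", "b"], ["a", "b", "c"], ["b", "c", nil]]
```

The `nil` padding is what lets the `initial:` option work: a rule marked `initial: true` only matches when the previous slot is `nil`.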
data/lib/cologne_phonetics/version.rb ADDED
@@ -0,0 +1,5 @@
+ # frozen_string_literal: true
+
+ module ColognePhonetics
+   VERSION = '1.0.0'
+ end
data/lib/cologne_phonetics.rb ADDED
@@ -0,0 +1,58 @@
+ # frozen_string_literal: true
+
+ require 'cologne_phonetics/rules'
+ require 'cologne_phonetics/version'
+
+ module ColognePhonetics
+   class << self
+     # Enable / disable debug mode. If set to true, using {.encode} or {.encode_word} will output
+     # warnings to `$stderr` if they encounter characters that cannot be encoded.
+     attr_accessor :debug
+   end
+
+   # rubocop:disable SpaceBeforeComma
+   Rules.define do
+     change 'aeijouy', to: '0'
+     change 'äöü'    , to: '0' # additional rule: treat umlauts like vowels
+     change 'h'      , to: ''
+     change 'b'      , to: '1'
+     change 'p'      , to: '1', not_before: 'h'
+     change 'dt'     , to: '2', not_before: 'csz'
+     change 'fvw'    , to: '3'
+     change 'p'      , to: '3', before: 'h'
+     change 'gkq'    , to: '4'
+     change 'c'      , to: '4', initial: true, before: 'ahkloqrux'
+     change 'c'      , to: '4', before: 'ahkoqux', not_after: 'sz'
+     change 'x'      , to: '48', not_after: 'ckq'
+     change 'l'      , to: '5'
+     change 'mn'     , to: '6'
+     change 'r'      , to: '7'
+     change 'sz'     , to: '8'
+     change 'ß'      , to: '8' # additional rule: treat 'ß' like 's'
+     change 'c'      , to: '8', after: 'sz'
+     change 'c'      , to: '8', initial: true, not_before: 'ahkloqrux'
+     change 'c'      , to: '8', not_before: 'ahkoqux'
+     change 'dt'     , to: '8', before: 'csz'
+     change 'x'      , to: '8', after: 'ckq'
+   end
+   # rubocop:enable SpaceBeforeComma
+
+   # Encode string using Cologne phonetics rules. The encoding process can handle upper and lower case
+   # characters in the range of `a–z`, as well as `äöüß`. Everything else is ignored.
+   #
+   # If the string consists of several words separated by spaces, each word is encoded separately,
+   # and the resulting codes are then again joined together with spaces.
+   #
+   # @return [String] Encoded string (consists of digits only, and maybe spaces)
+   def self.encode(string)
+     string.split(' ').map{ |word| encode_word(word) }.join(' ')
+   end
+
+   # Low-level method for encoding a single word using Cologne phonetics rules (spaces will be
+   # ignored). You most probably want to use {.encode} instead.
+   #
+   # @return [String] Encoded word (consists of digits only)
+   def self.encode_word(word)
+     Rules.apply_to(word).squeeze.gsub(/(.)0/, '\1')
+   end
+ end
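`encode_word` post-processes the raw per-character codes in two steps: `squeeze` collapses runs of identical digits, and the `gsub` then drops every `0` that follows another digit, so a `0` can only survive at the very start of a code. A small standalone sketch of just that clean-up, using the hypothetical helper name `compact_code` and the raw codes that the rules above yield for "Wikipedia" and "Heinz":

```ruby
# Hypothetical helper that isolates the clean-up step from encode_word.
def compact_code(raw)
  raw.squeeze            # collapse consecutive duplicate digits
     .gsub(/(.)0/, '\1') # remove any "0" that is not the first character
end

compact_code('304010200') # raw code for "Wikipedia" => "3412"
compact_code('0068')      # raw code for "Heinz"     => "068"
```

The order matters: squeezing first guarantees that no `00` pair remains, so a single `gsub` pass is enough to remove every non-leading zero.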
metadata ADDED
@@ -0,0 +1,121 @@
+ --- !ruby/object:Gem::Specification
+ name: cologne_phonetics
+ version: !ruby/object:Gem::Version
+   version: 1.0.0
+ platform: ruby
+ authors:
+ - Stefan Daschek
+ autorequire:
+ bindir: exe
+ cert_chain: []
+ date: 2018-03-06 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: bundler
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.15'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.15'
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+ - !ruby/object:Gem::Dependency
+   name: rspec
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.6'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.6'
+ - !ruby/object:Gem::Dependency
+   name: rubocop
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 0.52.1
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 0.52.1
+ - !ruby/object:Gem::Dependency
+   name: yard
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 0.9.12
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 0.9.12
+ description: Cologne phonetics (also Kölner Phonetik, Cologne process) is a phonetic
+   algorithm which assigns to words a sequence of digits, the phonetic code.
+ email:
+ - stefan@die-antwort.eu
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - CHANGELOG.md
+ - LICENSE.txt
+ - README.md
+ - lib/cologne_phonetics.rb
+ - lib/cologne_phonetics/rules.rb
+ - lib/cologne_phonetics/version.rb
+ homepage: https://github.com/noniq/cologne_phonetics
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.6.14
+ signing_key:
+ specification_version: 4
+ summary: Cologne phonetics (Kölner Phonetik) text encoding algorithm
+ test_files: []
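All development dependencies in the metadata use the pessimistic `~>` operator, which allows the last given version segment to increase but nothing to its left. As an illustration, RubyGems' own `Gem::Requirement` and `Gem::Version` classes can be used to check which versions a constraint admits; the candidate version numbers below are arbitrary examples, not versions referenced by this gem.

```ruby
require 'rubygems'

# "~> 0.52.1" admits 0.52.x patch releases from 0.52.1 upwards, but not 0.53.
rubocop = Gem::Requirement.new('~> 0.52.1')
rubocop.satisfied_by?(Gem::Version.new('0.52.3')) # => true
rubocop.satisfied_by?(Gem::Version.new('0.53.0')) # => false

# "~> 1.15" admits any 1.x release from 1.15 upwards, but not 2.0.
bundler = Gem::Requirement.new('~> 1.15')
bundler.satisfied_by?(Gem::Version.new('1.16.1')) # => true
bundler.satisfied_by?(Gem::Version.new('2.0.0'))  # => false
```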