
cologne_phonetics 1.0.0

Files changed (8)

checksums.yaml +7 -0
data/CHANGELOG.md +9 -0
data/LICENSE.txt +21 -0
data/README.md +67 -0
data/lib/cologne_phonetics/rules.rb +53 -0
data/lib/cologne_phonetics/version.rb +5 -0
data/lib/cologne_phonetics.rb +58 -0
metadata +121 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 34b708dd44c7f1aea872b47dfa5db5f9d3ad1b91
+  data.tar.gz: 939eaec4a50c1c3e9fff629ee1b52b2f0ca1470d
+SHA512:
+  metadata.gz: '0948c7a91ec2a0764ea8c06d7a8c9a2c9fab2acebdc5cae1cafdf07283dbd8813114513d94b1743e2c7f0c5a85ca416e7e73234d953707ba30bbada050ed5c52'
+  data.tar.gz: 741a913e92789e62e6d1335f941954f915e2a5fab9155801585d661b68426274492de36af677ca2c3fe7b4a25e64a4174e220f8388b873c48099486c186cdea3

data/CHANGELOG.md ADDED

@@ -0,0 +1,9 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
+## [1.0.0] – 2018-03-06
+- Initial release.

data/LICENSE.txt ADDED

@@ -0,0 +1,21 @@
+The MIT License (MIT)
+Copyright (c) 2017 Stefan Daschek
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,67 @@
+# ColognePhonetics
+The [“Cologne phonetics (Kölner Phonetik)”](https://en.wikipedia.org/wiki/Cologne_phonetics) algorithm encodes words in a way that makes it possible to search for similar-sounding words. It’s related to the [“Soundex”](https://en.wikipedia.org/wiki/Soundex) algorithm, but better suited to the German language.
+This implementation closely follows the algorithm as described on its Wikipedia page. Support for umlauts (Ä, Ö, Ü) and ß has been added as suggested there.
+Note that *other accented characters are not handled*. If your data may contain such characters you need to preprocess it (for example by using [`I18n.transliterate`](http://www.rubydoc.info/gems/i18n/I18n/Base#transliterate-instance_method)).
+## Status
+I consider this gem to be stable and (more or less) finished.
+## Usage
+Example usage:
+```ruby
+ColognePhonetics.encode('Wikipedia') # => "3412"
+# Only basic characters and äöüß are handled, everything else gets ignored:
+ColognePhonetics.encode('Åè1%-') # => ""
+# If a string contains words separated by spaces, each word is encoded separately:
+ColognePhonetics.encode('Heinz Classen') # => "068 4586"
+# Use `encode_word` if you want to ignore spaces (note that this usually gives
+# different results than using `encode` and removing spaces afterwards; see
+# Wikipedia article for details):
+ColognePhonetics.encode_word('Heinz Classen') # => "068586"
+```
+You can set `ColognePhonetics.debug = true` to get warnings printed to `$stderr` about characters that cannot be encoded:
+```ruby
+ColognePhonetics.debug = true
+ColognePhonetics.encode('Olé')
+# Cologne Phonetics: No rule for 'é' (prev: 'l', next: '')
+# => "05"
+```
+## Installation
+Add this line to your application's Gemfile:
+```ruby
+gem 'cologne_phonetics'
+```
+And then execute:
+    $ bundle
+Or install it yourself as:
+    $ gem install cologne_phonetics
+## Development
+After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+## Contributing
+Bug reports and pull requests are welcome on GitHub at https://github.com/noniq/cologne_phonetics. Please make sure to include tests, and check that running `bin/rubocop` does not show any warnings.
+## License
+The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
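
Note on the README above: it says accented characters other than äöüß must be preprocessed and suggests `I18n.transliterate`. As a rough alternative that avoids the extra dependency, the sketch below uses Ruby's built-in `String#unicode_normalize` (Ruby 2.2+) to strip combining marks while leaving the characters the gem handles itself untouched. The names `KEPT_CHARS` and `strip_accents` are hypothetical and not part of the gem, and characters without a canonical decomposition (for example `ø`) would pass through unchanged.

```ruby
# Hypothetical preprocessing helper, NOT part of cologne_phonetics.
# Characters the gem already has rules for are kept as they are.
KEPT_CHARS = 'äöüÄÖÜß'

def strip_accents(string)
  # Normalize to NFC first so that decomposed umlauts ("u" + U+0308)
  # become single characters and are recognised as kept characters.
  string.unicode_normalize(:nfc).each_char.map { |ch|
    next ch if KEPT_CHARS.include?(ch)
    # Decompose the character and drop the combining marks (category Mn).
    ch.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
  }.join
end

strip_accents('Olé')    # => "Ole"
strip_accents('Müller') # => "Müller"
```

The result could then be passed to `ColognePhonetics.encode`; whether dropping the accent gives the phonetically right letter depends on the language of the input.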

data/lib/cologne_phonetics/rules.rb ADDED

@@ -0,0 +1,53 @@
+# frozen_string_literal: true
+module ColognePhonetics
+  # @api private
+  module Rules
+    def self.define(&block)
+      @rules = DSL.new(&block).rules
+    end
+    def self.apply_to(string)
+      string = string.downcase.tr('ÄÖÜ', 'äöü') # Ruby < 2.4 downcases ASCII characters only
+      chars = [nil] + string.chars + [nil]
+      chars.each_cons(3).map{ |prev_char, char, next_char|
+        code_for(prev_char, char, next_char)
+      }.join
+    end
+    def self.code_for(prev_char, char, next_char)
+      @rules.each do |matcher, code|
+        return code if matcher.call(prev_char, char, next_char)
+      end
+      debug_info "Cologne Phonetics: No rule for '#{char}' (prev: '#{prev_char}', next: '#{next_char}')"
+      nil
+    end
+    def self.debug_info(message)
+      return unless ColognePhonetics.debug
+      $stderr.puts message # rubocop:disable StderrPuts
+    end
+    class DSL
+      attr_reader :rules
+      def initialize(&block)
+        @rules = []
+        instance_exec(&block)
+      end
+      def change(chars, to:, before: nil, not_before: nil, after: nil, not_after: nil, initial: nil)
+        matcher = ->(prev_char, char, next_char){
+          return unless chars.include?(char)
+          return if initial && prev_char
+          return if before && (!next_char || !before.include?(next_char))
+          return if not_before && next_char && not_before.include?(next_char)
+          return if after && (!prev_char || !after.include?(prev_char))
+          return if not_after && prev_char && not_after.include?(prev_char)
+          true
+        }
+        @rules << [matcher, to]
+      end
+    end
+  end
+end
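
Note on `Rules.apply_to` above: each character is looked up together with its left and right neighbour, and the string is padded with `nil` on both sides so that the first and last character still get a full window. The sketch below reproduces just that windowing step, plus a simplified stand-in for the two `p` rules, to show how a context-sensitive rule reads the window. `context_windows` and `code_for_p` are hypothetical names for illustration only; the gem itself goes through its rule DSL.

```ruby
# Hypothetical helper: the (prev, char, next) windows that a padded
# each_cons(3) pass produces for a string.
def context_windows(string)
  padded = [nil] + string.chars + [nil]
  padded.each_cons(3).to_a
end

# Simplified stand-in for the two 'p' rules: 'p' becomes '3' before 'h'
# and '1' otherwise. Returns nil for any other character.
def code_for_p(_prev_char, char, next_char)
  return nil unless char == 'p'
  next_char == 'h' ? '3' : '1'
end

context_windows('ph')
# => [[nil, "p", "h"], ["p", "h", nil]]
code_for_p(nil, 'p', 'h') # => "3"
code_for_p('a', 'p', nil) # => "1"
```

A `nil` neighbour marks a word boundary, which is what the `initial: true` option in the DSL relies on.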

data/lib/cologne_phonetics/version.rb ADDED

@@ -0,0 +1,5 @@
+# frozen_string_literal: true
+module ColognePhonetics
+  VERSION = '1.0.0'
+end

data/lib/cologne_phonetics.rb ADDED

@@ -0,0 +1,58 @@
+# frozen_string_literal: true
+require 'cologne_phonetics/rules'
+require 'cologne_phonetics/version'
+module ColognePhonetics
+  class << self
+    # Enable / disable debug mode. If set to true, using {.encode} or {.encode_word} will output
+    # warnings to `$stderr` if they encounter characters that cannot be encoded.
+    attr_accessor :debug
+  end
+  # rubocop:disable SpaceBeforeComma
+  Rules.define do
+    change 'aeijouy', to: '0'
+    change 'äöü'    , to: '0' # additional rule: treat umlauts like vowels
+    change 'h'      , to: ''
+    change 'b'      , to: '1'
+    change 'p'      , to: '1', not_before: 'h'
+    change 'dt'     , to: '2', not_before: 'csz'
+    change 'fvw'    , to: '3'
+    change 'p'      , to: '3', before: 'h'
+    change 'gkq'    , to: '4'
+    change 'c'      , to: '4', initial: true, before: 'ahkloqrux'
+    change 'c'      , to: '4', before: 'ahkoqux', not_after: 'sz'
+    change 'x'      , to: '48', not_after: 'ckq'
+    change 'l'      , to: '5'
+    change 'mn'     , to: '6'
+    change 'r'      , to: '7'
+    change 'sz'     , to: '8'
+    change 'ß'      , to: '8' # additional rule: treat 'ß' like 's'
+    change 'c'      , to: '8', after: 'sz'
+    change 'c'      , to: '8', initial: true, not_before: 'ahkloqrux'
+    change 'c'      , to: '8', not_before: 'ahkoqux'
+    change 'dt'     , to: '8', before: 'csz'
+    change 'x'      , to: '8', after: 'ckq'
+  end
+  # rubocop:enable SpaceBeforeComma
+  # Encode string using Cologne phonetics rules. The encoding process can handle upper and lower case
+  # characters in the range of `a–z`, as well as `äöüß`. Everything else is ignored.
+  #
+  # If the string consists of several words separated by spaces, each word is encoded separately,
+  # and the resulting codes are then again joined together with spaces.
+  #
+  # @return [String] Encoded string (consists of digits only, and maybe spaces)
+  def self.encode(string)
+    string.split(' ').map{ |word| encode_word(word) }.join(' ')
+  end
+  # Low-level method for encoding a single word using Cologne phonetics rules (spaces will be
+  # ignored). You most probably want to use {.encode} instead.
+  #
+  # @return [String] Encoded word (consists of digits only)
+  def self.encode_word(word)
+    Rules.apply_to(word).squeeze.gsub(/(.)0/, '\1')
+  end
+end
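
Note on `encode_word` above: after the per-character codes are joined, two clean-up steps run. `squeeze` collapses runs of identical digits, and the `gsub` then drops every `0` that follows another digit, so a `0` survives only at the start of the code. The sketch below isolates those two steps on already-coded digit strings; `compact_code` is a hypothetical name, not a method of the gem.

```ruby
# Hypothetical helper mirroring the post-processing chain in encode_word.
def compact_code(raw_digits)
  raw_digits
    .squeeze              # "304010200" -> "30401020"
    .gsub(/(.)0/, '\1')   # drop each '0' that follows another character
end

compact_code('304010200')   # => "3412"   (raw codes for "Wikipedia")
compact_code('00688508806') # => "068586" (raw codes for "Heinz Classen" without the space)
```

The second call also shows why `encode_word('Heinz Classen')` differs from `encode` with the space removed afterwards: the `c` is no longer word-initial, so it is coded `8` and merges with the preceding `8` from `z`.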

metadata ADDED

@@ -0,0 +1,121 @@
+--- !ruby/object:Gem::Specification
+name: cologne_phonetics
+version: !ruby/object:Gem::Version
+  version: 1.0.0
+platform: ruby
+authors:
+- Stefan Daschek
+autorequire:
+bindir: exe
+cert_chain: []
+date: 2018-03-06 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.15'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.15'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.6'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.6'
+- !ruby/object:Gem::Dependency
+  name: rubocop
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.52.1
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.52.1
+- !ruby/object:Gem::Dependency
+  name: yard
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.9.12
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.9.12
+description: Cologne phonetics (also Kölner Phonetik, Cologne process) is a phonetic
+  algorithm which assigns to words a sequence of digits, the phonetic code.
+email:
+- stefan@die-antwort.eu
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- CHANGELOG.md
+- LICENSE.txt
+- README.md
+- lib/cologne_phonetics.rb
+- lib/cologne_phonetics/rules.rb
+- lib/cologne_phonetics/version.rb
+homepage: https://github.com/noniq/cologne_phonetics
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.6.14
+signing_key:
+specification_version: 4
+summary: Cologne phonetics (Kölner Phonetik) text encoding algorithm
+test_files: []