phonetic 1.1.0 → 1.2.0

Files changed (28)

checksums.yaml +4 -4
data/.travis.yml +9 -5
data/.yardopts +5 -0
data/CHANGELOG.md +14 -0
data/README.md +123 -109
data/lib/phonetic.rb +1 -0
data/lib/phonetic/core_ext/string/nysiis.rb +1 -1
data/lib/phonetic/core_ext/string/refined_nysiis.rb +12 -0
data/lib/phonetic/dm_soundex.rb +4 -21
data/lib/phonetic/dm_soundex/code.rb +30 -0
data/lib/phonetic/{dm_soundex_map.rb → dm_soundex/map.rb} +0 -0
data/lib/phonetic/double_metaphone.rb +111 -130
data/lib/phonetic/double_metaphone/code.rb +28 -0
data/lib/phonetic/metaphone.rb +123 -87
data/lib/phonetic/refined_nysiis.rb +72 -0
data/lib/phonetic/version.rb +1 -1
data/phonetic.gemspec +29 -27
data/spec/phonetic/caverphone2_spec.rb +2 -53
data/spec/phonetic/caverphone_spec.rb +2 -104
data/spec/phonetic/core_ext/string/refined_nysiis_spec.rb +9 -0
data/spec/phonetic/double_metaphone_spec.rb +3 -2
data/spec/phonetic/refined_nysiis_spec.rb +30 -0
data/spec/spec_helper.rb +6 -5
data/spec/support/caverphone2_data.rb +53 -0
data/spec/support/caverphone_data.rb +104 -0
data/spec/support/double_metaphone_data.rb +5 -0
data/spec/support/refined_nysiis_data.rb +49 -0
metadata +20 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: a20da7ce0b4dab68d7671088098226a035c64b05
-  data.tar.gz: 2b721bc986d8e23ba6780bb7cab92059e6a7652b
+  metadata.gz: 98bef8e122a5abed59eee25d4e9e4a2475aef89b
+  data.tar.gz: 9a4656b92c3e81f507ab5ffdcbd55a701728ff05
 SHA512:
-  metadata.gz: 14325fa3846251dd1a1cbc59b38c12a32471291b45b07074387747fa9331b5ad98b1b0afaa8dbbac62872f9bf959d5e622742e5ec673f3e1294807f91b5fdc85
-  data.tar.gz: ad80a4c26cae46cbc516cc6cfebbfea39be69fbfe86426737cd94c065da57f4448ff3ffe9f892bd60af94fd834ae122c472cdc09b5fe683344ef479d1f31f90c
+  metadata.gz: aea70d4160ade24bfd89370b06ba2381d7e444f8e59f361c48c48209e77de10b5dde42ef04e77a67558f3be5e6f3379812618e5a3c91f165e37aa5378e6b4acf
+  data.tar.gz: 17af435c3b3d7c8603a0a5a2de323671f04eefe5cbec6100e89bef862cfe4beab14fa63f3449b83233e001479f0d52a30b8f448400281db33f2789669b7cb31b

data/.travis.yml CHANGED

@@ -1,5 +1,9 @@
-language: ruby
-rvm:
-  - "1.9.2"
-  - "1.9.3"
-  - "2.0.0"
+language: ruby
+rvm:
+  - 1.9.2
+  - 1.9.3
+  - 2.0.0
+  - ruby-head
+  - jruby-19mode
+  - jruby-head
+  - rbx-2.1.1

data/.yardopts ADDED

@@ -0,0 +1,5 @@
+--charset utf-8
+-
+README.md
+LICENSE.txt
+CHANGELOG.md

data/CHANGELOG.md ADDED

@@ -0,0 +1,14 @@
+# Phonetic CHANGELOG
+## 1.2.0
+* added Refined NYSIIS
+## 1.1.0
+* added Daitch–Mokotoff Soundex
+## 1.0.0
+* Initial release with Soundex, Refined Soundex, Metaphone, Double Metaphone,
+Caverphone, Caverphone 2 and NYSIIS

data/README.md CHANGED

@@ -1,109 +1,123 @@
-# Phonetic
-[![Build Status](https://travis-ci.org/n7v/phonetic.png)](https://travis-ci.org/n7v/phonetic)
-[![Gem Version](https://badge.fury.io/rb/phonetic.png)](http://badge.fury.io/rb/phonetic)
-[![Coverage Status](https://coveralls.io/repos/n7v/phonetic/badge.png)](https://coveralls.io/r/n7v/phonetic)
-[![Code Climate](https://codeclimate.com/github/n7v/phonetic.png)](https://codeclimate.com/github/n7v/phonetic)
-Ruby library for phonetic algorithms.
-It supports Soundex, Metaphone, Double Metaphone, Caverphone, NYSIIS and others.
-## Installation
-Add this line to your application's Gemfile:
-    gem 'phonetic'
-And then execute:
-```shell
-$ bundle
-```
-Or install it yourself as:
-```shell
-$ gem install phonetic
-```
-## Usage
-```ruby
-require 'phonetic'
-```
-### Soundex
-```ruby
-'Ackerman'.soundex # => 'A265'
-'ammonium'.soundex # => 'A500'
-'implementation'.soundex # => 'I514'
-```
-### Refined Soundex
-```ruby
-'Caren'.refined_soundex   # => 'C30908'
-'Hayers'.refined_soundex  # => 'H093'
-'Lambard'.refined_soundex # => 'L7081096'
-```
-### Metaphone
-```ruby
-'Accola'.metaphone # => 'AKKL'
-'Nikki'.metaphone # => 'NK'
-'Wright'.metaphone #=> 'RT'
-```
-### Double Metaphone
-```ruby
-'czerny'.double_metaphone # => ['SRN', 'XRN']
-'dumb'.double_metaphone   # => ['TM', 'TM']
-'edgar'.double_metaphone  # => ['ATKR', 'ATKR']
-```
-or use alias:
-```ruby
-'czerny'.metaphone2 # => ['SRN', 'XRN']
-'dumb'.metaphone2   # => ['TM', 'TM']
-'edgar'.metaphone2  # => ['ATKR', 'ATKR']
-```
-### Caverphone
-```ruby
-'Lashaunda'.caverphone # => 'LSNT11'
-'Vidaurri'.caverphone # => 'FTR111'
-````
-### Caverphone 2
-```ruby
-'Stevenson'.caverphone2 # => 'STFNSN1111'
-'Peter'.caverphone2 # => 'PTA1111111'
-```
-### NYSIIS
-```ruby
-'Alexandra'.nysiis # => 'ALAXANDR'
-'Aumont'.nysiis # => 'AANAD'
-'Bonnie'.nysiis # => 'BANY'
-```
-### Daitch–Mokotoff Soundex (D–M Soundex)
-```ruby
-'Anja'.dm_soundex # => ['060000', '064000']
-'Schwarz'.dm_soundex # => ['474000', '479400']
-'Schtolteheim'.dm_soundex # => ['283560']
-```
-## Contributing
-1. Fork it
-2. Create your feature branch (`git checkout -b my-new-feature`)
-3. Commit your changes (`git commit -am 'Add some feature'`)
-4. Push to the branch (`git push origin my-new-feature`)
-5. Create new Pull Request
+# Phonetic
+[![Build Status](https://travis-ci.org/n7v/phonetic.png)](https://travis-ci.org/n7v/phonetic)
+[![Gem Version](https://badge.fury.io/rb/phonetic.png)](http://badge.fury.io/rb/phonetic)
+[![Coverage Status](https://coveralls.io/repos/n7v/phonetic/badge.png)](https://coveralls.io/r/n7v/phonetic)
+[![Code Climate](https://codeclimate.com/github/n7v/phonetic.png)](https://codeclimate.com/github/n7v/phonetic)
+[![Dependency Status](https://gemnasium.com/n7v/phonetic.png)](https://gemnasium.com/n7v/phonetic)
+Ruby library for phonetic algorithms.
+It supports Soundex, Metaphone, Double Metaphone, Caverphone, NYSIIS and others.
+## Installation
+Add this line to your application's Gemfile:
+    gem 'phonetic'
+And then execute:
+```shell
+$ bundle
+```
+Or install it yourself as:
+```shell
+$ gem install phonetic
+```
+## Dependencies
+Ruby >= 1.9, JRuby 1.7.6, Rubinius 2.1.1
+## Usage
+```ruby
+require 'phonetic'
+```
+### Soundex
+```ruby
+'Ackerman'.soundex # => 'A265'
+'ammonium'.soundex # => 'A500'
+'implementation'.soundex # => 'I514'
+```
+### Refined Soundex
+```ruby
+'Caren'.refined_soundex   # => 'C30908'
+'Hayers'.refined_soundex  # => 'H093'
+'Lambard'.refined_soundex # => 'L7081096'
+```
+### Metaphone
+```ruby
+'Accola'.metaphone # => 'AKKL'
+'Nikki'.metaphone # => 'NK'
+'Wright'.metaphone #=> 'RT'
+```
+### Double Metaphone
+```ruby
+'czerny'.double_metaphone # => ['SRN', 'XRN']
+'dumb'.double_metaphone   # => ['TM', 'TM']
+'edgar'.double_metaphone  # => ['ATKR', 'ATKR']
+```
+or use alias:
+```ruby
+'czerny'.metaphone2 # => ['SRN', 'XRN']
+'dumb'.metaphone2   # => ['TM', 'TM']
+'edgar'.metaphone2  # => ['ATKR', 'ATKR']
+```
+### Caverphone
+```ruby
+'Lashaunda'.caverphone # => 'LSNT11'
+'Vidaurri'.caverphone # => 'FTR111'
+````
+### Caverphone 2
+```ruby
+'Stevenson'.caverphone2 # => 'STFNSN1111'
+'Peter'.caverphone2 # => 'PTA1111111'
+```
+### NYSIIS
+```ruby
+'Alexandra'.nysiis # => 'ALAXANDR'
+'Aumont'.nysiis # => 'AANAD'
+'Bonnie'.nysiis # => 'BANY'
+```
+### Refined NYSIIS
+```ruby
+'Aumont'.refined_nysiis  # => 'ANAD'
+'Phoenix'.refined_nysiis # => 'FANAC'
+'Schmidt'.refined_nysiis # => 'SNAD'
+```
+### Daitch–Mokotoff Soundex (D–M Soundex)
+```ruby
+'Anja'.dm_soundex # => ['060000', '064000']
+'Schwarz'.dm_soundex # => ['474000', '479400']
+'Schtolteheim'.dm_soundex # => ['283560']
+```
+## Contributing
+1. Fork it
+2. Create your feature branch (`git checkout -b my-new-feature`)
+3. Commit your changes (`git commit -am 'Add some feature'`)
+4. Push to the branch (`git push origin my-new-feature`)
+5. Create new Pull Request

data/lib/phonetic.rb CHANGED

@@ -1,5 +1,6 @@
 require 'phonetic/version'
 require 'phonetic/nysiis'
+require 'phonetic/refined_nysiis'
 require 'phonetic/soundex'
 require 'phonetic/refined_soundex'
 require 'phonetic/metaphone'

data/lib/phonetic/core_ext/string/nysiis.rb CHANGED

@@ -1,7 +1,7 @@
 require 'phonetic/nysiis'
 class String
-  # Caverphone value of string.
+  # NYSIIS value of string.
   # @example
   #    'Alexandra'.nysiis # => 'ALAXANDR'
   #    'Aumont'.nysiis # => 'AANAD'

data/lib/phonetic/core_ext/string/refined_nysiis.rb ADDED

@@ -0,0 +1,12 @@
+require 'phonetic/refined_nysiis'
+class String
+  # Refined NYSIIS value of string.
+  # @example
+  #    'Aumont'.refined_nysiis  # => 'ANAD'
+  #    'Phoenix'.refined_nysiis # => 'FANAC'
+  #    'Schmidt'.refined_nysiis # => 'SNAD'
+  def refined_nysiis(options = { trim: true })
+    Phonetic::RefinedNYSIIS.encode(self, options)
+  end
+end
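
Note on the `options = { trim: true }` default above: in Ruby, a default hash argument is replaced wholesale when the caller passes any hash, so it is not merged with the caller's keys. The sketch below illustrates only that language behaviour. `DemoEncoder` and `demo_encode` are hypothetical stand-ins, and how `Phonetic::RefinedNYSIIS.encode` treats a missing `:trim` key is not shown in this diff.

```ruby
# Hypothetical stand-in for the real encoder; it only reports the options it receives.
module DemoEncoder
  def self.encode(_str, options)
    options
  end
end

# Same signature shape as String#refined_nysiis in the diff.
def demo_encode(str, options = { trim: true })
  DemoEncoder.encode(str, options)
end

demo_encode('Aumont')               # => {:trim=>true}   default hash is used
demo_encode('Aumont', trim: false)  # => {:trim=>false}  caller's hash replaces the default
demo_encode('Aumont', {})           # => {}              the :trim key is absent
```

A caller that passes an unrelated option therefore hands the encoder a hash without `:trim`, so the encoder has to decide what a missing key means.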

data/lib/phonetic/dm_soundex.rb CHANGED

@@ -1,5 +1,6 @@
 require 'phonetic/algorithm'
-require 'phonetic/dm_soundex_map'
+require 'phonetic/dm_soundex/map'
+require 'phonetic/dm_soundex/code'
 module Phonetic
   # Daitch–Mokotoff Soundex (D–M Soundex) is a phonetic algorithm invented
@@ -19,7 +20,7 @@ module Phonetic
     def self.encode_word(word, options = {})
       w = word.strip.upcase.gsub(/[^A-Z]+/, '')
       i = 0
-      code = init_code()
+      code = Code.new
       while i < w.size
         if w[i] != w[i + 1]
           c = find_code(MAP, w, i)
@@ -37,29 +38,11 @@ module Phonetic
         end
         i += 1
       end
-      code.result
+      code.results
     end
     private
-    def self.init_code
-      code = [[]]
-      def code.add(a)
-        case a
-        when Array
-          c = self.map{|w| w.last != a[1] ? w + [a[1]] : w}
-          self.map!{|w| w.last != a[0] ? w + [a[0]] : w}
-          self.push(*c)
-        else
-          self.map!{|w| w.last != a ? w + [a] : w}
-        end
-      end
-      def code.result
-        self.map{|w| w.join[0..5].ljust(6, '0')}.uniq
-      end
-      code
-    end
     def self.find_code(map, w, i, last = nil, count = 0)
       elem = map[w[i]]
       r = case elem

data/lib/phonetic/dm_soundex/code.rb ADDED

@@ -0,0 +1,30 @@
+module Phonetic
+ class DMSoundex
+    class Code
+      def initialize
+        @codes = [[]]
+      end
+      def add(a)
+        case a
+        when Array
+          c1 = add_code(a[0])
+          c2 = add_code(a[1])
+          @codes = c1 + c2
+        else
+          @codes = add_code(a)
+        end
+      end
+      def results
+        @codes.map{|w| w.join[0..5].ljust(6, '0')}.uniq
+      end
+      private
+      def add_code(code)
+        @codes.map{|w| w.last != code ? w + [code] : w}
+      end
+    end
+  end
+end
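
To see how the extracted `Code` class branches, here is a standalone copy of it, renamed `DemoCode` so it runs without the gem. The values passed to `add` are made up for illustration and are not taken from the gem's map.

```ruby
# Copy of the Code class from the diff, renamed so it can run on its own.
class DemoCode
  def initialize
    @codes = [[]]
  end

  # A plain value extends every branch; an Array of two alternatives forks each branch.
  def add(a)
    case a
    when Array
      @codes = add_code(a[0]) + add_code(a[1])
    else
      @codes = add_code(a)
    end
  end

  # Join each branch, cut to 6 characters, pad with '0', drop duplicates.
  def results
    @codes.map { |w| w.join[0..5].ljust(6, '0') }.uniq
  end

  private

  # Append the code to each branch unless it repeats the branch's last code.
  def add_code(code)
    @codes.map { |w| w.last != code ? w + [code] : w }
  end
end

code = DemoCode.new
code.add('4')
code.add(%w[7 9]) # two alternatives, so the single branch becomes two
code.add('4')
code.add('4')     # an immediate repeat is collapsed
code.results      # => ["474000", "494000"]
```

Unlike the removed singleton-method version, `add_code` returns a new array each time instead of mutating the receiver with `map!` and `push`.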

data/lib/phonetic/{dm_soundex_map.rb → dm_soundex/map.rb} RENAMED

File without changes

data/lib/phonetic/double_metaphone.rb CHANGED

@@ -1,6 +1,7 @@
 # encoding: utf-8
 require 'phonetic/algorithm'
+require 'phonetic/double_metaphone/code'
 module Phonetic
   # The Double Metaphone phonetic encoding algorithm is the second generation
@@ -22,15 +23,39 @@ module Phonetic
   #    Phonetic::Metaphone2.encode('dumb')   # => ['TM', 'TM']
   #    Phonetic::Metaphone2.encode('edgar')  # => ['ATKR', 'ATKR']
   class DoubleMetaphone < Algorithm
+    START_OF_WORD_MAP = {
+      # skip these when at start of word
+      /^([GKP]N|WR|PS)/ => ['', '', 1],
+      # initial 'X' is pronounced 'Z' e.g. 'Xavier'
+      /^X/ => ['S', 'S', 1],
+      # all init vowels now map to 'A'
+      /^[AEIOUY]/ => ['A', 'A', 1],
+      # special case 'caesar'
+      /^CAESAR/ => ['S', 'S', 1],
+      # special case 'sugar-'
+      /^SUGAR/ => ['X', 'S', 1],
+      # -ges-, -gep-, -gel-, -gie- at beginning
+      /^G(Y|E[SPBLYIR]|I[BLNE])/ => ['K', 'J', 2],
+      # keep H if first & before vowel
+      /^H[AEIOUY]/ => ['H', 'H', 2],
+      # german & anglicisations, e.g. 'smith' match 'schmidt', 'snider' match 'schneider'
+      /^S[MNLW]/ => ['S', 'X', 1],
+      # ghislane, ghiradelli
+      /^GHI/ => ['J', 'J', 2],
+      /^GH/ => ['K', 'K', 2],
+      # greek roots e.g. 'chemistry', 'chorus'
+      /^CH(ARAC|ARIS|OR[^E]|YM|EM)/ => ['K', 'K', 2],
+      # Wasserman should match Vasserman
+      /^W[AEIOUY]/ => ['A', 'F', 0],
+      # need Uomo to match Womo
+      /^WH/ => ['A', 'A', 0]
+    }
     # Encode word to its Double Metaphone code.
     def self.encode_word(word, options = { size: 4 })
       code_size = options[:size] || 4
       w = word.strip.upcase
-      code = ['', '']
-      def code.add(primary, secondary)
-        self[0] += primary
-        self[1] += secondary
-      end
+      code = Code.new
       i = 0
       len = w.size
       last = len - 1
@@ -47,22 +72,12 @@ module Phonetic
         when 'Ç', 'ç'
           code.add 'S', 'S'
           i += 1
-        when 'C'
-          i += encode_c(w, i, len, code)
-        when 'D'
-          i += encode_d(w, i, len, code)
+        when 'C', 'D'
+          i += char_encode(w, i, len, code)
         when 'F', 'K', 'N'
           i += gen_encode(w, i, w[i], w[i], code)
-        when 'G'
-          i += encode_g(w, i, len, code)
-        when 'H'
-          i += encode_h(w, i, len, code)
-        when 'J'
-          i += encode_j(w, i, len, code)
-        when 'L'
-          i += encode_l(w, i, len, code)
-        when 'M'
-          i += encode_m(w, i, len, code)
+        when 'G', 'H', 'J', 'L', 'M'
+          i += char_encode(w, i, len, code)
         when 'Ñ', 'ñ'
           code.add 'N', 'N'
           i += 1
@@ -70,25 +85,17 @@ module Phonetic
           i += encode_p(w, i, len, code)
         when 'Q'
           i += gen_encode(w, i, 'K', 'K', code)
-        when 'R'
-          i += encode_r(w, i, len, code)
-        when 'S'
-          i += encode_s(w, i, len, code)
-        when 'T'
-          i += encode_t(w, i, len, code)
+        when 'R', 'S', 'T'
+          i += char_encode(w, i, len, code)
         when 'V'
           i += gen_encode(w, i, 'F', 'F', code)
-        when 'W'
-          i += encode_w(w, i, len, code)
-        when 'X'
-          i += encode_x(w, i, len, code)
-        when 'Z'
-          i += encode_z(w, i, len, code)
+        when 'W', 'X', 'Z'
+          i += char_encode(w, i, len, code)
         else
           i += 1
         end
       end
-      [code.first[0, code_size], code.last[0, code_size]]
+      code.results(code_size)
     end
     def self.encode(str, options = { size: 4 })
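
The new `double_metaphone/code.rb` is not shown in this part of the diff. Judging only from the removed singleton `add` method and the `code.results(code_size)` call site, a minimal accumulator could look like the sketch below. `DemoDoubleCode` is a hypothetical name, and the gem's actual class may differ.

```ruby
# Hypothetical accumulator; the gem's double_metaphone/code.rb may differ.
class DemoDoubleCode
  def initialize
    @primary = ''
    @secondary = ''
  end

  # Append one fragment to each of the two encodings.
  def add(primary, secondary)
    @primary += primary
    @secondary += secondary
  end

  # Truncate both encodings to the requested size.
  def results(size)
    [@primary[0, size], @secondary[0, size]]
  end
end

code = DemoDoubleCode.new
code.add 'S', 'X' # calls chosen by hand, not traced from the encoder
code.add 'R', 'R'
code.add 'N', 'N'
code.results(4) # => ["SRN", "XRN"]
code.results(2) # => ["SR", "XR"]
```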
@@ -99,19 +106,12 @@ module Phonetic
     def self.encode_start_of_word(w, code)
       i = 0
-      # skip these when at start of word
-      if w[0, 2] =~ /[GKP]N|WR|PS/
-        i = 1
-      # initial 'X' is pronounced 'Z' e.g. 'Xavier'
-      elsif w[0] == 'X'
-        code.add 'S', 'S'
-        i = 1
-      elsif w[0] =~ /[AEIOUY]/
-        code.add 'A', 'A' # all init vowels now map to 'A'
-        i = 1
-      elsif w[0, 6] == 'CAESAR' # special case 'caesar'
-        code.add 'S', 'S'
-        i = 1
+      START_OF_WORD_MAP.each do |r, v|
+        if w =~ r
+          code.add v[0], v[1]
+          i = v[2]
+          break
+        end
       end
       i
     end
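
The table-driven lookup above relies on Ruby hashes preserving insertion order (Ruby 1.9 and later), so the first matching pattern wins. The sketch below uses a four-entry subset of `START_OF_WORD_MAP`. `DEMO_START_MAP` and `demo_start_rule` are illustrative names, not part of the gem.

```ruby
# Subset of START_OF_WORD_MAP from the diff: regex => [primary, secondary, skip].
DEMO_START_MAP = {
  /^([GKP]N|WR|PS)/ => ['', '', 1],
  /^X/              => ['S', 'S', 1],
  /^GHI/            => ['J', 'J', 2], # must come before the broader /^GH/
  /^GH/             => ['K', 'K', 2]
}

# Return the first matching rule, or nil when no pattern matches.
def demo_start_rule(word)
  DEMO_START_MAP.each do |pattern, rule|
    return rule if word =~ pattern
  end
  nil
end

demo_start_rule('GHISLANE') # => ["J", "J", 2]
demo_start_rule('GHOST')    # => ["K", "K", 2]
demo_start_rule('KNIGHT')   # => ["", "", 1]
demo_start_rule('BOB')      # => nil
```

Because order decides the outcome, a narrower pattern such as `/^GHI/` has to stay ahead of `/^GH/` whenever entries are added or rearranged.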
@@ -121,6 +121,10 @@ module Phonetic
       w[i + 1] == w[i] ? 2 : 1
     end
+    def self.char_encode(w, i, len, code)
+      self.send "encode_#{w[i].downcase}", w, i, len, code
+    end
     def self.encode_c(w, i, len, code)
       r = 1
       case
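
The `char_encode` helper added in this hunk builds the handler name from the current character and calls it with `send`. A minimal standalone sketch of that dispatch follows. `DemoDispatch` and its two handlers are hypothetical and take fewer arguments than the gem's `encode_*` methods.

```ruby
# Hypothetical module with per-letter handlers following the encode_<letter> naming.
module DemoDispatch
  def self.encode_c(_w, i)
    "handled C at #{i}"
  end

  def self.encode_d(_w, i)
    "handled D at #{i}"
  end

  # Same idea as char_encode in the diff: derive the method name from the character.
  def self.char_encode(w, i)
    send("encode_#{w[i].downcase}", w, i)
  end
end

DemoDispatch.char_encode('CD', 0) # => "handled C at 0"
DemoDispatch.char_encode('CD', 1) # => "handled D at 1"
```

A character without a matching handler raises `NoMethodError`. In the diff, the enclosing `when` branches limit which letters reach `char_encode`.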
@@ -129,8 +133,7 @@ module Phonetic
         code.add 'K', 'K'
         r += 1
       when w[i, 2] == 'CH'
-        encode_ch(w, i, len, code)
-        r += 1
+        r += encode_ch(w, i, len, code)
       when w[i, 2] == 'CZ' && !(i > 1 && w[i - 2, 4] == 'WICZ')
         # e.g, 'czerny'
         code.add 'S', 'X'
@@ -145,13 +148,12 @@ module Phonetic
       when w[i, 2] =~ /C[KGQ]/
         code.add 'K', 'K'
         r += 1
+      # italian vs. english
+      when w[i, 3] =~ /CI[OEA]/
+        code.add 'S', 'X'
+        r += 1
       when w[i, 2] =~ /C[IEY]/
-        # italian vs. english
-        if w[i, 3] =~ /CI[OEA]/
-          code.add 'S', 'X'
-        else
-          code.add 'S', 'S'
-        end
+        code.add 'S', 'S'
         r += 1
       else
         code.add 'K', 'K'
@@ -167,17 +169,16 @@ module Phonetic
     def self.encode_d(w, i, len, code)
       r = 1
-      if w[i, 2] == 'DG'
-        if w[i + 2] =~ /[IEY]/
-          # e.g. 'edge'
-          code.add 'J', 'J'
-          r += 2
-        else
-          # e.g. 'edgar'
-          code.add 'TK', 'TK'
-          r += 1
-        end
-      elsif w[i, 2] =~ /D[TD]/
+      case
+      when w[i + 1, 2] =~ /G[IEY]/
+        # e.g. 'edge'
+        code.add 'J', 'J'
+        r += 2
+      when w[i + 1] == 'G'
+        # e.g. 'edgar'
+        code.add 'TK', 'TK'
+        r += 1
+      when w[i + 1] =~ /[TD]/
         code.add 'T', 'T'
         r += 1
       else
@@ -188,22 +189,19 @@ module Phonetic
     def self.encode_g(w, i, len, code)
       r = 2
-      if w[i + 1] == 'H'
+      case
+      when w[i + 1] == 'H'
         encode_gh(w, i, code)
-      elsif w[i + 1] == 'N'
+      when w[i + 1] == 'N'
         encode_gn(w, i, code)
       # 'tagliaro'
-      elsif w[i + 1, 2] == 'LI' && !slavo_germanic?(w)
+      when w[i + 1, 2] == 'LI' && !slavo_germanic?(w)
         code.add 'KL', 'L'
-      # -ges-, -gep-, -gel-, -gie- at beginning
-      elsif i == 0 && w[1, 2] =~ /^Y|E[SPBLYIR]|I[BLNE]/
-        code.add 'K', 'J'
       # -ger-,  -gy-
-      elsif g_ger_or_gy?(w, i)
+      when g_ger_or_gy?(w, i)
         code.add 'K', 'J'
-      # italian e.g, 'biaggi'
-      elsif w[i + 1] =~ /[EIY]/ || (i > 0 && w[i - 1, 4] =~ /[AO]GGI/)
-        if w[0, 4] =~ /^(VAN |VON |SCH)/ || w[i + 1, 2] == 'ET'
+      when g_italian?(w, i)
+        if w[0, 4] =~ /^(V[AO]N\s|SCH)/ || w[i + 1, 2] == 'ET'
           code.add 'K', 'K'
         elsif w[i + 1, 4] =~ /IER\s/
           code.add 'J', 'J'
@@ -219,8 +217,8 @@ module Phonetic
     def self.encode_h(w, i, len, code)
       r = 1
-      # only keep if first & before vowel or btw. 2 vowels
-      if (i == 0 || i > 0 && vowel?(w[i - 1])) && vowel?(w[i + 1])
+      # keep if btw. 2 vowels
+      if i > 0 && vowel?(w[i - 1]) && vowel?(w[i + 1])
         code.add 'H', 'H'
         r += 1
       end
@@ -307,39 +305,27 @@ module Phonetic
     def self.encode_s(w, i, len, code)
       r = 1
       last = len - 1
+      case
       # special cases 'island', 'isle', 'carlisle', 'carlysle'
-      if i > 0 && w[i - 1, 3] =~ /[IY]SL/
-      # special case 'sugar-'
-      elsif i == 0 && w[i, 5] == 'SUGAR'
-        code.add 'X', 'S'
-      elsif w[i, 2] == 'SH'
-        # germanic
-        if w[i + 1, 4] =~ /H(EIM|OEK|OL[MZ])/
-          code.add 'S', 'S'
-        else
-          code.add 'X', 'X'
-        end
-        r += 1
+      when i > 0 && w[i - 1, 3] =~ /[IY]SL/
+      when w[i, 2] == 'SH'
+        r += encode_sh(w, i, code)
       # italian & armenian
-      elsif w[i, 3] =~ /SI[OA]/
+      when w[i, 3] =~ /SI[OA]/
         if !slavo_germanic?(w)
           code.add 'S', 'X'
         else
           code.add 'S', 'S'
         end
         r += 2
-      # german & anglicisations, e.g. 'smith' match 'schmidt',
-      # 'snider' match 'schneider' also, -sz- in slavic language altho in
-      # hungarian it is pronounced 's'
-      elsif i == 0 && w[i + 1] =~ /[MNLW]/ || w[i + 1] == 'Z'
+      # -sz- in slavic language altho in hungarian it is pronounced 's'
+      when w[i, 2] == 'SZ'
         code.add 'S', 'X'
-        r += 1 if w[i + 1] == 'Z'
-      elsif w[i, 2] == 'SC'
-        encode_sc(w, i, code)
-        r += 2
-      # french e.g. 'resnais', 'artois'
+        r += 1
+      when w[i, 2] == 'SC'
+        r += encode_sc(w, i, code)
       else
-        if i == last && i > 1 && w[i - 2, 2] =~ /[AO]I/
+        if s_french?(w, i, last)
           code.add '', 'S'
         else
           code.add 'S', 'S'
@@ -377,18 +363,9 @@ module Phonetic
         code.add 'R', 'R'
         r += 1
       else
-        if i == 0 && (vowel?(w[i + 1]) || w[i, 2] == 'WH')
-          # Wasserman should match Vasserman
-          if vowel?(w[i + 1])
-            code.add 'A', 'F'
-          else
-            # need Uomo to match Womo
-            code.add 'A', 'A'
-          end
-        end
         # Arnow should match Arnoff
         if i == last && i > 0 && vowel?(w[i - 1]) ||
-           i > 0 && w[i - 1, 5] =~ /EWSKI|EWSKY|OWSKI|OWSKY/ ||
+           i > 0 && w[i - 1, 5] =~ /[EO]WSK[IY]/ ||
            w[0, 3] == 'SCH'
           code.add '', 'F'
         elsif w[i, 4] =~ /WICZ|WITZ/
@@ -432,9 +409,6 @@ module Phonetic
       # find 'michael'
       when i > 0 && w[i, 4] == 'CHAE'
         code.add 'K', 'X'
-      # greek roots e.g. 'chemistry', 'chorus'
-      when ch_greek_roots?(w, i)
-        code.add 'K', 'K'
       # germanic, greek, or otherwise 'ch' for 'kh' sound
       when ch_germanic_or_greek?(w, i, len)
         code.add 'K', 'K'
@@ -446,6 +420,7 @@ module Phonetic
       else
         code.add 'X', 'K'
       end
+      1
     end
     def self.encode_cc(w, i, code)
@@ -470,19 +445,12 @@ module Phonetic
     def self.encode_gh(w, i, code)
       if i > 0 && !vowel?(w[i - 1])
         code.add 'K', 'K'
-      elsif i == 0
-        # ghislane, ghiradelli
-        if w[i + 2] == 'I'
-          code.add 'J', 'J'
-        else
-          code.add 'K', 'K'
-        end
       # Parker's rule (with some further refinements)
       elsif !(i > 1 && w[i - 2] =~ /[BHD]/ || # e.g., 'hugh'
               i > 2 && w[i - 3] =~ /[BHD]/ || # e.g., 'bough'
-              i > 3 && w[i - 4] =~ /[BH]/)  # e.g., 'broughton'
+              i > 3 && w[i - 4] =~ /[BH]/)    # e.g., 'broughton'
         # e.g., 'laugh', 'McLaughlin', 'cough', 'gough', 'rough', 'tough'
-        if i > 2 && w[i - 1] == 'U' && w[i - 3] =~ /[CGLRT]/
+        if i > 2 && w[i - 3, 3] =~ /[CGLRT].U/
           code.add 'F', 'F'
         elsif i > 0 && w[i - 1] != 'I'
           code.add 'K', 'K'
@@ -501,6 +469,16 @@ module Phonetic
       end
     end
+    def self.encode_sh(w, i, code)
+      # germanic
+      if w[i + 1, 4] =~ /H(EIM|OEK|OL[MZ])/
+        code.add 'S', 'S'
+      else
+        code.add 'X', 'X'
+      end
+      1
+    end
     def self.encode_sc(w, i, code)
       # Schlesinger's rule
       if w[i + 2] == 'H'
@@ -520,6 +498,7 @@ module Phonetic
       else
         code.add 'SK', 'SK'
       end
+      2
     end
     def self.slavo_germanic?(w)
@@ -532,15 +511,7 @@ module Phonetic
     def self.c_germanic?(w, i)
       # various germanic
-      i > 1 &&
-      !vowel?(w[i - 2]) &&
-      w[i - 1, 3] == 'ACH' &&
-      (w[i + 2] !~ /[IE]/ || w[i - 2, 6] =~ /[BM]ACHER/)
-    end
-    def self.ch_greek_roots?(w, i)
-      # greek roots e.g. 'chemistry', 'chorus'
-      i == 0 && w[1, 5] =~ /^H(ARAC|ARIS|OR|YM|IA|EM)/ && w[0, 5] != 'CHORE'
+      i > 1 && w[i - 2, 6] =~ /(^[^AEIOUY]ACH[^IE])|([BM]ACHER)/
     end
     def self.ch_germanic_or_greek?(w, i, len)
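
A point that may be worth checking in the `c_germanic?` rewrite: the old predicate accepted a word ending in `ACH` (there `w[i + 2]` is `nil`, and `nil !~ /[IE]/` is true), while the new regex requires a character after `ACH`. The sketch below compares the two forms side by side. `demo_vowel?` is an assumed stand-in for the gem's `vowel?`, which is not shown in this diff, and whether the difference changes any encoded output depends on call sites not shown here.

```ruby
# Assumed helper: the gem's own vowel? is not part of this diff.
def demo_vowel?(c)
  !!(c =~ /[AEIOUY]/)
end

# Predicate as removed in 1.2.0.
def old_c_germanic?(w, i)
  !!(i > 1 &&
     !demo_vowel?(w[i - 2]) &&
     w[i - 1, 3] == 'ACH' &&
     (w[i + 2] !~ /[IE]/ || w[i - 2, 6] =~ /[BM]ACHER/))
end

# Predicate as added in 1.2.0.
def new_c_germanic?(w, i)
  !!(i > 1 && w[i - 2, 6] =~ /(^[^AEIOUY]ACH[^IE])|([BM]ACHER)/)
end

%w[BACHER BACHMAN BACH].each do |word|
  i = word.index('C')
  puts "#{word}: old=#{old_c_germanic?(word, i)} new=#{new_c_germanic?(word, i)}"
end
# BACHER: old=true new=true
# BACHMAN: old=true new=true
# BACH: old=true new=false
```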
@@ -562,6 +533,11 @@ module Phonetic
       !(i > 0 && w[i - 1, 3] =~ /[RO]GY/)
     end
+    def self.g_italian?(w, i)
+      # italian e.g, 'biaggi'
+      w[i + 1] =~ /[EIY]/ || (i > 0 && w[i - 1, 4] =~ /[AO]GGI/)
+    end
     def self.j_spanish_pron?(w, i)
       # spanish pron. of e.g. 'bajador'
       i > 0 && vowel?(w[i - 1]) && !slavo_germanic?(w) && w[i + 1] =~ /[AO]/
@@ -582,6 +558,11 @@ module Phonetic
       !(i > 3 && w[i - 4, 2] =~ /M[EA]/)
     end
+    def self.s_french?(w, i, last)
+      # french e.g. 'resnais', 'artois'
+      i == last && i > 1 && w[i - 2, 2] =~ /[AO]I/
+    end
     def self.x_french?(w, i, last)
       # french e.g. breaux
       i == last && (i > 2 && w[i - 3, 3] =~ /[IE]AU/ || i > 1 && w[i - 2, 2] =~ /[AO]U/)