RubyGems - translit_kit - Versions diffs - 0.9 - Mend

translit_kit 0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (64)

checksums.yaml +7 -0
data/MIT-LICENSE +20 -0
data/README.md +86 -0
data/Rakefile +29 -0
data/lib/hebrewword.rb +60 -0
data/lib/permuter.rb +97 -0
data/lib/phoneme_maps.rb +80 -0
data/lib/phoneme_maps/long.json +41 -0
data/lib/phoneme_maps/short.json +39 -0
data/lib/phoneme_maps/single.json +40 -0
data/lib/phonemizer.rb +170 -0
data/lib/readme.md +120 -0
data/lib/translit_kit.rb +2 -0
data/lib/translit_kit/version.rb +3 -0
data/lib/transliterator.rb +115 -0
data/test/dummy/README.rdoc +28 -0
data/test/dummy/Rakefile +6 -0
data/test/dummy/app/assets/javascripts/application.js +13 -0
data/test/dummy/app/assets/stylesheets/application.css +15 -0
data/test/dummy/app/controllers/application_controller.rb +5 -0
data/test/dummy/app/helpers/application_helper.rb +2 -0
data/test/dummy/app/views/layouts/application.html.erb +14 -0
data/test/dummy/bin/bundle +3 -0
data/test/dummy/bin/rails +4 -0
data/test/dummy/bin/rake +4 -0
data/test/dummy/bin/setup +34 -0
data/test/dummy/bin/update +29 -0
data/test/dummy/config.ru +4 -0
data/test/dummy/config/application.rb +15 -0
data/test/dummy/config/boot.rb +3 -0
data/test/dummy/config/cable.yml +9 -0
data/test/dummy/config/database.yml +25 -0
data/test/dummy/config/environment.rb +5 -0
data/test/dummy/config/environments/development.rb +54 -0
data/test/dummy/config/environments/production.rb +86 -0
data/test/dummy/config/environments/test.rb +42 -0
data/test/dummy/config/initializers/application_controller_renderer.rb +6 -0
data/test/dummy/config/initializers/assets.rb +11 -0
data/test/dummy/config/initializers/backtrace_silencers.rb +7 -0
data/test/dummy/config/initializers/cookies_serializer.rb +5 -0
data/test/dummy/config/initializers/filter_parameter_logging.rb +4 -0
data/test/dummy/config/initializers/inflections.rb +16 -0
data/test/dummy/config/initializers/mime_types.rb +4 -0
data/test/dummy/config/initializers/new_framework_defaults.rb +23 -0
data/test/dummy/config/initializers/session_store.rb +3 -0
data/test/dummy/config/initializers/wrap_parameters.rb +14 -0
data/test/dummy/config/locales/en.yml +23 -0
data/test/dummy/config/puma.rb +47 -0
data/test/dummy/config/routes.rb +3 -0
data/test/dummy/config/secrets.yml +22 -0
data/test/dummy/config/spring.rb +6 -0
data/test/dummy/db/test.sqlite3 +0 -0
data/test/dummy/log/test.log +85939 -0
data/test/dummy/public/404.html +67 -0
data/test/dummy/public/422.html +67 -0
data/test/dummy/public/500.html +66 -0
data/test/dummy/public/favicon.ico +0 -0
data/test/hebrewword_test.rb +45 -0
data/test/permuter_test.rb +53 -0
data/test/phoneme_maps_test.rb +29 -0
data/test/phonemizer_test.rb +209 -0
data/test/test_helper.rb +29 -0
data/test/transliterator_test.rb +75 -0
metadata +155 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 7280e8d3c76a0d829c9675616aef2913f879c9ad
+  data.tar.gz: 11188ff85dfe552a5057ad0bb7ca4003725343b8
+SHA512:
+  metadata.gz: 831e6a6bb8c98af691721f055d74be5d415bfda3fe694f194133893218ae29af6df25b288b9bfeea656dc144a960c22c3dbe0d479afdf279c529676c5d0c9275
+  data.tar.gz: 18d2d3eee2be1383118ee6fd56547bc9974c74547a7bf4aed8bff8a9e3f08397409855f7495e16507bf6a7cb6da08f00aed17191f128a07218e4839525ac8582

data/MIT-LICENSE ADDED

@@ -0,0 +1,20 @@
+Copyright 2017 Michoel Samuels
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,86 @@
+# TranslitKit
+[![Build Status](https://travis-ci.org/AnalyzePlatypus/TranslitKit.svg?branch=master)](https://travis-ci.org/AnalyzePlatypus/TranslitKit)
+[![Code Climate](https://codeclimate.com/github/AnalyzePlatypus/TranslitKit/badges/gpa.svg)](https://codeclimate.com/github/AnalyzePlatypus/TranslitKit)
+[![Coverage Status](https://coveralls.io/repos/github/AnalyzePlatypus/TranslitKit/badge.svg?branch=master)](https://coveralls.io/github/AnalyzePlatypus/TranslitKit?branch=master)
+[![Inline docs](http://inch-ci.org/github/AnalyzePlatypus/TranslitKit.svg?branch=master)](http://inch-ci.org/github/AnalyzePlatypus/TranslitKit)
+*TranslitKit* is a framework for Hebrew-English transliteration.
+Example:
+```ruby
+  require 'translit_kit'
+  word = HebrewWord.new "אַברָהָם"
+  word.transliterate(:single)
+  # => ["avrohom"]
+  # Shortcut
+  word.t(:single)
+  # => ["avrohom"]
+```
+Transliteration is powered by _phoneme maps_: files that map Hebrew _phonemes_ (units of sound) to English characters (see below).
+Three phoneme maps are provided: `:long`, `:short`, and `:single`.
+You can also add your own (see below).
+```ruby
+word.t(:single)
+# => ["avrohom"]
+word.t(:short)
+# => ["avroom", "avroam", "avroem", "avrohom", "avroham",
+# "avrohem", "avraom", "avraam", "avraem", "avrahom",
+# "avraham", "avrahem", "avreom", "avream", "avreem",
+# "avrehom", "avreham", "avrehem" ]
+word.t(:long)
+# => ["avroom", "avrooom", "avroohm", ... ] # 5,997 more!
+```
+The default is `:short`:
+```ruby
+  word.t == word.t(:short)
+  # => true
+```
+To get the total permutation count, call `HebrewWord#inspect`:
+```ruby
+word.inspect
+# => "אַברָהָם: Permutations: 1 single | 18 short | 6000 long"
+```
+## Adding Custom Phoneme maps
+###### Format
+_Phoneme maps_ are simply JSON files, placed in the `lib/phoneme_maps` directory.
+Each file should map every phoneme (a `String`) to an `Array` of replacement strings.
+```json
+{
+  "ב": ["v"],
+  "בּ": ["b", "bb"]
+}
+```
+A _phoneme_ can be a Hebrew character `א`, _nekuda_ (`ָ`), or character with modifiers, such as a _dagesh_ (`בּ`). Keep in mind that many characters will be normalized (see below).
+###### Installation
+To install your custom map, place the file in `lib/phoneme_maps`.
+Your file will be available as the symbol `:<filename>`, without the `.json` extension.
+Example: `klingon.json` becomes `:klingon`.
+Now you can use it anywhere:
+```ruby
+  word.transliterate(:klingon)
+  # => (Results)
+```
+At present, results for your map are not included in the output of `HebrewWord#inspect`.
+## Appendix: Pre-Processing
+When a word is transliterated, it is pre-processed to normalize certain characters.
+Specifically:
+* Whitespace is stripped
+* The final letters `[םןךףץ]` are normalized to their standard forms
+* _CHATAF_ _nekudos_ `['ֲ','ֳ','ֱ']` are normalized to their standard forms
+* Full _CHIRIK_, _TZEIREI_, and _CHOLOM_ _nekudos_ have their letters removed
+* _DAGESH_ characters are removed from all but the characters `[בוכפת]`
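
The first three pre-processing rules in the README appendix can be illustrated with a short sketch. `normalize_hebrew` is a hypothetical helper, not the gem's own code (the Phonemizer source is not shown in this part of the diff). It assumes that "stripped" means leading and trailing whitespace, and that each _chataf_ nekuda maps to its plain counterpart (segol, patach, kamatz).

```ruby
# Hypothetical helper sketching three of the README's pre-processing rules.
# Code points are written as escapes so combining marks stay unambiguous.
FINAL_LETTERS    = "\u05DD\u05DF\u05DA\u05E3\u05E5" # final mem, nun, kaf, pe, tsadi
STANDARD_LETTERS = "\u05DE\u05E0\u05DB\u05E4\u05E6" # mem, nun, kaf, pe, tsadi
CHATAF_NEKUDOS   = "\u05B1\u05B2\u05B3"             # chataf segol, patach, kamatz
PLAIN_NEKUDOS    = "\u05B6\u05B7\u05B8"             # segol, patach, kamatz (assumed targets)

def normalize_hebrew(word)
  word.strip                                  # drop leading/trailing whitespace
      .tr(FINAL_LETTERS, STANDARD_LETTERS)    # final forms -> standard forms
      .tr(CHATAF_NEKUDOS, PLAIN_NEKUDOS)      # chataf nekudos -> plain nekudos
end
```

The remaining rules (full _chirik_/_tzeirei_/_cholom_ handling and _dagesh_ removal) depend on context between adjacent code points and are left out of this sketch.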

data/Rakefile ADDED

@@ -0,0 +1,29 @@
+begin
+  require 'bundler/setup'
+rescue LoadError
+  puts 'You must `gem install bundler` and `bundle install` to run rake tasks'
+end
+require 'rdoc/task'
+RDoc::Task.new(:rdoc) do |rdoc|
+  rdoc.rdoc_dir = 'rdoc'
+  rdoc.title    = 'TranslitKit'
+  rdoc.options << '--line-numbers'
+  rdoc.rdoc_files.include('README.rdoc')
+  rdoc.rdoc_files.include('lib/**/*.rb')
+end
+Bundler::GemHelper.install_tasks
+require 'rake/testtask'
+Rake::TestTask.new(:test) do |t|
+  t.libs << 'lib'
+  t.libs << 'test'
+  t.pattern = 'test/**/*_test.rb'
+  t.verbose = false
+end
+task default: :test

data/lib/hebrewword.rb ADDED

@@ -0,0 +1,60 @@
+=begin
+  HebrewWord.rb
+  Wraps a Hebrew word.
+  Methods:
+    * raw -> returns the original word
+    * to_s -> Alias to `raw`
+    * phonemes -> Returns an Array of phonemes (see Class::Phonemizer)
+    * transliterate(list_name) -> Returns an Array of transliterated strings
+    * t -> Alias for `transliterate`
+    * inspect -> Returns an informative string of the original Hebrew, and the available translit counts
+=end
+require 'phoneme_maps'
+require 'phonemizer'
+require 'transliterator'
+# The user-facing transliterator class
+class HebrewWord
+  # Initializer
+  # Expects a Unicode Hebrew word (i.e. "עַקֵדָה")
+  def initialize string
+    @hebword = string
+  end
+  # Get the raw Hebrew text of the word (including nikud)
+  def raw
+    @hebword
+  end
+  # Alias of `raw`
+  def to_s
+    raw
+  end
+  # Returns a `String` of format:
+  # `hebrew_text`: Permutations: `x` single | `y` short | `z` long
+  def inspect
+    "#{@hebword}: Permutations: #{transliterate(:single).length} single | #{transliterate(:short).length} short | #{transliterate(:long).length} long"
+  end
+  def phonemes
+    Phonemizer.new(@hebword).phonemes
+  end
+  # Return an `Array` of all possible transliterations of the word
+  # As defined in the optional `list_name` argument. options: [:long, :short, :single]
+  # Default is `:short`
+  def transliterate list_name = nil
+    Transliterator.new(@hebword, list_name).transliterate
+  end
+  # Alias for #transliterate
+  def t list_name = nil
+    transliterate list_name
+  end
+end
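
`HebrewWord#inspect` builds all three result lists only to read their lengths. If the transliterator simply combines every option of every phoneme, with no de-duplication or filtering (an assumption, since `transliterator.rb` is not shown in this part of the diff), the count is the product of the option-list sizes. The sketch below uses a hypothetical `permutation_count` helper and three entries copied from `short.json`.

```ruby
# Hypothetical helper: counts candidate transliterations without building them.
# `phonemes` is an Array of phoneme strings; `map` is a Hash shaped like the
# JSON phoneme maps (phoneme => Array of replacement strings).
def permutation_count(phonemes, map)
  phonemes.reduce(1) do |count, phoneme|
    options = map.fetch(phoneme) { raise KeyError, "no mapping for #{phoneme.inspect}" }
    count * options.length
  end
end

# Entries taken from short.json: gimel, patach, dalet.
sample_map = {
  "\u05D2" => ["g", "gg"],
  "\u05B7" => ["a"],
  "\u05D3" => ["d", "dd"]
}
puts permutation_count(["\u05D2", "\u05B7", "\u05D3"], sample_map) # 2 * 1 * 2 = 4
```

Note that an empty phoneme list yields 1 here, whereas `Permuter#permutations` returns an empty array when nothing was registered.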

data/lib/permuter.rb ADDED

@@ -0,0 +1,97 @@
+=begin
+  Permuter.rb
+   Encapsulates the logic of creating permutations
+   Usage:
+    p = Permuter.new
+    p.add_array [0,1]
+    p.add_array [0,1]
+    p.permutations
+    => ["00","01","10","11"]
+   Methods:
+    #add_array arr
+    #permutations
+    #any?
+    #empty?
+    #clear
+  Test Suite:
+     Complete
+=end
+class Permuter
+  def initialize
+    @arrays = []
+  end
+  # Add an array to be permuted
+  # Raises an error if given nil
+  def add_array arr
+    raise "Cannot add nil array" if arr == nil
+    @arrays << arr
+  end
+  # Remove all arrays
+  def clear
+    @arrays = []
+  end
+  def any?
+    @arrays.any?
+  end
+  def empty?
+    @arrays.empty?
+  end
+  # Get all permutations of the previously registered arrays
+  # Returns an array of strings,
+  # or an empty array if none were registered
+  def permutations
+    return [] if @arrays.empty?
+    @permutations = []
+    permute []
+    @permutations
+  end
+  private
+  # permute (indices)
+  # Recursively generate every permutation of the arrays (Courtesy of Ari Fordsham)
+  #
+  # The classic recursive permutation algorithm:
+  # Imagine picking a combination lock: [0][0][0]
+  # Each cylinder is the index to one of the arrays
+  # On each recursion, we add another cylinder [0], [0][0], [0][0][0]
+  # When we have enough cylinders, we generate the permutation (base case)
+  # and iterate to the next value by dropping a cylinder, [0][0]
+  # iterating the loop in else, and recursing again  [0][0][1]
+  # Simple and elegant
+  def permute indices
+    # Base case
+    # puts "permute(#{indices})"
+    if indices.length == @arrays.length # If the set of indices is complete
+      # Build a `String` based on the completed set of indices
+      build_permutation indices
+    else
+      # Otherwise, add a cylinder, iterate through its options;
+      # If it's the final cylinder it will trigger the base case on every option and return;
+      # If it's not, it will also trigger this case and iterate through the options of the next cylinder.
+      @arrays[indices.length].each_with_index do |item,i|
+        permute indices.dup << i
+      end
+    end
+  end
+  def build_permutation indices
+    permutation = []
+    indices.each_with_index do |item_code,i|
+      permutation << @arrays[i][item_code]
+    end
+    result = permutation.join('')
+    @permutations << result
+    result
+  end
+end
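
What `Permuter` calls permutations is the Cartesian product of the registered arrays, with each tuple joined into a string. As a point of comparison, the same result can be sketched with Ruby's built-in `Array#product`; `cartesian_strings` is a hypothetical stand-alone function, not part of the gem.

```ruby
# Hypothetical stand-alone equivalent of Permuter#permutations.
# Takes an Array of Arrays and returns every combination joined into a String.
def cartesian_strings(arrays)
  return [] if arrays.empty?

  first, *rest = arrays
  # Array#product enumerates tuples with the leftmost array varying slowest,
  # which matches the order produced by the recursive version.
  first.product(*rest).map(&:join)
end

p cartesian_strings([[0, 1], [0, 1]])
# => ["00", "01", "10", "11"]
```

As in the recursive version, a single empty input array makes the whole result empty.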

data/lib/phoneme_maps.rb ADDED

@@ -0,0 +1,80 @@
+=begin
+  PhonemeMaps.rb
+  Loads phoneme_map files
+  Lazily loads by default;
+  The file is loaded on the first method call,
+  and is cached for future calls.
+  For eager loading, pass true in the initializer.
+  Methods:
+    * initialize(eager?)   ->
+    * long
+    * short
+    * single
+    * loaded? (:list_name)
+  Test Suite:
+    Complete
+=end
+require 'json'
+lib_directory = File.dirname(__FILE__)
+FILE_DIRECTORY  = "#{lib_directory}/phoneme_maps"
+class PhonemeMaps
+# Takes a symbol, converts it into a file name,
+# And attempts to load its contents
+# Returns a Hash
+def load symbol
+  load_file "#{FILE_DIRECTORY}/#{symbol.to_s}.json"
+end
+# What directory are we searching in?
+def directory
+  FILE_DIRECTORY
+end
+# Parses a JSON string into a Ruby object (a Hash for phoneme map files)
+# Raises an informative error if the JSON is malformed
+def validate_json text
+  begin
+    return JSON.parse text
+  rescue JSON::ParserError
+      raise "JSON is not formatted properly.\nTry validating it at JSONlint.com (Look out for missing braces and missing/extra commas)\n File contents: #{text}"
+  end
+end
+# Opens a file with `File.open`
+# Raises an informative error if the file cannot be found
+def open_file_safely path
+  dir = path[0..path.rindex('/')]
+  filename = path[ (path.rindex('/') + 1)..path.length ]
+  begin
+      return File.open path, 'r'
+  rescue Errno::ENOENT
+      raise "Unknown list name. Could not find file `#{filename}` in directory `#{dir}`.\n
+             Is the file name spelled correctly, or altered somewhere in your code?\n
+             Contents of directory:
+             #{Dir.new(dir).entries}"
+  end
+end
+private
+  # Loads the file from the supplied path,
+  # and parses it with `JSON.parse`
+  # Returns a hash
+  def load_file path
+    text = ""
+    open_file_safely(path).
+      each_line(){|line| text << line }.
+      close
+    validate_json text
+  end
+end
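
The load path above (symbol to file name, open, read, parse) can be condensed with `File.read`, which opens and closes the file itself. The following is a minimal sketch under that assumption; `load_phoneme_map` and its error messages are hypothetical and not the gem's API.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical loader: reads `<dir>/<name>.json` and returns the parsed Hash.
def load_phoneme_map(dir, name)
  path = File.join(dir, "#{name}.json")
  JSON.parse(File.read(path))
rescue Errno::ENOENT
  raise ArgumentError, "Unknown list name: no file #{path}"
rescue JSON::ParserError => e
  raise ArgumentError, "Malformed JSON in #{path}: #{e.message}"
end

# Usage: write a tiny map to a temporary directory and load it by symbol.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "klingon.json"), '{"\u05D1": ["v"]}')
  p load_phoneme_map(dir, :klingon)
end
```

Raising `ArgumentError` instead of a bare `RuntimeError` is a design choice of this sketch; it lets callers rescue map-loading problems separately from other failures.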

data/lib/phoneme_maps/long.json ADDED

@@ -0,0 +1,41 @@
+{
+		"א": ["", "a"],
+		"ב": ["v", "bb"],
+		"בּ": ["v", "b", "bb"],
+		"ג": ["g", "gg"],
+		"ד": ["d", "dd"],
+		"ה": ["", "h"],
+		"ו": ["v", "w"],
+		"וּּ": ["oo", "ou"],
+		"ז": ["z", "s", "zz", "ss"],
+		"ח": ["ch", "h", "kh"],
+		"חַ": ["ach"],
+		"ט": ["t", "tt", "th"],
+		"י": ["y"],
+		"כ": ["ch", "h", "k", "c", "kk", "cc"],
+		"כּ": ["k", "c", "kk", "cc"],
+		"ל": ["l", "ll"],
+		"מ": ["m", "mm"],
+		"נ": ["n", "nn"],
+		"ס": ["s", "ss"],
+		"ע": [""],
+		"פ": ["f", "ff", "ph", "p", "pp"],
+		"פּ": ["p", "pp"],
+		"צ": ["ts", "tz", "s", "z"],
+		"ק": ["k", "c", "kk", "cc"],
+		"ר": ["r", "rr"],
+		"שׁ": ["sh", "ss", "s", "ch", "sch"],
+		"ש": ["s", "ss", "sh"],
+		"ת": ["s", "ss", "t", "tt", "th"],
+		"תּ": ["t", "t", "th"],
+		"ָ": ["o", "oo", "oh", "a", "ah", "aa", "e", "ee", "i", "a"],
+		"ַ": ["a", "o", "ah", "oh", ""],
+		"ֵ": ["e", "ei", "ey", "eh", "ay", "ai", ""],
+		"ֶ": ["e", "eh", "ei"],
+		"ִ": ["i", "e", "ee"],
+		"ֹ": ["o", "oh", "oi", "oy", "ey", "ow"],
+		"וֹ": ["o", "oh", "oi", "oy", "ey", "ow"],
+		"וּ": ["u", "oo", "i", "ee"],
+		"ֻ": ["u", "oo", "i", "ee"],
+		"ְ": ["u", "o", "e"]
+}

data/lib/phoneme_maps/short.json ADDED

@@ -0,0 +1,39 @@
+{
+  "א": [""],
+  "ב": ["v"],
+  "בּ": ["b","bb"],
+  "ג": ["g","gg"],
+  "ד": ["d","dd"],
+  "ה": ["","h"],
+  "ו": ["v"],
+  "ז": ["z","zz"],
+  "ח": ["ch"],
+  "חַ": ["ach"],
+  "ט": ["t","tt"],
+  "י": ["y",""],
+  "כ": ["ch"],
+  "כּ": ["k","c","kk","cc"],
+  "ל": ["l","ll"],
+  "מ": ["m","mm"],
+  "נ": ["n","nn"],
+  "ס": ["s","ss"],
+  "ע": ["a"],
+  "פ": ["f","ff","ph"],
+  "פּ": ["p","pp"],
+  "צ": ["ts","tz","tez","z"],
+  "ק": ["k","kk"],
+  "ר": ["r"],
+  "שׁ": ["sh"],
+  "ש": ["s","ss"],
+  "ת": ["s","ss","th","t"],
+  "תּ": ["t","tt"],
+  "ָ": ["o", "a", "e"],
+  "ַ": ["a"],
+  "ֵ": ["ay","ai","e","ei"],
+  "ֶ": ["e","a"],
+  "ִ": ["i","ee"],
+  "ֹ": ["a","o",""],
+  "וּ": ["u","oo","eu"],
+  "ֻ": ["u","oo","eu"],
+  "ְ": ["a","e","i","'"]
+}
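
The map files follow one shape: each phoneme key points to an array of replacement strings, where an empty string stands for a silent letter. A custom map can be checked against that shape before use. The sketch below is a hypothetical validator, not part of the gem; it also flags repeated options, such as the doubled `"t"` in one `long.json` entry above, on the assumption that duplicates are unintended.

```ruby
require 'json'

# Hypothetical validator for a parsed phoneme map.
# Returns an Array of human-readable problems; empty means the map looks well-formed.
def phoneme_map_problems(map)
  map.flat_map do |phoneme, options|
    unless options.is_a?(Array) && options.all? { |o| o.is_a?(String) }
      next ["#{phoneme.inspect}: value must be an Array of Strings"]
    end

    problems = []
    problems << "#{phoneme.inspect}: empty option list" if options.empty?
    problems << "#{phoneme.inspect}: duplicate options" if options.uniq.length != options.length
    problems
  end
end

# Usage: one clean entry and one modeled on the duplicated "t" in long.json.
sample = JSON.parse('{"\u05D1": ["v"], "\u05EA\u05BC": ["t", "t", "th"]}')
puts phoneme_map_problems(sample)
```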