RubyGems - emoji_data - Versions diffs - 0.1.0 → 0.2.0.rc1 - Mend

emoji_data 0.1.0 → 0.2.0.rc1

Files changed (16)

checksums.yaml +4 -4
data/.editorconfig +18 -0
data/.gitattributes +2 -0
data/.travis.yml +5 -0
data/.yardopts +2 -0
data/CHANGELOG.md +29 -4
data/{LICENSE.txt → LICENSE} +0 -0
data/README.md +58 -30
data/emoji_data.gemspec +8 -6
data/lib/emoji_data.rb +138 -28
data/lib/emoji_data/emoji_char.rb +72 -16
data/lib/emoji_data/version.rb +2 -1
data/scripts/benchmark.rb +70 -0
data/spec/emoji_char_spec.rb +15 -9
data/spec/emoji_data_spec.rb +40 -9
metadata +43 -10

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: e42e74736faf460d0722d647c34ab68dad7130e8
-  data.tar.gz: d1c42bfa36cda5c7addc7079bf4daef60bf17a6f
+  metadata.gz: 7385039bbd2cb55d93480a7d389c88fc8f47bbfa
+  data.tar.gz: 481273df89feb32b0c6d7178711bb6485e110538
 SHA512:
-  metadata.gz: 20929861bc5b903569576f8cdca11d046ce970def2e1221d1eb31ad9a7fac7ad2e6f9b64485c17a49e87e12e3aaccecfb006bebd8f25662268a335435420121a
-  data.tar.gz: 501d44516538071d5df7b2c37c32c2e933661c68699a3d65d12872a88ad5411f11e26f419a292f51230fa2a155893cddb0748c43ee7218db492dd38f860fbea5
+  metadata.gz: 6be855ddc07996303eef279b6c840a91a27da97774635e71c500f184b14a0fe4e30977dc0cadb1b48cb7a9f4ff465bae6613fd7b93d1101ee785245bb69097ea
+  data.tar.gz: 9a5c308587f581f2a500ac7686664ebe9ab86c103022c372fccf5d3c5a51b359992b9fec612ba6323e805d633d84470ce825ac1f7a945da3b94348a8ad4f087b

data/.editorconfig ADDED

@@ -0,0 +1,18 @@
+# EditorConfig helps developers define and maintain consistent
+# coding styles between different editors and IDEs
+# editorconfig.org
+root = true
+[*]
+indent_style = space
+indent_size = 2
+end_of_line = lf
+charset = utf-8
+trim_trailing_whitespace = true
+insert_final_newline = true
+[*.md]
+trim_trailing_whitespace = false

data/.gitattributes ADDED

@@ -0,0 +1,2 @@
+* text=auto
+

data/.travis.yml CHANGED

@@ -5,7 +5,12 @@ rvm:
  - 1.9.3
  - 2.0.0
  - 2.1.0
+ - 2.1.1
+ - ruby-head
+ - jruby
 matrix:
   allow_failures:
     - rvm: 1.8.7
+    - rvm: 1.9.2
+    - rvm: ruby-head

data/.yardopts ADDED

@@ -0,0 +1,2 @@
+--markup markdown
+

data/CHANGELOG.md CHANGED

@@ -1,11 +1,33 @@
 # Changelog
+## 0.2.0 (TBD)
+ * Rename a number of methods to be clearer and more consistent with what they
+   actually do:
+     - `EmojiChar.char()` → `EmojiChar.render()`
+     - `EmojiData.find_by_unified()` → `EmojiData.from_unified()`
+     - `EmojiData.find_by_str()` → `EmojiData.scan()`
+   Don't worry, the old names are still aliased in so you don't have to change
+   anything in your existing code.  This change is to make things clearer for
+   people new to the library.
+ * Add new `.from_short_name()` library method for fast keyword lookups.
+ * DEVELOPERS: Internal code cleanup and better comments.
+ * DEVELOPERS: Add benchmark suite for comparing method implementation time
+   across versions of this library.
 ## 0.1.0 (3 May 2014)
  * Add support for Unicode variant encodings, used by MacOSX 10.9 / iOS 7.
    - For more info: http://www.unicode.org/L2/L2011/11438-emoji-var.pdf
-   - By default, `EmojiChar.to_s()` and `.char()` will now use the variant encoding.
- * With adding support for variants, the speed of `find_by_str` regressed by approximately 20% (because there are more codepoints to match against). To counter this, we switched to a Regex based scan that improves performance of the method by over 250x(!).  A complete sorted search against 1000 strings now takes ~2ms where before it would take around a half second.
+   - By default, `EmojiChar.to_s()` and `.char()` will now use the variant
+     encoding.
+ * With adding support for variants, the speed of `find_by_str` regressed by
+   approximately 20% (because there are more codepoints to match against). To
+   counter this, we switched to a Regex based scan that improves performance of
+   the method by over 250x(!).  A complete sorted search against 1000 strings
+   now takes ~2ms where before it would take around a half second.
  * Import latest version of iamcal/emoji-data.
  * 100% test coverage. :sunglasses:
@@ -13,9 +35,12 @@
  * On initialization, create hashmaps to cache lookups for `.find_by_unified()`.
-   In a quick benchmark in MRI 2.1.1, this reduces the time needed for one million lookups from `13.5s` to `0.3s`!
+   In a quick benchmark in MRI 2.1.1, this reduces the time needed for one
+   million lookups from `13.5s` to `0.3s`!
-   This is only for lookup by unified ID for now, since the other `find_by_*()` methods are actually searches that can return multiple values.  I'll look at nested hashmaps for those if there is a pressing performance need later.
+   This is only for lookup by unified ID for now, since the other `find_by_*()`
+   methods are actually searches that can return multiple values.  I'll look at
+   nested hashmaps for those if there is a pressing performance need later.
 ## 0.0.2 (3 December 2013)
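
The 0.2.0 changelog entry above says the old method names remain available as aliases. A minimal sketch of that pattern follows, assuming a hypothetical stand-in module (`AliasSketch`) and a one-entry lookup table rather than the gem's real data:

```ruby
# Hypothetical stand-in for the gem's module; NAMES is illustrative data only.
module AliasSketch
  NAMES = { "1F680" => "ROCKET" }.freeze

  # New-style name: exact lookup by unified codepoint ID.
  def self.from_unified(uid)
    NAMES[uid.upcase]
  end

  # Keep the legacy name callable by aliasing it on the singleton class.
  class << self
    alias_method :find_by_unified, :from_unified
  end
end

puts AliasSketch.from_unified("1f680")
puts AliasSketch.find_by_unified("1f680")
```

Both calls go through the same method body, so existing callers of the old name should see identical results.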

data/{LICENSE.txt → LICENSE} RENAMED

File without changes

data/README.md CHANGED

@@ -3,21 +3,29 @@
 [![Gem Version](http://img.shields.io/gem/v/emoji_data.svg?style=flat)](https://rubygems.org/gems/emoji_data)
 [![Build Status](http://img.shields.io/travis/mroth/emoji_data.rb.svg?style=flat)](https://travis-ci.org/mroth/emoji_data.rb)
 [![Dependency Status](http://img.shields.io/gemnasium/mroth/emoji_data.rb.svg?style=flat)](https://gemnasium.com/mroth/emoji_data.rb)
-[![CodeClimate Status](http://img.shields.io/codeclimate/github/mroth/emoji_data.rb.svg?style=flat)](https://codeclimate.com/github/mroth/emoji_data.rb)
 [![Coverage Status](http://img.shields.io/coveralls/mroth/emoji_data.rb.svg?style=flat)](https://coveralls.io/r/mroth/emoji_data.rb)
+Ruby library providing low level operations for dealing with Emoji
+glyphs in the Unicode standard. :cool:
-Provides classes and helpers for dealing with emoji character data as unicode.  Wraps a library of all known emoji characters and provides convenience methods.
+EmojiData is like a Swiss Army knife for dealing with Emoji encoding issues. If
+all you need to do is translate `:poop:` into :poop:, then there are plenty of
+other libs out there that will probably do what you want.  But once you are
+dealing with Emoji as a fundamental part of your application, and you start to
+realize the nightmare of [doublebyte encoding][doublebyte] or
+[variants][variant], then this library may be your new best friend.
+:raised_hands:
-Note, this is mostly useful for low-level operations.  If you can avoid having to deal with unicode character data extensively and just want to encode/decode stuff, [rumoji](https://github.com/mwunsch/rumoji) might be a better bet for you.  If however, you are doing anything complicated involving emoji encoding/decoding, or you are just obsessed with understanding the details, this library is your new best friend.
+EmojiData is used in production by [Emojitracker.com][emojitracker] to parse
+well over 100M emoji glyphs daily. :dizzy:
-This library currently uses `iamcal/emoji-data` as its dataset, and thus considers it to be the "source of truth" regarding certain things, such as how to represent doublebyte unified codepoint IDs as strings (separated by a dash).
-This is basically a helper library for my [emojitrack](https://github.com/mroth/emojitrack) and [emojistatic](https://github.com/mroth/emojistatic) projects, but may be useful for other people.
+[doublebyte]: http://www.quora.com/Why-does-using-emoji-reduce-my-SMS-character-limit-to-70
+[variant]: http://www.unicode.org/L2/L2011/11438-emoji-var.pdf
+[emojitracker]: http://www.emojitracker.com
 ## Installation
-Add this line to your application's Gemfile:
+Add this line to your application's `Gemfile`:
     gem 'emoji_data'
@@ -29,42 +37,62 @@ Or install it yourself as:
     $ gem install emoji_data
-Currently requires `RUBY_VERSION >= 1.9.2`.
-## Library Usage
+Currently requires `RUBY_VERSION >= 1.9.3`.
-Pretty straightforward, read the source.  But here are some things you might care about:
+## Usage
-### EmojiData
+### Documentation
+Full API documentation is available via YARD or here:
+http://rubydoc.info/github/mroth/emoji_data.rb/master/frames
-  The `EmojiData` module provides some convenience methods for dealing with the library of known emoji characters.  Check out the source to see what's up.
+### Examples
+Here are some examples of the type of stuff you can do:
-Some notable methods to call out:
+```irb
+>> require 'emoji_data'
+=> true
- - `EmojiData.find_by_unified(id)` gives you a quick way to grab a specific EmojiChar.
+>> EmojiData.from_unified('1f680')
+=> #<EmojiData::EmojiChar:0x007f8fdba33b40 @variations=[], @name="ROCKET",
+@unified="1F680", @docomo=nil, @au="E5C8", @softbank="E10D", @google="FE7ED",
+@image="1f680.png", @sheet_x=25, @sheet_y=4, @short_name="rocket",
+@short_names=["rocket"], @text=nil, @apple_img=true, @hangouts_img=true,
+@twitter_img=true>
-		>> EmojiData.find_by_unified('1f680')
-	 	=> #<EmojiData::EmojiChar:0x007fd455ab2ff8 @name="ROCKET", @unified="1F680", @docomo="", @au="E5C8", @softbank="E10D", @google="FE7ED", @image="1f680.png", @sheet_x=21, @sheet_y=28, @short_name="rocket", @short_names=["rocket"], @text=nil>
+>> EmojiData.all.count
+=> 845
- - `EmojiData.find_by_name(name)` and `.find_by_short_name(name)` do pretty much what you'd expect:
+>> EmojiData.all_with_variants.count
+=> 107
-		>> EmojiData.find_by_name('thumb')
-		=> [#<EmojiData::EmojiChar:0x007f9db214a558 @name="THUMBS UP SIGN", @unified="1F44D", @docomo="E727", @au="E4F9", @softbank="E00E", @google="FEB97", @image="1f44d.png", @sheet_x=10, @sheet_y=17, @short_name="+1", @short_names=["+1", "thumbsup"], @text=nil>, #<EmojiData::EmojiChar:0x007f9db2149720 @name="THUMBS DOWN SIGN", @unified="1F44E", @docomo="E700", @au="EAD5", @softbank="E421", @google="FEBA0", @image="1f44e.png", @sheet_x=10, @sheet_y=18, @short_name="-1", @short_names=["-1", "thumbsdown"], @text=nil>]
+>> EmojiData.find_by_short_name("moon").count
+=> 13
- - `EmojiData.char_to_unified(char)` takes a string containing a unified unicode representation of an emoji character and gives you the unicode ID.
+>> EmojiData.all.select(&:doublebyte?).map(&:short_name)
+=> ["hash", "zero", "one", "two", "three", "four", "five", "six", "seven",
+"eight", "nine", "cn", "de", "es", "fr", "gb", "it", "jp", "kr", "ru", "us"]
-		>> EmojiData.char_to_unified('🚀')
-		=> "1F680"
+>> EmojiData.find_by_name("tree").map { |c| [c.unified, c.name, c.render] }
+=> [["1F332", "EVERGREEN TREE", "🌲"], ["1F333", "DECIDUOUS TREE", "🌳"],
+["1F334", "PALM TREE", "🌴"], ["1F384", "CHRISTMAS TREE", "🎄"], ["1F38B",
+"TANABATA TREE", "🎋"]]
- - `EmojiData.all` will return an array of all known EmojiChars, so you can map or do whatever funky Enumerable stuff you want to do across the entire character set.
+>> EmojiData.scan("I ♥ when marketers talk about the ☁. #blessed").each do |ec|
+?>   puts "Found some #{ec.short_name}!"
+>> end
+Found some hearts!
+Found some cloud!
+=> [...]
+```
- 		#gimmie the shortname of all doublebyte chars
- 		>> EmojiData.all.select(&:doublebyte?).map(&:short_name)
-		=> ["hash", "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "cn", "de", "es", "fr", "gb", "it", "jp", "kr", "ru", "us"]
+## Contributing
+Please be sure to run `rake spec` and help keep test coverage at :100:.
-### EmojiData::EmojiChar
+There is a full benchmark suite available via `scripts/benchmark.rb`.  Please
+run before and after your changes to ensure you have not caused a performance
+regression.
-  `EmojiData::EmojiChar` is a class representing a single emoji character.  All the variables from the `iamcal/emoji-data` dataset have dynamically generated getter methods.
+## License
-There are some additional convenience methods, such as `#doublebyte?` etc. Most important addition is the `#char` method which will output a properly unicode encoded string containing the character.
+[The MIT License (MIT)](LICENSE)

data/emoji_data.gemspec CHANGED

@@ -18,11 +18,13 @@ Gem::Specification.new do |spec|
   spec.test_files    = spec.files.grep(%r{^(test|spec|features)/})
   spec.require_paths = ["lib"]
-  spec.add_development_dependency "bundler", "~> 1.3"
-  spec.add_development_dependency "rake"
-  spec.add_development_dependency "rspec"
-  spec.add_development_dependency 'simplecov', '~> 0.7.1'
-  spec.add_development_dependency 'coveralls', '~> 0.7.0'
+  spec.add_development_dependency 'bundler',        '~> 1.3'
+  spec.add_development_dependency 'rake'
+  spec.add_development_dependency 'rspec',          '~> 2.14.1'
+  spec.add_development_dependency 'simplecov',      '~> 0.7.1'
+  spec.add_development_dependency 'coveralls',      '~> 0.7.0'
+  spec.add_development_dependency 'benchmark-ips',  '~> 2.0.0'
+  spec.add_development_dependency 'yard',           '~> 0.8.7.4'
-  spec.required_ruby_version = '>= 1.9.2'
+  spec.required_ruby_version = '>= 1.9.3'
 end

data/lib/emoji_data.rb CHANGED

@@ -3,82 +3,192 @@ require 'emoji_data/emoji_char'
 require 'json'
 module EmojiData
+  # specify some location paths
   GEM_ROOT = File.join(File.dirname(__FILE__), '..')
-  RAW_JSON = IO.read(File.join(GEM_ROOT, 'vendor/emoji-data/emoji.json'))
-  EMOJI_MAP = JSON.parse( RAW_JSON )
-  EMOJI_CHARS = EMOJI_MAP.map { |em| EmojiChar.new(em) }
+  VENDOR_DATA = 'vendor/emoji-data/emoji.json'
-  #
-  # construct hashmap for fast precached lookups for `.find_by_unified`
-  #
-  EMOJICHAR_UNIFIED_MAP = Hash[EMOJI_CHARS.map { |u| [u.unified, u] }]
-  # merge variant encodings into map so we can look them up as well
-  EMOJI_CHARS.select(&:variant?).each do |char|
-    char.variations.each do |variant|
-      EMOJICHAR_UNIFIED_MAP.merge! Hash[variant,char]
-    end
+  # precomputed list of all possible emoji characters
+  EMOJI_CHARS = begin
+    raw_json = IO.read(File.join(GEM_ROOT, VENDOR_DATA))
+    vendordata = JSON.parse( raw_json )
+    vendordata.map { |em| EmojiChar.new(em) }
+  end
+  # precomputed hashmap for fast precached lookups in .from_unified
+  EMOJICHAR_UNIFIED_MAP = {}
+  EMOJI_CHARS.each do |ec|
+    EMOJICHAR_UNIFIED_MAP[ec.unified] = ec
+    ec.variations.each  { |variant| EMOJICHAR_UNIFIED_MAP[variant] = ec }
+  end
+  # precomputed hashmap for fast precached lookups in .from_short_name
+  EMOJICHAR_KEYWORD_MAP = {}
+  EMOJI_CHARS.each do |ec|
+    ec.short_names.each { |keyword| EMOJICHAR_KEYWORD_MAP[keyword] = ec }
   end
+  # our constants are only for usage internally
+  private_constant :GEM_ROOT, :VENDOR_DATA
+  private_constant :EMOJI_CHARS, :EMOJICHAR_UNIFIED_MAP, :EMOJICHAR_KEYWORD_MAP
+  # Returns a list of all known Emoji characters as `EmojiChar` objects.
+  #
+  # @return [Array<EmojiChar>] a list of all known `EmojiChar`.
   def self.all
     EMOJI_CHARS
   end
+  # Returns a list of all `EmojiChar` that are represented with doublebyte
+  # encoding.
+  #
+  # @return [Array<EmojiChar>] a list of all doublebyte `EmojiChar`.
   def self.all_doublebyte
     EMOJI_CHARS.select(&:doublebyte?)
   end
+  # Returns a list of all `EmojiChar` that have at least one variant encoding.
+  #
+  # @return [Array<EmojiChar>] a list of all `EmojiChar` with variant encoding.
   def self.all_with_variants
     EMOJI_CHARS.select(&:variant?)
   end
-  def self.chars(options={})
-    options = {include_variants: false}.merge(options)
+  # Returns a list of all known Emoji characters rendered as UTF-8 strings.
+  #
+  # By default, this library's standard rendering options are used.
+  # However, if you pass an option hash with `include_variants: true` then all
+  # possible renderings of a single glyph will be included, meaning that:
+  #
+  # 1. You will have "duplicate" emojis in your list.
+  # 2. This list is now suitable for exhaustively matching against in a search.
+  #
+  # @option opts [Boolean] :include_variants whether or not to include all
+  #   possible encoding variants in the list
+  #
+  # @return [Array<String>] all Emoji characters rendered as UTF-8 strings
+  def self.chars(opts={})
+    options = {include_variants: false}.merge(opts)
-    normals = EMOJI_CHARS.map { |c| c.char({variant_encoding: false}) }
-    extras  = self.all_with_variants.map { |c| c.char({variant_encoding: true}) }
+    normals = EMOJI_CHARS.map { |c| c.render({variant_encoding: false}) }
     if options[:include_variants]
+      extras  = self.all_with_variants.map { |c| c.render({variant_encoding: true}) }
       return normals + extras
     end
     normals
   end
-  def self.codepoints(options={})
-    options = {include_variants: false}.merge(options)
+  # Returns a list of all known codepoints representing Emoji characters.
+  #
+  # @option (see .chars)
+  # @return [Array<String>] all codepoints represented as unified ID strings
+  def self.codepoints(opts={})
+    options = {include_variants: false}.merge(opts)
+    normals = EMOJI_CHARS.map(&:unified)
     if options[:include_variants]
-      return EMOJI_CHARS.map(&:unified) + self.all_with_variants.map {|c| c.variant}
+      extras = self.all_with_variants.map {|c| c.variant}
+      return normals + extras
     end
-    EMOJI_CHARS.map(&:unified)
+    normals
   end
+  # Convert a native UTF-8 string glyph to its unified codepoint ID.
+  #
+  # This is a conversion operation, not a match, so it may produce unexpected
+  # results with different types of values.
+  #
+  # @param char [String] a single rendered emoji glyph encoded as a UTF-8 string
+  # @return [String] the unified ID
+  #
+  # @example
+  #   >> EmojiData.char_to_unified("👾")
+  #   => "1F47E"
   def self.char_to_unified(char)
-    char.codepoints.to_a.map {|i| i.to_s(16).rjust(4,'0')}.join('-').upcase
+    char.codepoints.to_a.map { |i| i.to_s(16).rjust(4,'0')}.join('-').upcase
   end
-  def self.unified_to_char(cp)
-    EmojiChar::unified_to_char(cp)
+  # Convert a unified codepoint ID directly to its UTF-8 string representation.
+  #
+  # @param uid [String] the unified codepoint ID for an emoji
+  # @return [String] UTF-8 string rendering of the emoji character
+  #
+  # @example
+  #   >> EmojiData.unified_to_char("1F47E")
+  #   => "👾"
+  def self.unified_to_char(uid)
+    EmojiChar::unified_to_char(uid)
   end
-  def self.find_by_unified(cp)
-    EMOJICHAR_UNIFIED_MAP[cp.upcase]
+  # Finds a specific `EmojiChar` based on its unified codepoint ID.
+  #
+  # @param uid [String] the unified codepoint ID for an emoji
+  # @return [EmojiChar]
+  def self.from_unified(uid)
+    EMOJICHAR_UNIFIED_MAP[uid.upcase]
   end
-  FBS_REGEXP = Regexp.new("(?:#{EmojiData.chars({include_variants: true}).join("|")})")
-  def self.find_by_str(str)
+  # precompile regex pattern for fast matches in `.scan`
+  # needs to be defined after self.chars so not at top of file for now...
+  FBS_REGEXP = Regexp.new(
+    "(?:#{EmojiData.chars({include_variants: true}).join("|")})"
+  )
+  private_constant :FBS_REGEXP
+  # Scans a string for all encoded emoji characters contained within.
+  #
+  # @param str [String] the target string to search
+  # @return [Array<EmojiChar>] all emoji characters contained within the target
+  #    string, in the order they appeared.
+  #
+  # @example
+  #   >> EmojiData.scan("flying on my 🚀 to visit the 👾 people.")
+  #   => [#<EmojiData::EmojiChar... @name="ROCKET", @unified="1F680", ...>,
+  #   #<EmojiData::EmojiChar... @name="ALIEN MONSTER", @unified="1F47E", ...>]
+  def self.scan(str)
     matches = str.scan(FBS_REGEXP)
-    matches.map { |m| EmojiData.find_by_unified(EmojiData.char_to_unified(m)) }
+    matches.map { |m| EmojiData.from_unified(EmojiData.char_to_unified(m)) }
   end
+  # Finds any `EmojiChar` that contains given string in its official name.
+  #
+  # @param name [String]
+  # @return [Array<EmojiChar>]
   def self.find_by_name(name)
     self.find_by_value(:name, name.upcase)
   end
+  # Find all `EmojiChar` that match string in any of their associated short
+  # name keywords.
+  #
+  # @param short_name [String]
+  # @return [Array<EmojiChar>]
   def self.find_by_short_name(short_name)
     self.find_by_value(:short_name, short_name.downcase)
   end
+  # Finds a specific `EmojiChar` based on one of its short name keywords.
+  #
+  # Must be exact match.
+  #
+  # @param short_name [String]
+  # @return [EmojiChar]
+  def self.from_short_name(short_name)
+    EMOJICHAR_KEYWORD_MAP[short_name.downcase]
+  end
+  # alias old method names for legacy apps
+  class << self
+    alias_method :find_by_unified, :from_unified
+    alias_method :find_by_str, :scan
+  end
   protected
   def self.find_by_value(field,value)
     self.all.select { |char| char.send(field).include? value }
   end

data/lib/emoji_data/emoji_char.rb CHANGED

@@ -1,13 +1,42 @@
 module EmojiData
+  # EmojiChar represents a single Emoji character and its associated metadata.
+  #
+  # @!attribute name
+  #   @return [String] The standardized name used in the Unicode specification
+  #     to represent this emoji character.
+  #
+  # @!attribute unified
+  #   @return [String] The primary unified codepoint ID for the emoji character.
+  #
+  # @!attribute variations
+  #   @return [Array<String>] A list of all variant codepoints that may also
+  #     represent this emoji.
+  #
+  # @!attribute short_name
+  #   @return [String] The canonical "short name" or keyword used in many
+  #     systems to refer to this emoji. Often surrounded by `:colons:` in
+  #     systems like GitHub & Campfire.
+  #
+  # @!attribute short_names
+  #   @return [Array<String>] A full list of possible keywords for the emoji.
+  #
+  # @!attribute text
+  #   @return [String] An alternate textual representation of the emoji, for
+  #   example a smiley face emoji may be represented with an ASCII alternative.
+  #   Most emoji do not have a text alternative. This is typically used when
+  #   building an automatic translation from typed emoticons.
+  #
   class EmojiChar
     def initialize(emoji_hash)
       # work around inconsistency in emoji.json for now by just setting a blank
       # array for instance value, and let it get overridden in main
       # deserialization loop if variable is present.
       @variations = []
-      # http://stackoverflow.com/questions/1615190/declaring-instance-variables-iterating-over-a-hash
+      # trick for declaring instance variables while iterating over a hash
+      # http://stackoverflow.com/questions/1615190/
       emoji_hash.each do |k,v|
         instance_variable_set("@#{k}",v)
         eigenclass = class<<self; self; end
@@ -15,51 +44,78 @@ module EmojiData
       end
     end
-    # Returns a version of the character for rendering to screen.
+    # Renders an `EmojiChar` to its string glyph representation, suitable for
+    # printing to screen.
+    #
+    # @option opts [Boolean] :variant_encoding specify whether the variant
+    #   encoding selector should be used to hint to rendering devices that
+    #   "graphic" representation should be used. By default, we use this for all
+    #   Emoji characters that contain a possible variant.
     #
-    # By default this will now use the variant encoding if it exists.
-    def char(options = {})
-      options = {variant_encoding: true}.merge(options)
+    # @return [String] the emoji character rendered to a UTF-8 string
+    def render(opts = {})
+      options = {variant_encoding: true}.merge(opts)
       #decide whether to use the normal unified ID or the variant for encoding to str
       target = (self.variant? && options[:variant_encoding]) ? self.variant : @unified
       EmojiChar::unified_to_char(target)
     end
-    # Return ALL known possible string encodings of the emoji char.
+    alias_method :to_s, :render
+    alias_method :char, :render
+    # Returns a list of all possible UTF-8 string renderings of an `EmojiChar`.
     #
-    # Mostly useful for doing find operations when you need them all.
+    # E.g., normal, with variant selectors, etc. This is useful if you want to
+    # have all possible values to match against when searching for the emoji in
+    # a string representation.
+    #
+    # @return [Array<String>] all possible UTF-8 string renderings
     def chars
-      results = [self.char({variant_encoding: false})]
+      results = [self.render({variant_encoding: false})]
       @variations.each do |variation|
         results << EmojiChar::unified_to_char(variation)
       end
       @chars ||= results
     end
-    # Public: Is the character represented by a doublebyte unicode codepoint in unicode?
+    # Is the `EmojiChar` represented by a doublebyte codepoint in Unicode?
+    #
+    # @return [Boolean]
     def doublebyte?
-      @unified.match(/-/)
+      @unified.include? "-"
     end
-    # does the emojichar have an alternate variant encoding?
+    # Does the `EmojiChar` have an alternate Unicode variant encoding?
+    #
+    # @return [Boolean]
     def variant?
       @variations.length > 0
     end
-    # return whatever is the most likely variant ID for the emojichar
-    # for now, there can only be one, so just return first.
-    # (in the future, there may be multiple variants, who knows!)
+    # Returns the most likely variant-encoding codepoint ID for an `EmojiChar`.
+    #
+    # For now we only know of one possible variant encoding for certain
+    # characters, but there could be others in the future.
+    #
+    # This is typically used to force Emoji rendering for characters that could
+    # be represented in standard font glyphs on certain operating systems.
+    #
+    # The resulting encoded string will be two codepoints, or three codepoints
+    # for doublebyte Emoji characters.
+    #
+    # @return [String, nil]
+    #   The most likely variant-encoding codepoint ID.
+    #   If there is no variant-encoding for a character, returns nil.
     def variant
       @variations.first
     end
-    alias_method :to_s, :char
     protected
     def self.unified_to_char(cps)
       cps.split('-').map { |i| i.hex }.pack("U*")
     end
   end
 end

data/lib/emoji_data/version.rb CHANGED

@@ -1,3 +1,4 @@
 module EmojiData
-  VERSION = "0.1.0"
+  # Current version of the module, for bundling to rubygems.org
+  VERSION = "0.2.0.rc1"
 end

data/scripts/benchmark.rb ADDED

@@ -0,0 +1,70 @@
+# encoding: UTF-8
+require './lib/emoji_data'
+require 'benchmark/ips'
+suites = []
+s0 = "I liek to eat cake oh so very much cake eating is nice!! #cake #food"
+s1 = "🚀"
+s2 = "flying on my 🚀 to visit the 👾 people."
+s3 = "first a \u{0023}\u{FE0F}\u{20E3} then a 🚀"
+suites << Benchmark.ips do |x|
+  x.config(:time => 1, :warmup => 0)
+  x.report("EmojiData.scan(s0)") { EmojiData.scan(s0) }
+  x.report("EmojiData.scan(s1)") { EmojiData.scan(s1) }
+  x.report("EmojiData.scan(s2)") { EmojiData.scan(s2) }
+  x.report("EmojiData.scan(s3)") { EmojiData.scan(s3) }
+end
+suites << Benchmark.ips do |x|
+  x.config(:time => 1, :warmup => 0)
+  x.report("EmojiData.all")                       { EmojiData.all() }
+  x.report("EmojiData.all_doublebyte")            { EmojiData.all_doublebyte() }
+  x.report("EmojiData.all_with_variants")         { EmojiData.all_with_variants() }
+  x.report("EmojiData.from_unified")              { EmojiData.from_unified("1F680") }
+  x.report("EmojiData.chars")                     { EmojiData.chars() }
+  x.report("EmojiData.codepoints")                { EmojiData.codepoints() }
+  x.report("EmojiData.find_by_name - many")       { EmojiData.find_by_name("tree") }
+  x.report("EmojiData.find_by_name - none")       { EmojiData.find_by_name("zzzz") }
+  x.report("EmojiData.find_by_short_name - many") { EmojiData.find_by_short_name("MOON") }
+  x.report("EmojiData.find_by_short_name - none") { EmojiData.find_by_short_name("zzzz") }
+  x.report("EmojiData.char_to_unified - single")  { EmojiData.char_to_unified("🚀") }
+  x.report("EmojiData.char_to_unified - double")  { EmojiData.char_to_unified("\u{2601}\u{FE0F}") }
+  x.report("EmojiData.unified_to_char - single")  { EmojiData.unified_to_char("1F47E") }
+  x.report("EmojiData.unified_to_char - double")  { EmojiData.unified_to_char("2764-fe0f") }
+  x.report("EmojiData.unified_to_char - triple")  { EmojiData.unified_to_char("0030-FE0F-20E3") }
+end
+invader   = EmojiData::EmojiChar.new({unified: '1F47E'})
+usflag    = EmojiData::EmojiChar.new({unified: '1F1FA-1F1F8'})
+hourglass = EmojiData::EmojiChar.new({unified: '231B', variations: ['231B-FE0F']})
+cloud     = EmojiData::EmojiChar.new({unified: '2601', variations: ['2601-FE0F']})
+suites << Benchmark.ips do |x|
+  x.config(:time => 1, :warmup => 0)
+  x.report("EmojiChar.render - single")  { invader.render() }
+  x.report("EmojiChar.render - double")  { usflag.render() }
+  x.report("EmojiChar.render - variant") { cloud.render({variant_encoding: true}) }
+  x.report("EmojiChar.chars")            { cloud.chars() }
+  x.report("EmojiChar.doublebyte?")      { invader.doublebyte?() }
+  x.report("EmojiChar.variant?")         { invader.variant?() }
+  x.report("EmojiChar.variant")          { invader.variant() }
+end
+def micros(hz)
+  1_000_000 / hz
+end
+suites.each do |report|
+  results = report.entries.sort { |a,b| b.ips <=> a.ips }
+  print "\n"
+  results.each do |r|
+    printf "%-45s %10u   %.2f µs/op\n", r.label, r.iterations, micros(r.ips)
+  end
+end

data/spec/emoji_char_spec.rb CHANGED

@@ -38,22 +38,28 @@ describe EmojiChar do
       end
     end
-    describe "#char" do
+    describe "#render" do
       it "should render as happy shiny unicode" do
-        @invader.char.should eq("👾")
+        @invader.render.should eq("👾")
       end
       it "should render as happy shiny unicode for doublebyte chars too" do
-        @usflag.char.should eq("🇺🇸")
+        @usflag.render.should eq("🇺🇸")
       end
       it "should have a flag to output forced emoji variant char encoding if requested" do
-        @cloud.char(    {variant_encoding: false}).should eq("\u{2601}")
-        @cloud.char(    {variant_encoding:  true}).should eq("\u{2601}\u{FE0F}")
-        @invader.char(  {variant_encoding: false}).should eq("\u{1F47E}")
-        @invader.char(  {variant_encoding:  true}).should eq("\u{1F47E}")
+        @cloud.render(    {variant_encoding: false}).should eq("\u{2601}")
+        @cloud.render(    {variant_encoding:  true}).should eq("\u{2601}\u{FE0F}")
+        @invader.render(  {variant_encoding: false}).should eq("\u{1F47E}")
+        @invader.render(  {variant_encoding:  true}).should eq("\u{1F47E}")
       end
       it "should default to variant encoding for chars with a variant present" do
-        @cloud.char.should eq("\u{2601}\u{FE0F}")
-        @hourglass.char.should eq("\u{231B}\u{FE0F}")
+        @cloud.render.should eq("\u{2601}\u{FE0F}")
+        @hourglass.render.should eq("\u{231B}\u{FE0F}")
+      end
+    end
+    describe "#char - DEPRECATED" do
+      it "should maintain compatibility with old method name for .render" do
+        @cloud.char.should eq(@cloud.render)
       end
     end

data/spec/emoji_data_spec.rb CHANGED

@@ -56,12 +56,12 @@ describe EmojiData do
     end
   end
-  describe ".find_by_str" do
+  describe ".scan" do
     before(:all) do
-      @exact_results   = EmojiData.find_by_str("🚀")
-      @multi_results   = EmojiData.find_by_str("flying on my 🚀 to visit the 👾 people.")
-      @variant_results = EmojiData.find_by_str("\u{0023}\u{FE0F}\u{20E3}")
-      @variant_multi   = EmojiData.find_by_str("first a \u{0023}\u{FE0F}\u{20E3} then a 🚀")
+      @exact_results   = EmojiData.scan("🚀")
+      @multi_results   = EmojiData.scan("flying on my 🚀 to visit the 👾 people.")
+      @variant_results = EmojiData.scan("\u{0023}\u{FE0F}\u{20E3}")
+      @variant_multi   = EmojiData.scan("first a \u{0023}\u{FE0F}\u{20E3} then a 🚀")
     end
     it "should find the proper EmojiChar object from a single string char" do
       @exact_results.should be_kind_of(Array)
@@ -89,22 +89,34 @@ describe EmojiData do
     end
   end
-  describe ".find_by_unified" do
+  describe ".find_by_str - DEPRECATED" do
+    it "should maintain compatibility with old method name for .scan" do
+      EmojiData.find_by_str("\u{0023}\u{FE0F}\u{20E3}").should eq(EmojiData.scan("\u{0023}\u{FE0F}\u{20E3}"))
+    end
+  end
+  describe ".from_unified" do
     it "should find the proper EmojiChar object" do
-      results = EmojiData.find_by_unified('1f680')
+      results = EmojiData.from_unified('1f680')
       results.should be_kind_of(EmojiChar)
       results.name.should eq('ROCKET')
     end
     it "should normalise capitalization for hex values" do
-      EmojiData.find_by_unified('1f680').should_not be_nil
+      EmojiData.from_unified('1f680').should_not be_nil
     end
     it "should find via variant encoding ID format as well" do
-      results = EmojiData.find_by_unified('2764-fe0f')
+      results = EmojiData.from_unified('2764-fe0f')
       results.should_not be_nil
       results.name.should eq('HEAVY BLACK HEART')
     end
   end
+  describe ".find_by_unified - DEPRECATED" do
+    it "should maintain compatibility with old method name for .from_unified" do
+      EmojiData.find_by_unified('1f680').should eq(EmojiData.from_unified('1f680'))
+    end
+  end
   describe ".find_by_name" do
     it "returns an array of results, upcasing input if needed" do
       EmojiData.find_by_name('tree').should be_kind_of(Array)
@@ -129,6 +141,25 @@ describe EmojiData do
     end
   end
+  describe ".from_short_name" do
+    it "returns exact matches on a short name" do
+      results = EmojiData.from_short_name('scream')
+      results.should be_kind_of(EmojiChar)
+      results.name.should eq('FACE SCREAMING IN FEAR')
+    end
+    it "handles lowercasing input if required" do
+      EmojiData.from_short_name('SCREAM').should eq( EmojiData.from_short_name('scream') )
+    end
+    it "works on secondary keywords" do
+      primary = EmojiData.from_short_name('hankey')
+      EmojiData.from_short_name('poop').should eq(primary)
+      EmojiData.from_short_name('shit').should eq(primary)
+    end
+    it "returns nil if nothing matches" do
+      EmojiData.from_short_name('taco').should be_nil
+    end
+  end
   describe ".char_to_unified" do
     it "converts normal emoji to unified codepoint" do
       EmojiData.char_to_unified("👾").should eq('1F47E')
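
Note on the new `.from_short_name` specs above: they describe an exact-match lookup that lowercases its input, accepts secondary keywords, and returns `nil` on a miss. A minimal sketch of that contract follows, assuming a simple hash index. `SketchEmoji`, `SKETCH_EMOJI`, `SHORT_NAME_INDEX`, and `sketch_from_short_name` are hypothetical names with hand-written sample data, not the gem's own code or data set.

```ruby
# Hypothetical record type and sample data; the real gem loads its own table.
SketchEmoji = Struct.new(:name, :short_names)

SKETCH_EMOJI = [
  SketchEmoji.new("FACE SCREAMING IN FEAR", ["scream"]),
  SketchEmoji.new("PILE OF POO", ["hankey", "poop", "shit"])
]

# Index every short name (primary and secondary) to its record, once.
SHORT_NAME_INDEX = SKETCH_EMOJI.each_with_object({}) do |emoji, index|
  emoji.short_names.each { |short| index[short] = emoji }
end

# Returns the matching record, or nil when the short name is unknown.
def sketch_from_short_name(short_name)
  SHORT_NAME_INDEX[short_name.downcase]
end

primary = sketch_from_short_name("hankey")
puts sketch_from_short_name("SCREAM").name
puts sketch_from_short_name("poop").equal?(primary)
puts sketch_from_short_name("taco").inspect
```

Building the index up front makes each lookup a single hash access, in contrast to a substring search such as `.find_by_name`, which returns an array of matches.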

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: emoji_data
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.2.0.rc1
 platform: ruby
 authors:
 - Matthew Rothenberg
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-05-03 00:00:00.000000000 Z
+date: 2014-09-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -42,16 +42,16 @@ dependencies:
   name: rspec
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: 2.14.1
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: 2.14.1
 - !ruby/object:Gem::Dependency
   name: simplecov
   requirement: !ruby/object:Gem::Requirement
@@ -80,6 +80,34 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 0.7.0
+- !ruby/object:Gem::Dependency
+  name: benchmark-ips
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 2.0.0
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 2.0.0
+- !ruby/object:Gem::Dependency
+  name: yard
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.8.7.4
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.8.7.4
 description: Provides classes and helpers for dealing with emoji character data as
   unicode.  Wraps a library of all known emoji characters and provides convenience
   methods.
@@ -90,17 +118,21 @@ extensions: []
 extra_rdoc_files: []
 files:
 - ".coveralls.yml"
+- ".editorconfig"
+- ".gitattributes"
 - ".gitignore"
 - ".travis.yml"
+- ".yardopts"
 - CHANGELOG.md
 - Gemfile
-- LICENSE.txt
+- LICENSE
 - README.md
 - Rakefile
 - emoji_data.gemspec
 - lib/emoji_data.rb
 - lib/emoji_data/emoji_char.rb
 - lib/emoji_data/version.rb
+- scripts/benchmark.rb
 - spec/emoji_char_spec.rb
 - spec/emoji_data_spec.rb
 - spec/spec_helper.rb
@@ -118,12 +150,12 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: 1.9.2
+      version: 1.9.3
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - ">="
+  - - ">"
     - !ruby/object:Gem::Version
-      version: '0'
+      version: 1.3.1
 requirements: []
 rubyforge_project:
 rubygems_version: 2.2.2
@@ -134,3 +166,4 @@ test_files:
 - spec/emoji_char_spec.rb
 - spec/emoji_data_spec.rb
 - spec/spec_helper.rb
+has_rdoc:
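
Note on the metadata changes above: several development dependencies move from `>= 0` to pessimistic `~>` constraints, and the gem version becomes a prerelease. The sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes to show what those constraints admit. The specific candidate versions tested (2.14.8, 2.15.0, and so on) are arbitrary examples chosen for illustration, not versions taken from the diff.

```ruby
require "rubygems"

# "~> 2.14.1" admits 2.14.1 and later patch releases, but not 2.15.0.
rspec_req = Gem::Requirement.new("~> 2.14.1")
rspec_patch_ok = rspec_req.satisfied_by?(Gem::Version.new("2.14.8"))
rspec_minor_ok = rspec_req.satisfied_by?(Gem::Version.new("2.15.0"))

# With four segments, "~> 0.8.7.4" only lets the last segment float.
yard_req = Gem::Requirement.new("~> 0.8.7.4")
yard_last_ok  = yard_req.satisfied_by?(Gem::Version.new("0.8.7.6"))
yard_third_ok = yard_req.satisfied_by?(Gem::Version.new("0.8.8"))

# A version with a letter segment such as "rc1" is a prerelease and
# sorts before the corresponding final release.
release = Gem::Version.new("0.2.0.rc1")
is_prerelease = release.prerelease?
before_final  = release < Gem::Version.new("0.2.0")

puts rspec_patch_ok  # true
puts rspec_minor_ok  # false
puts yard_last_ok    # true
puts yard_third_ok   # false
puts is_prerelease   # true
puts before_final    # true
```

The prerelease status of `0.2.0.rc1` is consistent with `required_rubygems_version` changing to `> 1.3.1` in this diff, since RubyGems appears to record that requirement for prerelease versions.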