emoji_data 0.1.0 → 0.2.0.rc1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

checksums.yaml +4 -4
data/.editorconfig +18 -0
data/.gitattributes +2 -0
data/.travis.yml +5 -0
data/.yardopts +2 -0
data/CHANGELOG.md +29 -4
data/{LICENSE.txt → LICENSE} +0 -0
data/README.md +58 -30
data/emoji_data.gemspec +8 -6
data/lib/emoji_data.rb +138 -28
data/lib/emoji_data/emoji_char.rb +72 -16
data/lib/emoji_data/version.rb +2 -1
data/scripts/benchmark.rb +70 -0
data/spec/emoji_char_spec.rb +15 -9
data/spec/emoji_data_spec.rb +40 -9
metadata +43 -10

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: e42e74736faf460d0722d647c34ab68dad7130e8
-  data.tar.gz: d1c42bfa36cda5c7addc7079bf4daef60bf17a6f
+  metadata.gz: 7385039bbd2cb55d93480a7d389c88fc8f47bbfa
+  data.tar.gz: 481273df89feb32b0c6d7178711bb6485e110538
 SHA512:
-  metadata.gz: 20929861bc5b903569576f8cdca11d046ce970def2e1221d1eb31ad9a7fac7ad2e6f9b64485c17a49e87e12e3aaccecfb006bebd8f25662268a335435420121a
-  data.tar.gz: 501d44516538071d5df7b2c37c32c2e933661c68699a3d65d12872a88ad5411f11e26f419a292f51230fa2a155893cddb0748c43ee7218db492dd38f860fbea5
+  metadata.gz: 6be855ddc07996303eef279b6c840a91a27da97774635e71c500f184b14a0fe4e30977dc0cadb1b48cb7a9f4ff465bae6613fd7b93d1101ee785245bb69097ea
+  data.tar.gz: 9a5c308587f581f2a500ac7686664ebe9ab86c103022c372fccf5d3c5a51b359992b9fec612ba6323e805d633d84470ce825ac1f7a945da3b94348a8ad4f087b

data/.editorconfig ADDED

@@ -0,0 +1,18 @@
+# EditorConfig helps developers define and maintain consistent
+# coding styles between different editors and IDEs
+# editorconfig.org
+root = true
+[*]
+indent_style = space
+indent_size = 2
+end_of_line = lf
+charset = utf-8
+trim_trailing_whitespace = true
+insert_final_newline = true
+[*.md]
+trim_trailing_whitespace = false

data/.gitattributes ADDED

@@ -0,0 +1,2 @@
+* text=auto
+

data/.travis.yml CHANGED

@@ -5,7 +5,12 @@ rvm:
  - 1.9.3
  - 2.0.0
  - 2.1.0
+ - 2.1.1
+ - ruby-head
+ - jruby
 matrix:
   allow_failures:
     - rvm: 1.8.7
+    - rvm: 1.9.2
+    - rvm: ruby-head

data/.yardopts ADDED

@@ -0,0 +1,2 @@
+--markup markdown
+

data/CHANGELOG.md CHANGED

@@ -1,11 +1,33 @@
 # Changelog
+## 0.2.0 (TBD)
+ * Rename a number of methods to be clearer and more consistent with what they
+   actually do:
+     - `EmojiChar.char()` → `EmojiChar.render()`
+     - `EmojiData.find_by_unified()` → `EmojiData.from_unified()`
+     - `EmojiData.find_by_str()` → `EmojiData.scan()`
+   Don't worry, the old names are still aliased in so you don't have to change
+   anything in your existing code.  This change is to make things clearer for
+   people new to the library.
+ * Add new `.from_short_name()` library method for fast keyword lookups.
+ * DEVELOPERS: Internal code cleanup and better comments.
+ * DEVELOPERS: Add benchmark suite for comparing method implementation time
+   across versions of this library.
 ## 0.1.0 (3 May 2014)
  * Add support for Unicode variant encodings, used by MacOSX 10.9 / iOS 7.
    - For more info: http://www.unicode.org/L2/L2011/11438-emoji-var.pdf
-   - By default, `EmojiChar.to_s()` and `.char()` will now use the variant encoding.
- * With adding support for variants, the speed of `find_by_str` regressed by approximately 20% (because there are more codepoints to match against). To counter this, we switched to a Regex based scan that improves performance of the method by over 250x(!).  A complete sorted search against 1000 strings now takes ~2ms where before it would take around a half second.
+   - By default, `EmojiChar.to_s()` and `.char()` will now use the variant
+     encoding.
+ * With adding support for variants, the speed of `find_by_str` regressed by
+   approximately 20% (because there are more codepoints to match against). To
+   counter this, we switched to a Regex based scan that improves performance of
+   the method by over 250x(!).  A complete sorted search against 1000 strings
+   now takes ~2ms where before it would take around a half second.
  * Import latest version of iamcal/emoji-data.
  * 100% test coverage. :sunglasses:
@@ -13,9 +35,12 @@
  * On initialization, create hashmaps to cache lookups for `.find_by_unified()`.
-   In a quick benchmark in MRI 2.1.1, this reduces the time needed for one million lookups from `13.5s` to `0.3s`!
+   In a quick benchmark in MRI 2.1.1, this reduces the time needed for one
+   million lookups from `13.5s` to `0.3s`!
-   This is only for lookup by unified ID for now, since the other `find_by_*()` methods are actually searches that can return multiple values.  I'll look at nested hashmaps for those if there is a pressing performance need later.
+   This is only for lookup by unified ID for now, since the other `find_by_*()`
+   methods are actually searches that can return multiple values.  I'll look at
+   nested hashmaps for those if there is a pressing performance need later.
 ## 0.0.2 (3 December 2013)
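
The 0.2.0 changelog entry above says the old method names remain available as aliases. A minimal sketch of that pattern follows, assuming a hypothetical `AliasSketch` module with a one-entry lookup table; it is not the gem's code, only an illustration of keeping an old singleton method name alive next to a new one.

```ruby
# Hypothetical stand-in module; the table below is illustrative data only.
module AliasSketch
  LOOKUP = { '1F680' => 'ROCKET' }.freeze

  # New, clearer method name.
  def self.from_unified(uid)
    LOOKUP[uid.upcase]
  end

  # Keep the old name working for existing callers by aliasing it
  # on the module's singleton class.
  class << self
    alias_method :find_by_unified, :from_unified
  end
end

puts AliasSketch.from_unified('1f680')
puts AliasSketch.find_by_unified('1f680')
```

Because the alias points at the same method body, callers of either name get identical results, which is what the "DEPRECATED" specs further down in this diff check for.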

data/{LICENSE.txt → LICENSE} RENAMED

File without changes

data/README.md CHANGED

@@ -3,21 +3,29 @@
 [![Gem Version](http://img.shields.io/gem/v/emoji_data.svg?style=flat)](https://rubygems.org/gems/emoji_data)
 [![Build Status](http://img.shields.io/travis/mroth/emoji_data.rb.svg?style=flat)](https://travis-ci.org/mroth/emoji_data.rb)
 [![Dependency Status](http://img.shields.io/gemnasium/mroth/emoji_data.rb.svg?style=flat)](https://gemnasium.com/mroth/emoji_data.rb)
-[![CodeClimate Status](http://img.shields.io/codeclimate/github/mroth/emoji_data.rb.svg?style=flat)](https://codeclimate.com/github/mroth/emoji_data.rb)
 [![Coverage Status](http://img.shields.io/coveralls/mroth/emoji_data.rb.svg?style=flat)](https://coveralls.io/r/mroth/emoji_data.rb)
+Ruby library providing low level operations for dealing with Emoji
+glyphs in the Unicode standard. :cool:
-Provides classes and helpers for dealing with emoji character data as unicode.  Wraps a library of all known emoji characters and provides convenience methods.
+EmojiData is like a swiss-army knife for dealing with Emoji encoding issues. If
+all you need to do is translate `:poop:` into :poop:, then there are plenty of
+other libs out there that will probably do what you want.  But once you are
+dealing with Emoji as a fundamental part of your application, and you start to
+realize the nightmare of [doublebyte encoding][doublebyte] or
+[variants][variant], then this library may be your new best friend.
+:raised_hands:
-Note, this is mostly useful for low-level operations.  If you can avoid having to deal with unicode character data extensively and just want to encode/decode stuff, [rumoji](https://github.com/mwunsch/rumoji) might be a better bet for you.  If however, you are doing anything complicated involving emoji encoding/decoding, or you are just obsessed with understanding the details, this library is your new best friend.
+EmojiData is used in production by [Emojitracker.com][emojitracker] to parse
+well over 100M emoji glyphs daily. :dizzy:
-This library currently uses `iamcal/emoji-data` as it's dataset, and thus considers it to be the "source of truth" regarding certain things, such as how to represent doublebyte unified codepoint IDs as strings (seperated by a dash).
-This is basically a helper library for my [emojitrack](https://github.com/mroth/emojitrack) and [emojistatic](https://github.com/mroth/emojistatic) projects, but may be useful for other people.
+[doublebyte]: http://www.quora.com/Why-does-using-emoji-reduce-my-SMS-character-limit-to-70
+[variant]: http://www.unicode.org/L2/L2011/11438-emoji-var.pdf
+[emojitracker]: http://www.emojitracker.com
 ## Installation
-Add this line to your application's Gemfile:
+Add this line to your application's `Gemfile`:
     gem 'emoji_data'
@@ -29,42 +37,62 @@ Or install it yourself as:
     $ gem install emoji_data
-Currently requires `RUBY_VERSION >= 1.9.2`.
-## Library Usage
+Currently requires `RUBY_VERSION >= 1.9.3`.
-Pretty straightforward, read the source.  But here are some things you might care about:
+## Usage
-### EmojiData
+### Documentation
+Full API documentation is available via YARD or here:
+http://rubydoc.info/github/mroth/emoji_data.rb/master/frames
-  The `EmojiData` module provides some convenience methods for dealing with the library of known emoji characters.  Check out the source to see what's up.
+### Examples
+Here are some examples of the type of stuff you can do:
-Some notable methods to call out:
+```irb
+>> require 'emoji_data'
+=> true
- - `EmojiData.find_by_unified(id)` gives you a quick way to grab a specific EmojiChar.
+>> EmojiData.from_unified('1f680')
+=> #<EmojiData::EmojiChar:0x007f8fdba33b40 @variations=[], @name="ROCKET",
+@unified="1F680", @docomo=nil, @au="E5C8", @softbank="E10D", @google="FE7ED",
+@image="1f680.png", @sheet_x=25, @sheet_y=4, @short_name="rocket",
+@short_names=["rocket"], @text=nil, @apple_img=true, @hangouts_img=true,
+@twitter_img=true>
-		>> EmojiData.find_by_unified('1f680')
-	 	=> #<EmojiData::EmojiChar:0x007fd455ab2ff8 @name="ROCKET", @unified="1F680", @docomo="", @au="E5C8", @softbank="E10D", @google="FE7ED", @image="1f680.png", @sheet_x=21, @sheet_y=28, @short_name="rocket", @short_names=["rocket"], @text=nil>
+>> EmojiData.all.count
+=> 845
- - `EmojiData.find_by_name(name)` and `.find_by_short_name(name)` do pretty much what you'd expect:
+>> EmojiData.all_with_variants.count
+=> 107
-		>> EmojiData.find_by_name('thumb')
-		=> [#<EmojiData::EmojiChar:0x007f9db214a558 @name="THUMBS UP SIGN", @unified="1F44D", @docomo="E727", @au="E4F9", @softbank="E00E", @google="FEB97", @image="1f44d.png", @sheet_x=10, @sheet_y=17, @short_name="+1", @short_names=["+1", "thumbsup"], @text=nil>, #<EmojiData::EmojiChar:0x007f9db2149720 @name="THUMBS DOWN SIGN", @unified="1F44E", @docomo="E700", @au="EAD5", @softbank="E421", @google="FEBA0", @image="1f44e.png", @sheet_x=10, @sheet_y=18, @short_name="-1", @short_names=["-1", "thumbsdown"], @text=nil>]
+>> EmojiData.find_by_short_name("moon").count
+=> 13
- - `EmojiData.char_to_unified(char)` takes a string containing a unified unicode representation of an emoji character and gives you the unicode ID.
+>> EmojiData.all.select(&:doublebyte?).map(&:short_name)
+=> ["hash", "zero", "one", "two", "three", "four", "five", "six", "seven",
+"eight", "nine", "cn", "de", "es", "fr", "gb", "it", "jp", "kr", "ru", "us"]
-		>> EmojiData.char_to_unified('🚀')
-		=> "1F680"
+>> EmojiData.find_by_name("tree").map { |c| [c.unified, c.name, c.render] }
+=> [["1F332", "EVERGREEN TREE", "🌲"], ["1F333", "DECIDUOUS TREE", "🌳"],
+["1F334", "PALM TREE", "🌴"], ["1F384", "CHRISTMAS TREE", "🎄"], ["1F38B",
+"TANABATA TREE", "🎋"]]
- - `EmojiData.all` will return an array of all known EmojiChars, so you can map or do whatever funky Enumerable stuff you want to do across the entire character set.
+>> EmojiData.scan("I ♥ when marketers talk about the ☁. #blessed").each do |ec|
+?>   puts "Found some #{ec.short_name}!"
+>> end
+Found some hearts!
+Found some cloud!
+=> [...]
+```
- 		#gimmie the shortname of all doublebyte chars
- 		>> EmojiData.all.select(&:doublebyte?).map(&:short_name)
-		=> ["hash", "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "cn", "de", "es", "fr", "gb", "it", "jp", "kr", "ru", "us"]
+## Contributing
+Please be sure to run `rake spec` and help keep test coverage at :100:.
-### EmojiData::EmojiChar
+There is a full benchmark suite available via `scripts/benchmark.rb`.  Please
+run before and after your changes to ensure you have not caused a performance
+regression.
-  `EmojiData::EmojiChar` is a class representing a single emoji character.  All the variables from the `iamcal/emoji-data` dataset have dynamically generated getter methods.
+## License
-There are some additional convenience methods, such as `#doublebyte?` etc. Most important addition is the `#char` method which will output a properly unicode encoded string containing the character.
+[The MIT License (MIT)](LICENSE)

data/emoji_data.gemspec CHANGED

@@ -18,11 +18,13 @@ Gem::Specification.new do |spec|
   spec.test_files    = spec.files.grep(%r{^(test|spec|features)/})
   spec.require_paths = ["lib"]
-  spec.add_development_dependency "bundler", "~> 1.3"
-  spec.add_development_dependency "rake"
-  spec.add_development_dependency "rspec"
-  spec.add_development_dependency 'simplecov', '~> 0.7.1'
-  spec.add_development_dependency 'coveralls', '~> 0.7.0'
+  spec.add_development_dependency 'bundler',        '~> 1.3'
+  spec.add_development_dependency 'rake'
+  spec.add_development_dependency 'rspec',          '~> 2.14.1'
+  spec.add_development_dependency 'simplecov',      '~> 0.7.1'
+  spec.add_development_dependency 'coveralls',      '~> 0.7.0'
+  spec.add_development_dependency 'benchmark-ips',  '~> 2.0.0'
+  spec.add_development_dependency 'yard',           '~> 0.8.7.4'
-  spec.required_ruby_version = '>= 1.9.2'
+  spec.required_ruby_version = '>= 1.9.3'
 end
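
The gemspec hunk above pins several development dependencies with the pessimistic `~>` operator and raises the minimum Ruby version. As a rough illustration of what those constraints accept, the sketch below evaluates them with RubyGems' own `Gem::Requirement` and `Gem::Version` classes; the candidate version numbers are made up for the example.

```ruby
require 'rubygems'

# '~> 2.14.1' allows patch-level bumps but not the next minor release.
rspec_req = Gem::Requirement.new('~> 2.14.1')
puts rspec_req.satisfied_by?(Gem::Version.new('2.14.8'))
puts rspec_req.satisfied_by?(Gem::Version.new('2.15.0'))

# '~> 1.3' allows any later 1.x release but not 2.0.
bundler_req = Gem::Requirement.new('~> 1.3')
puts bundler_req.satisfied_by?(Gem::Version.new('1.6.2'))
puts bundler_req.satisfied_by?(Gem::Version.new('2.0.0'))

# The new required_ruby_version excludes 1.9.2.
ruby_req = Gem::Requirement.new('>= 1.9.3')
puts ruby_req.satisfied_by?(Gem::Version.new('1.9.2'))
puts ruby_req.satisfied_by?(Gem::Version.new('1.9.3'))
```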

data/lib/emoji_data.rb CHANGED

@@ -3,82 +3,192 @@ require 'emoji_data/emoji_char'
 require 'json'
 module EmojiData
+  # specify some location paths
   GEM_ROOT = File.join(File.dirname(__FILE__), '..')
-  RAW_JSON = IO.read(File.join(GEM_ROOT, 'vendor/emoji-data/emoji.json'))
-  EMOJI_MAP = JSON.parse( RAW_JSON )
-  EMOJI_CHARS = EMOJI_MAP.map { |em| EmojiChar.new(em) }
+  VENDOR_DATA = 'vendor/emoji-data/emoji.json'
-  #
-  # construct hashmap for fast precached lookups for `.find_by_unified`
-  #
-  EMOJICHAR_UNIFIED_MAP = Hash[EMOJI_CHARS.map { |u| [u.unified, u] }]
-  # merge variant encodings into map so we can look them up as well
-  EMOJI_CHARS.select(&:variant?).each do |char|
-    char.variations.each do |variant|
-      EMOJICHAR_UNIFIED_MAP.merge! Hash[variant,char]
-    end
+  # precomputed list of all possible emoji characters
+  EMOJI_CHARS = begin
+    raw_json = IO.read(File.join(GEM_ROOT, VENDOR_DATA))
+    vendordata = JSON.parse( raw_json )
+    vendordata.map { |em| EmojiChar.new(em) }
+  end
+  # precomputed hashmap for fast precached lookups in .from_unified
+  EMOJICHAR_UNIFIED_MAP = {}
+  EMOJI_CHARS.each do |ec|
+    EMOJICHAR_UNIFIED_MAP[ec.unified] = ec
+    ec.variations.each  { |variant| EMOJICHAR_UNIFIED_MAP[variant] = ec }
+  end
+  # precomputed hashmap for fast precached lookups in .from_short_name
+  EMOJICHAR_KEYWORD_MAP = {}
+  EMOJI_CHARS.each do |ec|
+    ec.short_names.each { |keyword| EMOJICHAR_KEYWORD_MAP[keyword] = ec }
   end
+  # our constants are only for usage internally
+  private_constant :GEM_ROOT, :VENDOR_DATA
+  private_constant :EMOJI_CHARS, :EMOJICHAR_UNIFIED_MAP, :EMOJICHAR_KEYWORD_MAP
+  # Returns a list of all known Emoji characters as `EmojiChar` objects.
+  #
+  # @return [Array<EmojiChar>] a list of all known `EmojiChar`.
   def self.all
     EMOJI_CHARS
   end
+  # Returns a list of all `EmojiChar` that are represented with doublebyte
+  # encoding.
+  #
+  # @return [Array<EmojiChar>] a list of all doublebyte `EmojiChar`.
   def self.all_doublebyte
     EMOJI_CHARS.select(&:doublebyte?)
   end
+  # Returns a list of all `EmojiChar` that have at least one variant encoding.
+  #
+  # @return [Array<EmojiChar>] a list of all `EmojiChar` with variant encoding.
   def self.all_with_variants
     EMOJI_CHARS.select(&:variant?)
   end
-  def self.chars(options={})
-    options = {include_variants: false}.merge(options)
+  # Returns a list of all known Emoji characters rendered as UTF-8 strings.
+  #
+  # By default, the default rendering options for this library will be used.
+  # However, if you pass an option hash with `include_variants: true` then all
+  # possible renderings of a single glyph will be included, meaning that:
+  #
+  # 1. You will have "duplicate" emojis in your list.
+  # 2. This list is now suitable for exhaustively matching against in a search.
+  #
+  # @option opts [Boolean] :include_variants whether or not to include all
+  #   possible encoding variants in the list
+  #
+  # @return [Array<String>] all Emoji characters rendered as UTF-8 strings
+  def self.chars(opts={})
+    options = {include_variants: false}.merge(opts)
-    normals = EMOJI_CHARS.map { |c| c.char({variant_encoding: false}) }
-    extras  = self.all_with_variants.map { |c| c.char({variant_encoding: true}) }
+    normals = EMOJI_CHARS.map { |c| c.render({variant_encoding: false}) }
     if options[:include_variants]
+      extras  = self.all_with_variants.map { |c| c.render({variant_encoding: true}) }
       return normals + extras
     end
     normals
   end
-  def self.codepoints(options={})
-    options = {include_variants: false}.merge(options)
+  # Returns a list of all known codepoints representing Emoji characters.
+  #
+  # @option (see .chars)
+  # @return [Array<String>] all codepoints represented as unified ID strings
+  def self.codepoints(opts={})
+    options = {include_variants: false}.merge(opts)
+    normals = EMOJI_CHARS.map(&:unified)
     if options[:include_variants]
-      return EMOJI_CHARS.map(&:unified) + self.all_with_variants.map {|c| c.variant}
+      extras = self.all_with_variants.map {|c| c.variant}
+      return normals + extras
     end
-    EMOJI_CHARS.map(&:unified)
+    normals
   end
+  # Convert a native UTF-8 string glyph to its unified codepoint ID.
+  #
+  # This is a conversion operation, not a match, so it may produce unexpected
+  # results with different types of values.
+  #
+  # @param char [String] a single rendered emoji glyph encoded as a UTF-8 string
+  # @return [String] the unified ID
+  #
+  # @example
+  #   >> EmojiData.char_to_unified("👾")
+  #   => "1F47E"
   def self.char_to_unified(char)
-    char.codepoints.to_a.map {|i| i.to_s(16).rjust(4,'0')}.join('-').upcase
+    char.codepoints.to_a.map { |i| i.to_s(16).rjust(4,'0')}.join('-').upcase
   end
-  def self.unified_to_char(cp)
-    EmojiChar::unified_to_char(cp)
+  # Convert a unified codepoint ID directly to its UTF-8 string representation.
+  #
+  # @param uid [String] the unified codepoint ID for an emoji
+  # @return [String] UTF-8 string rendering of the emoji character
+  #
+  # @example
+  #   >> EmojiData.unified_to_char("1F47E")
+  #   => "👾"
+  def self.unified_to_char(uid)
+    EmojiChar::unified_to_char(uid)
   end
-  def self.find_by_unified(cp)
-    EMOJICHAR_UNIFIED_MAP[cp.upcase]
+  # Finds a specific `EmojiChar` based on its unified codepoint ID.
+  #
+  # @param uid [String] the unified codepoint ID for an emoji
+  # @return [EmojiChar]
+  def self.from_unified(uid)
+    EMOJICHAR_UNIFIED_MAP[uid.upcase]
   end
-  FBS_REGEXP = Regexp.new("(?:#{EmojiData.chars({include_variants: true}).join("|")})")
-  def self.find_by_str(str)
+  # precompile regex pattern for fast matches in `.scan`
+  # needs to be defined after self.chars so not at top of file for now...
+  FBS_REGEXP = Regexp.new(
+    "(?:#{EmojiData.chars({include_variants: true}).join("|")})"
+  )
+  private_constant :FBS_REGEXP
+  # Scans a string for all encoded emoji characters contained within.
+  #
+  # @param str [String] the target string to search
+  # @return [Array<EmojiChar>] all emoji characters contained within the target
+  #    string, in the order they appeared.
+  #
+  # @example
+  #   >> EmojiData.scan("flying on my 🚀 to visit the 👾 people.")
+  #   => [#<EmojiData::EmojiChar... @name="ROCKET", @unified="1F680", ...>,
+  #   #<EmojiData::EmojiChar... @name="ALIEN MONSTER", @unified="1F47E", ...>]
+  def self.scan(str)
     matches = str.scan(FBS_REGEXP)
-    matches.map { |m| EmojiData.find_by_unified(EmojiData.char_to_unified(m)) }
+    matches.map { |m| EmojiData.from_unified(EmojiData.char_to_unified(m)) }
   end
+  # Finds any `EmojiChar` that contains given string in its official name.
+  #
+  # @param name [String]
+  # @return [Array<EmojiChar>]
   def self.find_by_name(name)
     self.find_by_value(:name, name.upcase)
   end
+  # Find all `EmojiChar` that match string in any of their associated short
+  # name keywords.
+  #
+  # @param short_name [String]
+  # @return [Array<EmojiChar>]
   def self.find_by_short_name(short_name)
     self.find_by_value(:short_name, short_name.downcase)
   end
+  # Finds a specific `EmojiChar` based on one of its short name keywords.
+  #
+  # Must be exact match.
+  #
+  # @param short_name [String]
+  # @return [EmojiChar]
+  def self.from_short_name(short_name)
+    EMOJICHAR_KEYWORD_MAP[short_name.downcase]
+  end
+  # alias old method names for legacy apps
+  class << self
+    alias_method :find_by_unified, :from_unified
+    alias_method :find_by_str, :scan
+  end
   protected
   def self.find_by_value(field,value)
     self.all.select { |char| char.send(field).include? value }
   end

data/lib/emoji_data/emoji_char.rb CHANGED

@@ -1,13 +1,42 @@
 module EmojiData
+  # EmojiChar represents a single Emoji character and its associated metadata.
+  #
+  # @!attribute name
+  #   @return [String] The standardized name used in the Unicode specification
+  #     to represent this emoji character.
+  #
+  # @!attribute unified
+  #   @return [String] The primary unified codepoint ID for the emoji character.
+  #
+  # @!attribute variations
+  #   @return [Array<String>] A list of all variant codepoints that may also
+  #     represent this emoji.
+  #
+  # @!attribute short_name
+  #   @return [String] The canonical "short name" or keyword used in many
+  #     systems to refer to this emoji. Often surrounded by `:colons:` in
+  #     systems like GitHub & Campfire.
+  #
+  # @!attribute short_names
+  #   @return [Array<String>] A full list of possible keywords for the emoji.
+  #
+  # @!attribute text
+  #   @return [String] An alternate textual representation of the emoji, for
+  #   example a smiley face emoji may be represented with an ASCII alternative.
+  #   Most emoji do not have a text alternative. This is typically used when
+  #   building an automatic translation from typed emoticons.
+  #
   class EmojiChar
     def initialize(emoji_hash)
       # work around inconsistency in emoji.json for now by just setting a blank
       # array for instance value, and let it get overriden in main
       # deserialization loop if variable is present.
       @variations = []
-      # http://stackoverflow.com/questions/1615190/declaring-instance-variables-iterating-over-a-hash
+      # trick for declaring instance variables while iterating over a hash
+      # http://stackoverflow.com/questions/1615190/
       emoji_hash.each do |k,v|
         instance_variable_set("@#{k}",v)
         eigenclass = class<<self; self; end
@@ -15,51 +44,78 @@ module EmojiData
       end
     end
-    # Returns a version of the character for rendering to screen.
+    # Renders an `EmojiChar` to its string glyph representation, suitable for
+    # printing to screen.
+    #
+    # @option opts [Boolean] :variant_encoding specify whether the variant
+    #   encoding selector should be used to hint to rendering devices that
+    #   "graphic" representation should be used. By default, we use this for all
+    #   Emoji characters that contain a possible variant.
     #
-    # By default this will now use the variant encoding if it exists.
-    def char(options = {})
-      options = {variant_encoding: true}.merge(options)
+    # @return [String] the emoji character rendered to a UTF-8 string
+    def render(opts = {})
+      options = {variant_encoding: true}.merge(opts)
       #decide whether to use the normal unified ID or the variant for encoding to str
       target = (self.variant? && options[:variant_encoding]) ? self.variant : @unified
       EmojiChar::unified_to_char(target)
     end
-    # Return ALL known possible string encodings of the emoji char.
+    alias_method :to_s, :render
+    alias_method :char, :render
+    # Returns a list of all possible UTF-8 string renderings of an `EmojiChar`.
     #
-    # Mostly useful for doing find operations when you need them all.
+    # E.g., normal, with variant selectors, etc. This is useful if you want to
+    # have all possible values to match against when searching for the emoji in
+    # a string representation.
+    #
+    # @return [Array<String>] all possible UTF-8 string renderings
     def chars
-      results = [self.char({variant_encoding: false})]
+      results = [self.render({variant_encoding: false})]
       @variations.each do |variation|
         results << EmojiChar::unified_to_char(variation)
       end
       @chars ||= results
     end
-    # Public: Is the character represented by a doublebyte unicode codepoint in unicode?
+    # Is the `EmojiChar` represented by a doublebyte codepoint in Unicode?
+    #
+    # @return [Boolean]
     def doublebyte?
-      @unified.match(/-/)
+      @unified.include? "-"
     end
-    # does the emojichar have an alternate variant encoding?
+    # Does the `EmojiChar` have an alternate Unicode variant encoding?
+    #
+    # @return [Boolean]
     def variant?
       @variations.length > 0
     end
-    # return whatever is the most likely variant ID for the emojichar
-    # for now, there can only be one, so just return first.
-    # (in the future, there may be multiple variants, who knows!)
+    # Returns the most likely variant-encoding codepoint ID for an `EmojiChar`.
+    #
+    # For now we only know of one possible variant encoding for certain
+    # characters, but there could be others in the future.
+    #
+    # This is typically used to force Emoji rendering for characters that could
+    # be represented in standard font glyphs on certain operating systems.
+    #
+    # The resulting encoded string will be two codepoints, or three codepoints
+    # for doublebyte Emoji characters.
+    #
+    # @return [String, nil]
+    #   The most likely variant-encoding codepoint ID.
+    #   If there is no variant-encoding for a character, returns nil.
     def variant
       @variations.first
     end
-    alias_method :to_s, :char
     protected
     def self.unified_to_char(cps)
       cps.split('-').map { |i| i.hex }.pack("U*")
     end
   end
 end

data/lib/emoji_data/version.rb CHANGED

@@ -1,3 +1,4 @@
 module EmojiData
-  VERSION = "0.1.0"
+  # Current version of the module, for bundling to rubygems.org
+  VERSION = "0.2.0.rc1"
 end
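
The new version string carries an `rc1` suffix. As a small illustration of how RubyGems orders such a string relative to the neighbouring releases, the sketch below uses `Gem::Version`; the comparison against a final `0.2.0` is hypothetical, since that release is not part of this diff.

```ruby
require 'rubygems'

rc       = Gem::Version.new('0.2.0.rc1')
previous = Gem::Version.new('0.1.0')
final    = Gem::Version.new('0.2.0') # hypothetical future release

puts rc.prerelease? # a version containing letters is a prerelease
puts rc > previous  # sorts after the prior release
puts rc < final     # sorts before the final release it precedes
```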

data/scripts/benchmark.rb ADDED

@@ -0,0 +1,70 @@
+# encoding: UTF-8
+require './lib/emoji_data'
+require 'benchmark/ips'
+suites = []
+s0 = "I liek to eat cake oh so very much cake eating is nice!! #cake #food"
+s1 = "🚀"
+s2 = "flying on my 🚀 to visit the 👾 people."
+s3 = "first a \u{0023}\u{FE0F}\u{20E3} then a 🚀"
+suites << Benchmark.ips do |x|
+  x.config(:time => 1, :warmup => 0)
+  x.report("EmojiData.scan(s0)") { EmojiData.scan(s0) }
+  x.report("EmojiData.scan(s1)") { EmojiData.scan(s1) }
+  x.report("EmojiData.scan(s2)") { EmojiData.scan(s2) }
+  x.report("EmojiData.scan(s3)") { EmojiData.scan(s3) }
+end
+suites << Benchmark.ips do |x|
+  x.config(:time => 1, :warmup => 0)
+  x.report("EmojiData.all")                       { EmojiData.all() }
+  x.report("EmojiData.all_doublebyte")            { EmojiData.all_doublebyte() }
+  x.report("EmojiData.all_with_variants")         { EmojiData.all_with_variants() }
+  x.report("EmojiData.from_unified")              { EmojiData.from_unified("1F680") }
+  x.report("EmojiData.chars")                     { EmojiData.chars() }
+  x.report("EmojiData.codepoints")                { EmojiData.codepoints() }
+  x.report("EmojiData.find_by_name - many")       { EmojiData.find_by_name("tree") }
+  x.report("EmojiData.find_by_name - none")       { EmojiData.find_by_name("zzzz") }
+  x.report("EmojiData.find_by_short_name - many") { EmojiData.find_by_short_name("MOON") }
+  x.report("EmojiData.find_by_short_name - none") { EmojiData.find_by_short_name("zzzz") }
+  x.report("EmojiData.char_to_unified - single")  { EmojiData.char_to_unified("🚀") }
+  x.report("EmojiData.char_to_unified - double")  { EmojiData.char_to_unified("\u{2601}\u{FE0F}") }
+  x.report("EmojiData.unified_to_char - single")  { EmojiData.unified_to_char("1F47E") }
+  x.report("EmojiData.unified_to_char - double")  { EmojiData.unified_to_char("2764-fe0f") }
+  x.report("EmojiData.unified_to_char - triple")  { EmojiData.unified_to_char("0030-FE0F-20E3") }
+end
+invader   = EmojiData::EmojiChar.new({unified: '1F47E'})
+usflag    = EmojiData::EmojiChar.new({unified: '1F1FA-1F1F8'})
+hourglass = EmojiData::EmojiChar.new({unified: '231B', variations: ['231B-FE0F']})
+cloud     = EmojiData::EmojiChar.new({unified: '2601', variations: ['2601-FE0F']})
+suites << Benchmark.ips do |x|
+  x.config(:time => 1, :warmup => 0)
+  x.report("EmojiChar.render - single")  { invader.render() }
+  x.report("EmojiChar.render - double")  { usflag.render() }
+  x.report("EmojiChar.render - variant") { cloud.render({variant_encoding: true}) }
+  x.report("EmojiChar.chars")            { cloud.chars() }
+  x.report("EmojiChar.doublebyte?")      { invader.doublebyte?() }
+  x.report("EmojiChar.variant?")         { invader.variant?() }
+  x.report("EmojiChar.variant")          { invader.variant() }
+end
+def micros(hz)
+  1_000_000 / hz
+end
+suites.each do |report|
+  results = report.entries.sort { |a,b| b.ips <=> a.ips }
+  print "\n"
+  results.each do |r|
+    printf "%-45s %10u   %.2f µs/op\n", r.label, r.iterations, micros(r.ips)
+  end
+end

data/spec/emoji_char_spec.rb CHANGED

@@ -38,22 +38,28 @@ describe EmojiChar do
       end
     end
-    describe "#char" do
+    describe "#render" do
       it "should render as happy shiny unicode" do
-        @invader.char.should eq("👾")
+        @invader.render.should eq("👾")
       end
       it "should render as happy shiny unicode for doublebyte chars too" do
-        @usflag.char.should eq("🇺🇸")
+        @usflag.render.should eq("🇺🇸")
       end
       it "should have a flag to output forced emoji variant char encoding if requested" do
-        @cloud.char(    {variant_encoding: false}).should eq("\u{2601}")
-        @cloud.char(    {variant_encoding:  true}).should eq("\u{2601}\u{FE0F}")
-        @invader.char(  {variant_encoding: false}).should eq("\u{1F47E}")
-        @invader.char(  {variant_encoding:  true}).should eq("\u{1F47E}")
+        @cloud.render(    {variant_encoding: false}).should eq("\u{2601}")
+        @cloud.render(    {variant_encoding:  true}).should eq("\u{2601}\u{FE0F}")
+        @invader.render(  {variant_encoding: false}).should eq("\u{1F47E}")
+        @invader.render(  {variant_encoding:  true}).should eq("\u{1F47E}")
       end
       it "should default to variant encoding for chars with a variant present" do
-        @cloud.char.should eq("\u{2601}\u{FE0F}")
-        @hourglass.char.should eq("\u{231B}\u{FE0F}")
+        @cloud.render.should eq("\u{2601}\u{FE0F}")
+        @hourglass.render.should eq("\u{231B}\u{FE0F}")
+      end
+    end
+    describe "#char - DEPRECATED" do
+      it "should maintain compatibility with old method name for .render" do
+        @cloud.char.should eq(@cloud.render)
       end
     end

data/spec/emoji_data_spec.rb CHANGED

@@ -56,12 +56,12 @@ describe EmojiData do
     end
   end
-  describe ".find_by_str" do
+  describe ".scan" do
     before(:all) do
-      @exact_results   = EmojiData.find_by_str("🚀")
-      @multi_results   = EmojiData.find_by_str("flying on my 🚀 to visit the 👾 people.")
-      @variant_results = EmojiData.find_by_str("\u{0023}\u{FE0F}\u{20E3}")
-      @variant_multi   = EmojiData.find_by_str("first a \u{0023}\u{FE0F}\u{20E3} then a 🚀")
+      @exact_results   = EmojiData.scan("🚀")
+      @multi_results   = EmojiData.scan("flying on my 🚀 to visit the 👾 people.")
+      @variant_results = EmojiData.scan("\u{0023}\u{FE0F}\u{20E3}")
+      @variant_multi   = EmojiData.scan("first a \u{0023}\u{FE0F}\u{20E3} then a 🚀")
     end
     it "should find the proper EmojiChar object from a single string char" do
       @exact_results.should be_kind_of(Array)
@@ -89,22 +89,34 @@ describe EmojiData do
     end
   end
-  describe ".find_by_unified" do
+  describe ".find_by_str - DEPRECATED" do
+    it "should maintain compatibility with old method name for .scan" do
+      EmojiData.find_by_str("\u{0023}\u{FE0F}\u{20E3}").should eq(EmojiData.scan("\u{0023}\u{FE0F}\u{20E3}"))
+    end
+  end
+  describe ".from_unified" do
     it "should find the proper EmojiChar object" do
-      results = EmojiData.find_by_unified('1f680')
+      results = EmojiData.from_unified('1f680')
       results.should be_kind_of(EmojiChar)
       results.name.should eq('ROCKET')
     end
     it "should normalise capitalization for hex values" do
-      EmojiData.find_by_unified('1f680').should_not be_nil
+      EmojiData.from_unified('1f680').should_not be_nil
     end
     it "should find via variant encoding ID format as well" do
-      results = EmojiData.find_by_unified('2764-fe0f')
+      results = EmojiData.from_unified('2764-fe0f')
       results.should_not be_nil
       results.name.should eq('HEAVY BLACK HEART')
     end
   end
+  describe ".find_by_unified - DEPRECATED" do
+    it "should maintain compatibility with old method name for .from_unified" do
+      EmojiData.find_by_unified('1f680').should eq(EmojiData.from_unified('1f680'))
+    end
+  end
   describe ".find_by_name" do
     it "returns an array of results, upcasing input if needed" do
       EmojiData.find_by_name('tree').should be_kind_of(Array)
@@ -129,6 +141,25 @@ describe EmojiData do
     end
   end
+  describe ".from_short_name" do
+    it "returns exact matches on a short name" do
+      results = EmojiData.from_short_name('scream')
+      results.should be_kind_of(EmojiChar)
+      results.name.should eq('FACE SCREAMING IN FEAR')
+    end
+    it "handles lowercasing input if required" do
+      EmojiData.from_short_name('SCREAM').should eq( EmojiData.from_short_name('scream') )
+    end
+    it "works on secondary keywords" do
+      primary = EmojiData.from_short_name('hankey')
+      EmojiData.from_short_name('poop').should eq(primary)
+      EmojiData.from_short_name('shit').should eq(primary)
+    end
+    it "returns nil if nothing matches" do
+      EmojiData.from_short_name('taco').should be_nil
+    end
+  end
   describe ".char_to_unified" do
     it "converts normal emoji to unified codepoint" do
       EmojiData.char_to_unified("👾").should eq('1F47E')
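
The new `.from_short_name` specs describe an exact-match lookup that lowercases its input, resolves secondary keywords to the same character, and returns `nil` on a miss. One way to get that behavior is a hash index built once over all short names. This is a rough sketch under that assumption: `ShortNameSketch`, `Entry`, and the hand-written sample data are hypothetical and do not come from the gem.

```ruby
# Hypothetical stand-in for illustration; NOT the gem's lookup code.
module ShortNameSketch
  Entry = Struct.new(:name, :short_names)

  # Hand-written sample data (assumed for this example only).
  ENTRIES = [
    Entry.new("FACE SCREAMING IN FEAR", ["scream"]),
    Entry.new("PILE OF POO", ["hankey", "poop", "shit"])
  ].freeze

  # Map every short name, primary or secondary, to its entry.
  INDEX = ENTRIES.each_with_object({}) do |entry, index|
    entry.short_names.each { |short| index[short] = entry }
  end.freeze

  # Exact match after lowercasing; Hash#[] returns nil on a miss.
  def self.from_short_name(short_name)
    INDEX[short_name.downcase]
  end
end

primary = ShortNameSketch.from_short_name("hankey")
```

Indexing up front makes each lookup a single hash access instead of a scan over every entry.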

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: emoji_data
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.2.0.rc1
 platform: ruby
 authors:
 - Matthew Rothenberg
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-05-03 00:00:00.000000000 Z
+date: 2014-09-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -42,16 +42,16 @@ dependencies:
   name: rspec
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: 2.14.1
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ">="
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0'
+        version: 2.14.1
 - !ruby/object:Gem::Dependency
   name: simplecov
   requirement: !ruby/object:Gem::Requirement
@@ -80,6 +80,34 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 0.7.0
+- !ruby/object:Gem::Dependency
+  name: benchmark-ips
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 2.0.0
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 2.0.0
+- !ruby/object:Gem::Dependency
+  name: yard
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.8.7.4
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.8.7.4
 description: Provides classes and helpers for dealing with emoji character data as
   unicode.  Wraps a library of all known emoji characters and provides convenience
   methods.
@@ -90,17 +118,21 @@ extensions: []
 extra_rdoc_files: []
 files:
 - ".coveralls.yml"
+- ".editorconfig"
+- ".gitattributes"
 - ".gitignore"
 - ".travis.yml"
+- ".yardopts"
 - CHANGELOG.md
 - Gemfile
-- LICENSE.txt
+- LICENSE
 - README.md
 - Rakefile
 - emoji_data.gemspec
 - lib/emoji_data.rb
 - lib/emoji_data/emoji_char.rb
 - lib/emoji_data/version.rb
+- scripts/benchmark.rb
 - spec/emoji_char_spec.rb
 - spec/emoji_data_spec.rb
 - spec/spec_helper.rb
@@ -118,12 +150,12 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: 1.9.2
+      version: 1.9.3
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - ">="
+  - - ">"
     - !ruby/object:Gem::Version
-      version: '0'
+      version: 1.3.1
 requirements: []
 rubyforge_project:
 rubygems_version: 2.2.2
@@ -134,3 +166,4 @@ test_files:
 - spec/emoji_char_spec.rb
 - spec/emoji_data_spec.rb
 - spec/spec_helper.rb
+has_rdoc:
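
The metadata above moves several development dependencies from `>= 0` to pessimistic (`~>`) constraints and publishes a prerelease version. The sketch below uses the standard RubyGems classes `Gem::Requirement` and `Gem::Version` to show what those constraints accept. The specific candidate versions tested (2.14.8, 2.15.0, 0.8.7.9, 0.8.8) are arbitrary examples, not versions named in the diff.

```ruby
require "rubygems"

# "~> 2.14.1" allows the last segment to grow: >= 2.14.1 and < 2.15.
rspec_req = Gem::Requirement.new("~> 2.14.1")
rspec_patch_ok = rspec_req.satisfied_by?(Gem::Version.new("2.14.8"))
rspec_minor_ok = rspec_req.satisfied_by?(Gem::Version.new("2.15.0"))

# With four segments, only the fourth may grow: >= 0.8.7.4 and < 0.8.8.
yard_req = Gem::Requirement.new("~> 0.8.7.4")
yard_same_patch_ok = yard_req.satisfied_by?(Gem::Version.new("0.8.7.9"))
yard_next_patch_ok = yard_req.satisfied_by?(Gem::Version.new("0.8.8"))

# A version with a letter segment is a prerelease and sorts before
# the final release with the same numbers.
rc = Gem::Version.new("0.2.0.rc1")
rc_is_prerelease = rc.prerelease?
rc_before_final  = rc < Gem::Version.new("0.2.0")
```

The change of `required_rubygems_version` to `> 1.3.1` appears to be what RubyGems itself writes for prerelease versions rather than a hand-edited value.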