RubyGems - emoji_data - Versions diffs - 0.0.3 → 0.1.0.rc1 - Mend

emoji_data 0.0.3 → 0.1.0.rc1

Files changed (14)

checksums.yaml +4 -4
data/.coveralls.yml +2 -0
data/CHANGELOG.md +8 -0
data/README.md +9 -7
data/emoji_data.gemspec +2 -0
data/lib/emoji_data.rb +52 -11
data/lib/emoji_data/emoji_char.rb +42 -3
data/lib/emoji_data/version.rb +1 -1
data/spec/emoji_char_spec.rb +41 -5
data/spec/emoji_data_spec.rb +75 -6
data/spec/spec_helper.rb +11 -0
data/vendor/emoji-data/README.md +7 -0
data/vendor/emoji-data/emoji.json +1 -1
metadata +34 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: c7a2b01097e18162bdb426c05bfb76f871ca7f39
-  data.tar.gz: 5e75e18ab2252aab634cd17863a6b7cbb471119b
+  metadata.gz: 7c2cca7a834cd49e348c09b50c042596f93f5076
+  data.tar.gz: 8402df0f4d2d828ec9ed322d093da13602a4d64c
 SHA512:
-  metadata.gz: 23f834f7c47f8015bf14a80623cdaf95ebfd45a2d33f9eeeda0434232146311c439b895476ac7b65382e12c0a8b7038cd7508b6352871d643474acf20f32d838
-  data.tar.gz: 7a93b457a18d7bfec72cf3914527a9426abe03aab8b186812b6e7d8fd3cf3ee576cf57c1f88ba415862ce10dba388215e014e10f204ad625d52d6e1e2aeb6526
+  metadata.gz: 95dba4e1fbb80d741b5c535b091ecf60fd6666c0a810f6e69eaf15a64a13aab72eae8b1b5b1969b3b8d5894bf5f4dd646261a1560a28b02ef331678dbf8481c7
+  data.tar.gz: fdebfa5d68128ec34273fd4e21192e0203fd2f377a2c083b9282688bc205500b65f69fc410ffa6cc4f80133623bff0bac0e090352177f10cd5702b557ad86814

data/.coveralls.yml ADDED

@@ -0,0 +1,2 @@
+service_name: travis-ci
+

data/CHANGELOG.md CHANGED

@@ -1,5 +1,13 @@
 # Changelog
+## 0.1.0 (Pending)
+ * Add support for Unicode variant encodings, used by MacOSX 10.9 / iOS 7.
+   - For more info: http://www.unicode.org/L2/L2011/11438-emoji-var.pdf
+   - By default, `EmojiChar.to_s()` and `.char()` will now use the variant encoding.
+ * Import latest version of iamcal/emoji-data.
+ * 100% test coverage. :sunglasses:
 ## 0.0.3 (1 April 2013)
  * On initialization, create hashmaps to cache lookups for `.find_by_unified()`.
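
The "variant encoding" in the 0.1.0 entry above appends VARIATION SELECTOR-16 (U+FE0F) to the base codepoint to request emoji-style presentation. A minimal plain-Ruby sketch of what that means at the string level; it does not use the gem, and the constant names are illustrative only:

```ruby
# Plain Ruby, no gem required. Constant names are illustrative, not part of emoji_data.
VS16          = "\u{FE0F}"          # VARIATION SELECTOR-16
CLOUD_PLAIN   = "\u{2601}"          # U+2601 CLOUD
CLOUD_VARIANT = CLOUD_PLAIN + VS16  # same character followed by the selector

# The two strings are different values, even if they often render alike.
puts CLOUD_PLAIN == CLOUD_VARIANT                            # false
p CLOUD_VARIANT.codepoints.map { |cp| format("%04X", cp) }   # ["2601", "FE0F"]
```

This is why code that searches text for emoji needs to know both forms of a character.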

data/README.md CHANGED

@@ -1,13 +1,17 @@
 # emoji_data.rb
-[![Gem Version](https://badge.fury.io/rb/emoji_data.png)](http://badge.fury.io/rb/emoji_data)
-[![Build Status](https://travis-ci.org/mroth/emoji_data.rb.png?branch=master)](https://travis-ci.org/mroth/emoji_data.rb)
+[![Gem Version](http://img.shields.io/gem/v/emoji_data.svg?style=flat)](https://rubygems.org/gems/emoji_data)
+[![Build Status](http://img.shields.io/travis/mroth/emoji_data.rb.svg?style=flat)](https://travis-ci.org/mroth/emoji_data.rb)
+[![Dependency Status](http://img.shields.io/gemnasium/mroth/emoji_data.rb.svg?style=flat)](https://gemnasium.com/mroth/emoji_data.rb)
+[![CodeClimate Status](http://img.shields.io/codeclimate/github/mroth/emoji_data.rb.svg?style=flat)](https://codeclimate.com/github/mroth/emoji_data.rb)
+[![Coverage Status](http://img.shields.io/coveralls/mroth/emoji_data.rb.svg?style=flat)](https://coveralls.io/r/mroth/emoji_data.rb)
 Provides classes and helpers for dealing with emoji character data as unicode.  Wraps a library of all known emoji characters and provides convenience methods.
-Note, this is mostly useful for low-level operations.  If you can avoid having to deal with unicode character data extensively and just want to encode/decode stuff, [rumoji](https://github.com/mwunsch/rumoji) might be a better bet for you.
+Note, this is mostly useful for low-level operations.  If you can avoid having to deal with unicode character data extensively and just want to encode/decode stuff, [rumoji](https://github.com/mwunsch/rumoji) might be a better bet for you.  If however, you are doing anything complicated involving emoji encoding/decoding, or you are just obsessed with understanding the details, this library is your new best friend.
-This library currently uses `iamcal/emoji-data` as it's library dataset, and thus considers it to be the "source of truth" regarding certain things, such as how to represent doublebyte unified codepoint IDs as strings (seperated by a dash).
+This library currently uses `iamcal/emoji-data` as it's dataset, and thus considers it to be the "source of truth" regarding certain things, such as how to represent doublebyte unified codepoint IDs as strings (seperated by a dash).
 This is basically a helper library for my [emojitrack](https://github.com/mroth/emojitrack) and [emojistatic](https://github.com/mroth/emojistatic) projects, but may be useful for other people.
@@ -25,7 +29,7 @@ Or install it yourself as:
     $ gem install emoji_data
-Currently requires Ruby 1.9 or more recent.
+Currently requires `RUBY_VERSION >= 1.9.2`.
 ## Library Usage
@@ -64,5 +68,3 @@ Some notable methods to call out:
   `EmojiData::EmojiChar` is a class representing a single emoji character.  All the variables from the `iamcal/emoji-data` dataset have dynamically generated getter methods.
 There are some additional convenience methods, such as `#doublebyte?` etc. Most important addition is the `#char` method which will output a properly unicode encoded string containing the character.

data/emoji_data.gemspec CHANGED

@@ -21,6 +21,8 @@ Gem::Specification.new do |spec|
   spec.add_development_dependency "bundler", "~> 1.3"
   spec.add_development_dependency "rake"
   spec.add_development_dependency "rspec"
+  spec.add_development_dependency 'simplecov', '~> 0.7.1'
+  spec.add_development_dependency 'coveralls', '~> 0.7.0'
   spec.required_ruby_version = '>= 1.9.2'
 end

data/lib/emoji_data.rb CHANGED

@@ -8,19 +8,48 @@ module EmojiData
   EMOJI_MAP = JSON.parse( RAW_JSON )
   EMOJI_CHARS = EMOJI_MAP.map { |em| EmojiChar.new(em) }
-  # hashmap for fast unified lookups
+  #
+  # construct hashmap for fast precached lookups for `.find_by_unified`
+  #
   EMOJICHAR_UNIFIED_MAP = Hash[EMOJI_CHARS.map { |u| [u.unified, u] }]
+  # merge variant encodings into map so we can look them up as well
+  EMOJI_CHARS.select(&:variant?).each do |char|
+    char.variations.each do |variant|
+      EMOJICHAR_UNIFIED_MAP.merge! Hash[variant,char]
+    end
+  end
   def self.all
     EMOJI_CHARS
   end
-  def self.chars
-    @chars ||= EMOJI_CHARS.map(&:char)
+  def self.all_doublebyte
+    EMOJI_CHARS.select(&:doublebyte?)
+  end
+  def self.all_with_variants
+    EMOJI_CHARS.select(&:variant?)
+  end
+  def self.chars(options={})
+    options = {include_variants: false}.merge(options)
+    normals = EMOJI_CHARS.map { |c| c.char({variant_encoding: false}) }
+    extras  = self.all_with_variants.map { |c| c.char({variant_encoding: true}) }
+    if options[:include_variants]
+      return normals + extras
+    end
+    normals
   end
-  def self.codepoints
-    @codepoints ||= EMOJI_CHARS.map(&:unified)
+  def self.codepoints(options={})
+    options = {include_variants: false}.merge(options)
+    if options[:include_variants]
+      return EMOJI_CHARS.map(&:unified) + self.all_with_variants.map {|c| c.variant}
+    end
+    EMOJI_CHARS.map(&:unified)
   end
   def self.char_to_unified(char)
@@ -28,26 +57,25 @@ module EmojiData
   end
   def self.unified_to_char(cp)
-    find_by_unified(cp).char
+    EmojiChar::unified_to_char(cp)
   end
   def self.find_by_unified(cp)
-    # EMOJI_CHARS.detect { |ec| ec.unified == cp.upcase }
     EMOJICHAR_UNIFIED_MAP[cp.upcase]
   end
   def self.find_by_str(str)
-    matches = EMOJI_CHARS.select { |ec| str.include? ec.char }
-    matches.sort_by { |matched_char| str.index(matched_char.char) }
+    str.extend EmojiData::StringUtils
+    matches = EMOJI_CHARS.select { |ec| str.include_any? ec.chars }
+    matches.sort_by { |mc| str.index_first(mc.chars) }
   end
   def self.find_by_name(name)
-    # self.all.select { |char| char.name.include? name.upcase }
     self.find_by_value(:name, name.upcase)
   end
   def self.find_by_short_name(short_name)
-    # self.all.select { |char| char.short_name.include? name.downcase }
     self.find_by_value(:short_name, short_name.downcase)
   end
@@ -56,4 +84,17 @@ module EmojiData
     self.all.select { |char| char.send(field).include? value }
   end
+  module StringUtils
+    def include_any?(charstr)
+      charstr.any? { |char| self.include? char }
+    end
+    def index_first(charstr)
+      charstr.each do |char|
+        return self.index(char) if !self.index(char).nil?
+      end
+      nil
+    end
+  end
 end
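
The `StringUtils` helpers added above can be tried in isolation. The following is a standalone sketch that re-creates them under a hypothetical module name (`StringUtilsSketch`) so it runs without the gem; the sample text and constant are assumptions chosen to mirror the specs in this diff:

```ruby
# Standalone re-creation of the two helpers; the module name is hypothetical.
module StringUtilsSketch
  # True if the receiver contains at least one of the candidate strings.
  def include_any?(candidates)
    candidates.any? { |c| include?(c) }
  end

  # Index of the first candidate (in list order) that occurs in the receiver,
  # or nil if none occur. Note: list order, not leftmost position.
  def index_first(candidates)
    candidates.each do |c|
      idx = index(c)
      return idx unless idx.nil?
    end
    nil
  end
end

# Both encodings of HASH KEY: plain, and with U+FE0F in the middle.
HASH_KEY_ENCODINGS = ["\u{0023}\u{20E3}", "\u{0023}\u{FE0F}\u{20E3}"].freeze

def demo_text
  # dup first so that extend never touches a frozen literal
  "first a \u{0023}\u{FE0F}\u{20E3} then a rocket".dup.extend(StringUtilsSketch)
end

puts demo_text.include_any?(HASH_KEY_ENCODINGS)  # true
puts demo_text.index_first(HASH_KEY_ENCODINGS)   # 8
```

Here the plain encoding does not occur in the text (the selector sits between its two codepoints), so the match comes from the second candidate. Note that `extend` adds the methods to the specific string object passed in, so a frozen string would need to be duplicated first, as the sketch does.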

data/lib/emoji_data/emoji_char.rb CHANGED

@@ -2,6 +2,11 @@ module EmojiData
   class EmojiChar
     def initialize(emoji_hash)
+      # work around inconsistency in emoji.json for now by just setting a blank
+      # array for instance value, and let it get overriden in main
+      # deserialization loop if variable is present.
+      @variations = []
       # http://stackoverflow.com/questions/1615190/declaring-instance-variables-iterating-over-a-hash
       emoji_hash.each do |k,v|
         instance_variable_set("@#{k}",v)
@@ -10,9 +15,25 @@ module EmojiData
       end
     end
-    # Public: Returns a version of the character for rendering to screen.
-    def char
-      @char ||= @unified.split('-').map { |i| i.hex }.pack("U*")
+    # Returns a version of the character for rendering to screen.
+    #
+    # By default this will now use the variant encoding if it exists.
+    def char(options = {})
+      options = {variant_encoding: true}.merge(options)
+      #decide whether to use the normal unified ID or the variant for encoding to str
+      target = (self.variant? && options[:variant_encoding]) ? self.variant : @unified
+      EmojiChar::unified_to_char(target)
+    end
+    # Return ALL known possible string encodings of the emoji char.
+    #
+    # Mostly useful for doing find operations when you need them all.
+    def chars
+      results = [self.char({variant_encoding: false})]
+      @variations.each do |variation|
+        results << EmojiChar::unified_to_char(variation)
+      end
+      @chars ||= results
     end
     # Public: Is the character represented by a doublebyte unicode codepoint in unicode?
@@ -20,7 +41,25 @@ module EmojiData
       @unified.match(/-/)
     end
+    # does the emojichar have an alternate variant encoding?
+    def variant?
+      @variations.length > 0
+    end
+    # return whatever is the most likely variant ID for the emojichar
+    # for now, there can only be one, so just return first.
+    # (in the future, there may be multiple variants, who knows!)
+    def variant
+      @variations.first
+    end
     alias_method :to_s, :char
+    protected
+    def self.unified_to_char(cps)
+      cps.split('-').map { |i| i.hex }.pack("U*")
+    end
   end
 end
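
The conversion in `unified_to_char` above splits the dashed ID, parses each part as hex, and packs the codepoints as UTF-8. A plain-Ruby sketch of that step follows, together with a hypothetical inverse; `char_to_unified_sketch` and its output format (uppercase hex, zero-padded to four digits) are assumptions inferred from the spec expectations in this diff, not the gem's actual implementation:

```ruby
# Sketch of the dashed-ID -> string conversion, runnable without the gem.
def unified_to_char_sketch(unified)
  # "2601-FE0F" -> ["2601", "FE0F"] -> [0x2601, 0xFE0F] -> UTF-8 string
  unified.split('-').map(&:hex).pack('U*')
end

# Hypothetical inverse, for illustration only.
def char_to_unified_sketch(str)
  # Assumed format: uppercase hex, zero-padded to 4 digits, joined by dashes.
  str.codepoints.map { |cp| format('%04X', cp) }.join('-')
end

p unified_to_char_sketch('2601-FE0F') == "\u{2601}\u{FE0F}"  # true
puts char_to_unified_sketch("\u{0030}\u{FE0F}\u{20E3}")       # 0030-FE0F-20E3
puts char_to_unified_sketch("\u{1F47E}")                      # 1F47E
```

`String#hex` accepts lowercase digits, so an ID such as `2764-fe0f` converts the same way. As a side note, a bare `protected` in Ruby applies to instance methods only, not to methods defined with `def self.`, which is consistent with the diff calling `EmojiChar::unified_to_char` from the `EmojiData` module.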

data/lib/emoji_data/version.rb CHANGED

@@ -1,3 +1,3 @@
 module EmojiData
-  VERSION = "0.0.3"
+  VERSION = "0.1.0.rc1"
 end

data/spec/emoji_char_spec.rb CHANGED

@@ -4,18 +4,19 @@ require 'spec_helper'
 describe EmojiChar do
   describe ".new" do
     before(:all) do
-      poop_json = %q/{"name":"PILE OF POO","unified":"1F4A9","docomo":"","au":"E4F5","softbank":"E05A","google":"FE4F4","image":"1f4a9.png","sheet_x":13,"sheet_y":19,"short_name":"hankey","short_names":["hankey","poop","shit"],"text":null}/
+      poop_json = %q/{"name":"PILE OF POO","unified":"1F4A9","variations":[],"docomo":"","au":"E4F5","softbank":"E05A","google":"FE4F4","image":"1f4a9.png","sheet_x":11,"sheet_y":19,"short_name":"hankey","short_names":["hankey","poop","shit"],"text":null}/
       @poop = EmojiChar.new(JSON.parse poop_json)
     end
     it "should create instance getters for all key-values in emoji.json, with blanks as nil" do
       @poop.name.should eq('PILE OF POO')
       @poop.unified.should eq('1F4A9')
+      @poop.variations.should eq([])
       @poop.docomo.should eq('')
       @poop.au.should eq('E4F5')
       @poop.softbank.should eq('E05A')
       @poop.google.should eq('FE4F4')
       @poop.image.should eq('1f4a9.png')
-      @poop.sheet_x.should eq(13)
+      @poop.sheet_x.should eq(11)
       @poop.sheet_y.should eq(19)
       @poop.short_name.should eq('hankey')
       @poop.short_names.should eq(["hankey","poop","shit"])
@@ -25,8 +26,10 @@ describe EmojiChar do
   context "instance methods" do
     before(:all) do
-      @invader = EmojiChar.new({'unified' => '1F47E'})
-      @usflag = EmojiChar.new({'unified' => '1F1FA-1F1F8'})
+      @invader   = EmojiChar.new({'unified' => '1F47E'})
+      @usflag    = EmojiChar.new({'unified' => '1F1FA-1F1F8'})
+      @hourglass = EmojiChar.new({'unified' => '231B', 'variations' => ['231B-FE0F']})
+      @cloud     = EmojiChar.new({'unified' => '2601', 'variations' => ['2601-FE0F']})
     end
     describe "#to_s" do
@@ -42,6 +45,23 @@ describe EmojiChar do
       it "should render as happy shiny unicode for doublebyte chars too" do
         @usflag.char.should eq("🇺🇸")
       end
+      it "should have a flag to output forced emoji variant char encoding if requested" do
+        @cloud.char(    {variant_encoding: false}).should eq("\u{2601}")
+        @cloud.char(    {variant_encoding:  true}).should eq("\u{2601}\u{FE0F}")
+        @invader.char(  {variant_encoding: false}).should eq("\u{1F47E}")
+        @invader.char(  {variant_encoding:  true}).should eq("\u{1F47E}")
+      end
+      it "should default to variant encoding for chars with a variant present" do
+        @cloud.char.should eq("\u{2601}\u{FE0F}")
+        @hourglass.char.should eq("\u{231B}\u{FE0F}")
+      end
+    end
+    describe "#chars" do
+      it "should return an array of all possible string render variations" do
+        @invader.chars.should eq(["\u{1F47E}"])
+        @cloud.chars.should   eq(["\u{2601}","\u{2601}\u{FE0F}"])
+      end
     end
     describe "#doublebyte?" do
@@ -50,5 +70,21 @@ describe EmojiChar do
         @invader.doublebyte?.should be_false
       end
     end
+    describe "#variant?" do
+      it "should indicate when a character has an alternate variant encoding" do
+        @hourglass.variant?.should be_true
+        @usflag.variant?.should be_false
+      end
+    end
+    describe "#variant" do
+      it "should return the most likely variant encoding ID representation for the char" do
+        @hourglass.variant.should eq('231B-FE0F')
+      end
+      it "should return nil if no variant encoding for the char exists" do
+        @usflag.variant.should be_nil
+      end
+    end
   end
-end
+end

data/spec/emoji_data_spec.rb CHANGED

@@ -3,18 +3,65 @@ require 'spec_helper'
 describe EmojiData do
   describe ".all" do
-    it "should return an array of all known emoji chars" do
-      EmojiData.all.count.should eq(842)
+    it "should return an array of all 845 known emoji chars" do
+      EmojiData.all.count.should eq(845)
     end
     it "should return all EmojiChar objects" do
       EmojiData.all.all? {|char| char.class == EmojiData::EmojiChar}.should be_true
     end
   end
+  describe ".all_doublebyte" do
+    it "should return an array of all 21 known emoji chars with doublebyte encoding" do
+      EmojiData.all_doublebyte.count.should eq(21)
+    end
+  end
+  describe ".all_with_variants" do
+    it "should return an array of all 107 known emoji chars with variant encodings" do
+      EmojiData.all_with_variants.count.should eq(107)
+    end
+  end
+  describe ".chars" do
+    it "should return an array of all chars in unicode string format" do
+      EmojiData.chars.all? {|char| char.class == String}.should be_true
+    end
+    it "should by default return one entry per known EmojiChar" do
+      EmojiData.chars.count.should eq(EmojiData.all.count)
+    end
+    it "should include variants in list when options {include_variants: true}" do
+      results = EmojiData.chars({include_variants: true})
+      numChars    = EmojiData.all.count
+      numVariants = EmojiData.all_with_variants.count
+      results.count.should eq(numChars + numVariants)
+    end
+    it "should not have any duplicates in list when variants are included" do
+      results = EmojiData.chars({include_variants: true})
+      results.count.should eq(results.uniq.count)
+    end
+  end
+  describe ".codepoints" do
+    it "should return an array of all known codepoints in dashed string representation" do
+      EmojiData.codepoints.all? {|cp| cp.class == String}.should be_true
+      EmojiData.codepoints.all? {|cp| cp.match(/^[0-9A-F\-]{4,11}$/)}.should be_true
+    end
+    it "should include variants in list when options {include_variants: true}" do
+      results = EmojiData.codepoints({include_variants: true})
+      numChars    = EmojiData.all.count
+      numVariants = EmojiData.all_with_variants.count
+      results.count.should eq(numChars + numVariants)
+      results.all? {|cp| cp.match(/^[0-9A-F\-]{4,16}$/)}.should be_true
+    end
+  end
   describe ".find_by_str" do
     before(:all) do
-      @exact_results = EmojiData.find_by_str("🚀")
-      @multi_results = EmojiData.find_by_str("flying on my 🚀 to visit the 👾 people.")
+      @exact_results   = EmojiData.find_by_str("🚀")
+      @multi_results   = EmojiData.find_by_str("flying on my 🚀 to visit the 👾 people.")
+      @variant_results = EmojiData.find_by_str("\u{0023}\u{FE0F}\u{20E3}")
+      @variant_multi   = EmojiData.find_by_str("first a \u{0023}\u{FE0F}\u{20E3} then a 🚀")
     end
     it "should find the proper EmojiChar object from a single string char" do
       @exact_results.should be_kind_of(Array)
@@ -22,6 +69,10 @@ describe EmojiData do
       @exact_results.first.should be_kind_of(EmojiChar)
       @exact_results.first.name.should eq('ROCKET')
     end
+    it "should find the proper EmojiChar object from a variant encoded char" do
+      @variant_results.length.should eq(1)
+      @variant_results.first.name.should eq('HASH KEY')
+    end
     it "should match multiple chars from within a string" do
       @multi_results.should be_kind_of(Array)
       @multi_results.length.should eq(2)
@@ -32,6 +83,10 @@ describe EmojiData do
       @multi_results[0].name.should eq('ROCKET')
       @multi_results[1].name.should eq('ALIEN MONSTER')
     end
+    it "should return multiple matches in the proper order for variant encodings" do
+      @variant_multi[0].name.should eq('HASH KEY')
+      @variant_multi[1].name.should eq('ROCKET')
+    end
   end
   describe ".find_by_unified" do
@@ -40,9 +95,14 @@ describe EmojiData do
       results.should be_kind_of(EmojiChar)
       results.name.should eq('ROCKET')
     end
-    it "should normallise capitalization for hex values" do
+    it "should normalise capitalization for hex values" do
       EmojiData.find_by_unified('1f680').should_not be_nil
     end
+    it "should find via variant encoding ID format as well" do
+      results = EmojiData.find_by_unified('2764-fe0f')
+      results.should_not be_nil
+      results.name.should eq('HEAVY BLACK HEART')
+    end
   end
   describe ".find_by_name" do
@@ -78,6 +138,9 @@ describe EmojiData do
       EmojiData.char_to_unified("🇺🇸").should eq('1F1FA-1F1F8')
       EmojiData.char_to_unified("#⃣").should eq('0023-20E3')
     end
+    it "converts variant encoded emoji to variant unified codepoint" do
+      EmojiData.char_to_unified("\u{2601}\u{FE0F}").should eq('2601-FE0F')
+    end
   end
   # TODO: below is kinda redundant but it is helpful as a helper method so maybe still test
@@ -90,5 +153,11 @@ describe EmojiData do
       EmojiData.unified_to_char('1F1FA-1F1F8').should eq("🇺🇸")
       EmojiData.unified_to_char('0023-20E3').should eq("#⃣")
     end
+    it "converts variant unified codepoints to unicode strings" do
+      EmojiData.unified_to_char('2764-fe0f').should eq("\u{2764}\u{FE0F}")
+    end
+    it "converts variant+doublebyte chars (triplets!) to unicode strings" do
+      EmojiData.unified_to_char('0030-FE0F-20E3').should eq("\u{0030}\u{FE0F}\u{20E3}")
+    end
   end
-end
+end