emoji_data 0.1.0 → 0.2.0.rc1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: e42e74736faf460d0722d647c34ab68dad7130e8
- data.tar.gz: d1c42bfa36cda5c7addc7079bf4daef60bf17a6f
+ metadata.gz: 7385039bbd2cb55d93480a7d389c88fc8f47bbfa
+ data.tar.gz: 481273df89feb32b0c6d7178711bb6485e110538
  SHA512:
- metadata.gz: 20929861bc5b903569576f8cdca11d046ce970def2e1221d1eb31ad9a7fac7ad2e6f9b64485c17a49e87e12e3aaccecfb006bebd8f25662268a335435420121a
- data.tar.gz: 501d44516538071d5df7b2c37c32c2e933661c68699a3d65d12872a88ad5411f11e26f419a292f51230fa2a155893cddb0748c43ee7218db492dd38f860fbea5
+ metadata.gz: 6be855ddc07996303eef279b6c840a91a27da97774635e71c500f184b14a0fe4e30977dc0cadb1b48cb7a9f4ff465bae6613fd7b93d1101ee785245bb69097ea
+ data.tar.gz: 9a5c308587f581f2a500ac7686664ebe9ab86c103022c372fccf5d3c5a51b359992b9fec612ba6323e805d633d84470ce825ac1f7a945da3b94348a8ad4f087b
@@ -0,0 +1,18 @@
+ # EditorConfig helps developers define and maintain consistent
+ # coding styles between different editors and IDEs
+ # editorconfig.org
+
+ root = true
+
+ [*]
+ indent_style = space
+ indent_size = 2
+
+ end_of_line = lf
+ charset = utf-8
+ trim_trailing_whitespace = true
+ insert_final_newline = true
+
+ [*.md]
+ trim_trailing_whitespace = false
+
@@ -0,0 +1,2 @@
+ * text=auto
+
@@ -5,7 +5,12 @@ rvm:
  - 1.9.3
  - 2.0.0
  - 2.1.0
+ - 2.1.1
+ - ruby-head
+ - jruby
 
  matrix:
  allow_failures:
  - rvm: 1.8.7
+ - rvm: 1.9.2
+ - rvm: ruby-head
@@ -0,0 +1,2 @@
+ --markup markdown
+
@@ -1,11 +1,33 @@
  # Changelog
 
+ ## 0.2.0 (TBD)
+
+ * Rename a number of methods to be clearer and more consistent with what they
+ actually do:
+ - `EmojiChar.char()` → `EmojiChar.render()`
+ - `EmojiData.find_by_unified()` → `EmojiData.from_unified()`
+ - `EmojiData.find_by_str()` → `EmojiData.scan()`
+
+ Don't worry, the old names are still aliased in so you don't have to change
+ anything in your existing code. This change is to make things clearer for
+ people new to the library.
+
+ * Add new `.from_short_name()` library method for fast keyword lookups.
+ * DEVELOPERS: Internal code cleanup and better comments.
+ * DEVELOPERS: Add benchmark suite for comparing method implementation time
+ across versions of this library.
+
  ## 0.1.0 (3 May 2014)
 
  * Add support for Unicode variant encodings, used by MacOSX 10.9 / iOS 7.
  - For more info: http://www.unicode.org/L2/L2011/11438-emoji-var.pdf
- - By default, `EmojiChar.to_s()` and `.char()` will now use the variant encoding.
- * With adding support for variants, the speed of `find_by_str` regressed by approximately 20% (because there are more codepoints to match against). To counter this, we switched to a Regex based scan than improves performance of the method by over 250x(!). A complete sorted search against 1000 strings now takes ~2ms where before it would take around a half second.
+ - By default, `EmojiChar.to_s()` and `.char()` will now use the variant
+ encoding.
+ * With adding support for variants, the speed of `find_by_str` regressed by
+ approximately 20% (because there are more codepoints to match against). To
+ counter this, we switched to a Regex based scan that improves performance of
+ the method by over 250x(!). A complete sorted search against 1000 strings
+ now takes ~2ms where before it would take around a half second.
  * Import latest version of iamcal/emoji-data.
  * 100% test coverage. :sunglasses:
 
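The 0.1.0 entry above credits a switch to a Regex-based scan for the speedup. As a rough illustration of the idea (not the gem's code), the sketch below compiles every known rendering of a few emoji into one alternation and lets `String#scan` do the matching in a single pass. The six-entry `RENDERINGS` table and the `scan_unified` helper are hypothetical. The gem itself builds its pattern by joining `EmojiData.chars({include_variants: true})` with `|` and then looks each match up via `char_to_unified` and `from_unified`; this sketch instead uses `Regexp.union` (which escapes each string) and orders alternatives longest-first so a variant sequence is matched whole rather than as its bare base glyph.

```ruby
# Hypothetical lookup table: each rendered glyph (plain and variant-encoded)
# maps to the unified ID of the emoji it represents.
RENDERINGS = {
  "\u{1F680}"                => "1F680",     # ROCKET
  "\u{1F47E}"                => "1F47E",     # ALIEN MONSTER
  "\u{2601}"                 => "2601",      # CLOUD
  "\u{2601}\u{FE0F}"         => "2601",      # CLOUD, variant encoding
  "\u{0023}\u{20E3}"         => "0023-20E3", # keycap hash
  "\u{0023}\u{FE0F}\u{20E3}" => "0023-20E3"  # keycap hash, variant encoding
}.freeze

# One precompiled alternation. Longest renderings come first so that, at a
# given position, a variant sequence wins over its shorter base glyph.
# Regexp.union escapes each string, so "#" and similar are literal.
SCAN_REGEXP = Regexp.union(RENDERINGS.keys.sort_by { |s| -s.length })

# Returns the unified IDs of all emoji found, in the order they appear.
def scan_unified(str)
  str.scan(SCAN_REGEXP).map { |glyph| RENDERINGS.fetch(glyph) }
end

p scan_unified("flying on my \u{1F680} to visit the \u{1F47E} people.")
# => ["1F680", "1F47E"]
p scan_unified("first a \u{0023}\u{FE0F}\u{20E3} then a \u{2601}\u{FE0F}")
# => ["0023-20E3", "2601"]
```

Compiling the pattern once at load time is what makes repeated scans cheap: each call is a single regex pass over the input instead of a loop over every known character.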
@@ -13,9 +35,12 @@
 
  * On initialization, create hashmaps to cache lookups for `.find_by_unified()`.
 
- In a quick benchmark in MRI 2.1.1, this reduces the time needed for one million lookups from `13.5s` to `0.3s`!
+ In a quick benchmark in MRI 2.1.1, this reduces the time needed for one
+ million lookups from `13.5s` to `0.3s`!
 
- This is only for lookup by unified ID for now, since the other `find_by_*()` methods are actually searches that can return multiple values. I'll look at nested hashmaps for those if there is a pressing performance need later.
+ This is only for lookup by unified ID for now, since the other `find_by_*()`
+ methods are actually searches that can return multiple values. I'll look at
+ nested hashmaps for those if there is a pressing performance need later.
 
  ## 0.0.2 (3 December 2013)
 
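The hashmap caching described in this entry (and extended in 0.2.0 to short-name keywords via `.from_short_name()`) trades a one-time build at load for constant-time lookups. A minimal sketch of that pattern, assuming a hypothetical `Emoji` Struct and a three-entry table in place of `EmojiChar` and the vendored `emoji.json`. The two map-building loops mirror the `EMOJICHAR_UNIFIED_MAP` and `EMOJICHAR_KEYWORD_MAP` loops added in this release; `slow_from_unified` is a linear search included only for comparison.

```ruby
# Hypothetical stand-in for EmojiData::EmojiChar with just the needed fields.
Emoji = Struct.new(:name, :unified, :variations, :short_names)

CHARS = [
  Emoji.new("ROCKET", "1F680", [], ["rocket"]),
  Emoji.new("HEAVY BLACK HEART", "2764", ["2764-FE0F"], ["heart"]),
  Emoji.new("THUMBS UP SIGN", "1F44D", [], ["+1", "thumbsup"])
].freeze

# Built once: the unified ID and every variant ID point at the same object.
UNIFIED_MAP = {}
CHARS.each do |ec|
  UNIFIED_MAP[ec.unified] = ec
  ec.variations.each { |variant| UNIFIED_MAP[variant] = ec }
end

# Built once: every short-name keyword points at its emoji.
KEYWORD_MAP = {}
CHARS.each do |ec|
  ec.short_names.each { |keyword| KEYWORD_MAP[keyword] = ec }
end

# Hash lookups; IDs are normalised to upper case, keywords to lower case.
def from_unified(uid)
  UNIFIED_MAP[uid.upcase]
end

def from_short_name(short_name)
  KEYWORD_MAP[short_name.downcase]
end

# Linear search for comparison: same answer, but walks the whole list.
def slow_from_unified(uid)
  id = uid.upcase
  CHARS.find { |ec| ec.unified == id || ec.variations.include?(id) }
end

puts from_unified("2764-fe0f").name       # prints HEAVY BLACK HEART
puts from_short_name("ThumbsUp").unified  # prints 1F44D
```

Because variant IDs are keys in the same map, a lookup by `2764-FE0F` returns the very same object as a lookup by `2764`; this sketch makes no claim about the gem's measured timings.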
File without changes
data/README.md CHANGED
@@ -3,21 +3,29 @@
  [![Gem Version](http://img.shields.io/gem/v/emoji_data.svg?style=flat)](https://rubygems.org/gems/emoji_data)
  [![Build Status](http://img.shields.io/travis/mroth/emoji_data.rb.svg?style=flat)](https://travis-ci.org/mroth/emoji_data.rb)
  [![Dependency Status](http://img.shields.io/gemnasium/mroth/emoji_data.rb.svg?style=flat)](https://gemnasium.com/mroth/emoji_data.rb)
- [![CodeClimate Status](http://img.shields.io/codeclimate/github/mroth/emoji_data.rb.svg?style=flat)](https://codeclimate.com/github/mroth/emoji_data.rb)
  [![Coverage Status](http://img.shields.io/coveralls/mroth/emoji_data.rb.svg?style=flat)](https://coveralls.io/r/mroth/emoji_data.rb)
 
+ Ruby library providing low-level operations for dealing with Emoji
+ glyphs in the Unicode standard. :cool:
 
- Provides classes and helpers for dealing with emoji character data as unicode. Wraps a library of all known emoji characters and provides convenience methods.
+ EmojiData is like a Swiss Army knife for dealing with Emoji encoding issues. If
+ all you need to do is translate `:poop:` into :poop:, then there are plenty of
+ other libs out there that will probably do what you want. But once you are
+ dealing with Emoji as a fundamental part of your application, and you start to
+ realize the nightmare of [doublebyte encoding][doublebyte] or
+ [variants][variant], then this library may be your new best friend.
+ :raised_hands:
 
- Note, this is mostly useful for low-level operations. If you can avoid having to deal with unicode character data extensively and just want to encode/decode stuff, [rumoji](https://github.com/mwunsch/rumoji) might be a better bet for you. If however, you are doing anything complicated involving emoji encoding/decoding, or you are just obsessed with understanding the details, this library is your new best friend.
+ EmojiData is used in production by [Emojitracker.com][emojitracker] to parse
+ well over 100M emoji glyphs daily. :dizzy:
 
- This library currently uses `iamcal/emoji-data` as it's dataset, and thus considers it to be the "source of truth" regarding certain things, such as how to represent doublebyte unified codepoint IDs as strings (seperated by a dash).
-
- This is basically a helper library for my [emojitrack](https://github.com/mroth/emojitrack) and [emojistatic](https://github.com/mroth/emojistatic) projects, but may be useful for other people.
+ [doublebyte]: http://www.quora.com/Why-does-using-emoji-reduce-my-SMS-character-limit-to-70
+ [variant]: http://www.unicode.org/L2/L2011/11438-emoji-var.pdf
+ [emojitracker]: http://www.emojitracker.com
 
  ## Installation
 
- Add this line to your application's Gemfile:
+ Add this line to your application's `Gemfile`:
 
  gem 'emoji_data'
 
@@ -29,42 +37,62 @@ Or install it yourself as:
 
  $ gem install emoji_data
 
- Currently requires `RUBY_VERSION >= 1.9.2`.
-
- ## Library Usage
+ Currently requires `RUBY_VERSION >= 1.9.3`.
 
- Pretty straightforward, read the source. But here are some things you might care about:
+ ## Usage
 
- ### EmojiData
+ ### Documentation
+ Full API documentation is available via YARD or here:
+ http://rubydoc.info/github/mroth/emoji_data.rb/master/frames
 
- The `EmojiData` module provides some convenience methods for dealing with the library of known emoji characters. Check out the source to see what's up.
+ ### Examples
+ Here are some examples of the type of stuff you can do:
 
- Some notable methods to call out:
+ ```irb
+ >> require 'emoji_data'
+ => true
 
- - `EmojiData.find_by_unified(id)` gives you a quick way to grab a specific EmojiChar.
+ >> EmojiData.from_unified('1f680')
+ => #<EmojiData::EmojiChar:0x007f8fdba33b40 @variations=[], @name="ROCKET",
+ @unified="1F680", @docomo=nil, @au="E5C8", @softbank="E10D", @google="FE7ED",
+ @image="1f680.png", @sheet_x=25, @sheet_y=4, @short_name="rocket",
+ @short_names=["rocket"], @text=nil, @apple_img=true, @hangouts_img=true,
+ @twitter_img=true>
 
- >> EmojiData.find_by_unified('1f680')
- => #<EmojiData::EmojiChar:0x007fd455ab2ff8 @name="ROCKET", @unified="1F680", @docomo="", @au="E5C8", @softbank="E10D", @google="FE7ED", @image="1f680.png", @sheet_x=21, @sheet_y=28, @short_name="rocket", @short_names=["rocket"], @text=nil>
+ >> EmojiData.all.count
+ => 845
 
- - `EmojiData.find_by_name(name)` and `.find_by_short_name(name)` do pretty much what you'd expect:
+ >> EmojiData.all_with_variants.count
+ => 107
 
- >> EmojiData.find_by_name('thumb')
- => [#<EmojiData::EmojiChar:0x007f9db214a558 @name="THUMBS UP SIGN", @unified="1F44D", @docomo="E727", @au="E4F9", @softbank="E00E", @google="FEB97", @image="1f44d.png", @sheet_x=10, @sheet_y=17, @short_name="+1", @short_names=["+1", "thumbsup"], @text=nil>, #<EmojiData::EmojiChar:0x007f9db2149720 @name="THUMBS DOWN SIGN", @unified="1F44E", @docomo="E700", @au="EAD5", @softbank="E421", @google="FEBA0", @image="1f44e.png", @sheet_x=10, @sheet_y=18, @short_name="-1", @short_names=["-1", "thumbsdown"], @text=nil>]
+ >> EmojiData.find_by_short_name("moon").count
+ => 13
 
- - `EmojiData.char_to_unified(char)` takes a string containing a unified unicode representation of an emoji character and gives you the unicode ID.
+ >> EmojiData.all.select(&:doublebyte?).map(&:short_name)
+ => ["hash", "zero", "one", "two", "three", "four", "five", "six", "seven",
+ "eight", "nine", "cn", "de", "es", "fr", "gb", "it", "jp", "kr", "ru", "us"]
 
- >> EmojiData.char_to_unified('🚀')
- => "1F680"
+ >> EmojiData.find_by_name("tree").map { |c| [c.unified, c.name, c.render] }
+ => [["1F332", "EVERGREEN TREE", "🌲"], ["1F333", "DECIDUOUS TREE", "🌳"],
+ ["1F334", "PALM TREE", "🌴"], ["1F384", "CHRISTMAS TREE", "🎄"], ["1F38B",
+ "TANABATA TREE", "🎋"]]
 
- - `EmojiData.all` will return an array of all known EmojiChars, so you can map or do whatever funky Enumerable stuff you want to do across the entire character set.
+ >> EmojiData.scan("I ♥ when marketers talk about the ☁. #blessed").each do |ec|
+ ?> puts "Found some #{ec.short_name}!"
+ >> end
+ Found some hearts!
+ Found some cloud!
+ => [...]
+ ```
 
- #gimmie the shortname of all doublebyte chars
- >> EmojiData.all.select(&:doublebyte?).map(&:short_name)
- => ["hash", "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "cn", "de", "es", "fr", "gb", "it", "jp", "kr", "ru", "us"]
+ ## Contributing
 
+ Please be sure to run `rake spec` and help keep test coverage at :100:.
 
- ### EmojiData::EmojiChar
+ There is a full benchmark suite available via `scripts/benchmark.rb`. Please
+ run it before and after your changes to ensure you have not caused a
+ performance regression.
 
- `EmojiData::EmojiChar` is a class representing a single emoji character. All the variables from the `iamcal/emoji-data` dataset have dynamically generated getter methods.
+ ## License
 
- There are some additional convenience methods, such as `#doublebyte?` etc. Most important addition is the `#char` method which will output a properly unicode encoded string containing the character.
+ [The MIT License (MIT)](LICENSE)
@@ -18,11 +18,13 @@ Gem::Specification.new do |spec|
  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
  spec.require_paths = ["lib"]
 
- spec.add_development_dependency "bundler", "~> 1.3"
- spec.add_development_dependency "rake"
- spec.add_development_dependency "rspec"
- spec.add_development_dependency 'simplecov', '~> 0.7.1'
- spec.add_development_dependency 'coveralls', '~> 0.7.0'
+ spec.add_development_dependency 'bundler', '~> 1.3'
+ spec.add_development_dependency 'rake'
+ spec.add_development_dependency 'rspec', '~> 2.14.1'
+ spec.add_development_dependency 'simplecov', '~> 0.7.1'
+ spec.add_development_dependency 'coveralls', '~> 0.7.0'
+ spec.add_development_dependency 'benchmark-ips', '~> 2.0.0'
+ spec.add_development_dependency 'yard', '~> 0.8.7.4'
 
- spec.required_ruby_version = '>= 1.9.2'
+ spec.required_ruby_version = '>= 1.9.3'
  end
@@ -3,82 +3,192 @@ require 'emoji_data/emoji_char'
  require 'json'
 
  module EmojiData
+
+ # specify some location paths
  GEM_ROOT = File.join(File.dirname(__FILE__), '..')
- RAW_JSON = IO.read(File.join(GEM_ROOT, 'vendor/emoji-data/emoji.json'))
- EMOJI_MAP = JSON.parse( RAW_JSON )
- EMOJI_CHARS = EMOJI_MAP.map { |em| EmojiChar.new(em) }
+ VENDOR_DATA = 'vendor/emoji-data/emoji.json'
 
- #
- # construct hashmap for fast precached lookups for `.find_by_unified`
- #
- EMOJICHAR_UNIFIED_MAP = Hash[EMOJI_CHARS.map { |u| [u.unified, u] }]
- # merge variant encodings into map so we can look them up as well
- EMOJI_CHARS.select(&:variant?).each do |char|
- char.variations.each do |variant|
- EMOJICHAR_UNIFIED_MAP.merge! Hash[variant,char]
- end
+ # precomputed list of all possible emoji characters
+ EMOJI_CHARS = begin
+ raw_json = IO.read(File.join(GEM_ROOT, VENDOR_DATA))
+ vendordata = JSON.parse( raw_json )
+ vendordata.map { |em| EmojiChar.new(em) }
+ end
+
+ # precomputed hashmap for fast precached lookups in .from_unified
+ EMOJICHAR_UNIFIED_MAP = {}
+ EMOJI_CHARS.each do |ec|
+ EMOJICHAR_UNIFIED_MAP[ec.unified] = ec
+ ec.variations.each { |variant| EMOJICHAR_UNIFIED_MAP[variant] = ec }
+ end
+
+ # precomputed hashmap for fast precached lookups in .from_short_name
+ EMOJICHAR_KEYWORD_MAP = {}
+ EMOJI_CHARS.each do |ec|
+ ec.short_names.each { |keyword| EMOJICHAR_KEYWORD_MAP[keyword] = ec }
  end
 
+ # our constants are only for usage internally
+ private_constant :GEM_ROOT, :VENDOR_DATA
+ private_constant :EMOJI_CHARS, :EMOJICHAR_UNIFIED_MAP, :EMOJICHAR_KEYWORD_MAP
+
+
+ # Returns a list of all known Emoji characters as `EmojiChar` objects.
+ #
+ # @return [Array<EmojiChar>] a list of all known `EmojiChar`.
  def self.all
  EMOJI_CHARS
  end
 
+ # Returns a list of all `EmojiChar` that are represented with doublebyte
+ # encoding.
+ #
+ # @return [Array<EmojiChar>] a list of all doublebyte `EmojiChar`.
  def self.all_doublebyte
  EMOJI_CHARS.select(&:doublebyte?)
  end
 
+ # Returns a list of all `EmojiChar` that have at least one variant encoding.
+ #
+ # @return [Array<EmojiChar>] a list of all `EmojiChar` with variant encoding.
  def self.all_with_variants
  EMOJI_CHARS.select(&:variant?)
  end
 
- def self.chars(options={})
- options = {include_variants: false}.merge(options)
+ # Returns a list of all known Emoji characters rendered as UTF-8 strings.
+ #
+ # By default, the library's default rendering options are used.
+ # However, if you pass an option hash with `include_variants: true` then all
+ # possible renderings of a single glyph will be included, meaning that:
+ #
+ # 1. You will have "duplicate" emojis in your list.
+ # 2. This list is now suitable for exhaustively matching against in a search.
+ #
+ # @option opts [Boolean] :include_variants whether or not to include all
+ # possible encoding variants in the list
+ #
+ # @return [Array<String>] all Emoji characters rendered as UTF-8 strings
+ def self.chars(opts={})
+ options = {include_variants: false}.merge(opts)
 
- normals = EMOJI_CHARS.map { |c| c.char({variant_encoding: false}) }
- extras = self.all_with_variants.map { |c| c.char({variant_encoding: true}) }
+ normals = EMOJI_CHARS.map { |c| c.render({variant_encoding: false}) }
 
  if options[:include_variants]
+ extras = self.all_with_variants.map { |c| c.render({variant_encoding: true}) }
  return normals + extras
  end
  normals
  end
 
- def self.codepoints(options={})
- options = {include_variants: false}.merge(options)
+ # Returns a list of all known codepoints representing Emoji characters.
+ #
+ # @option (see .chars)
+ # @return [Array<String>] all codepoints represented as unified ID strings
+ def self.codepoints(opts={})
+ options = {include_variants: false}.merge(opts)
+
+ normals = EMOJI_CHARS.map(&:unified)
 
  if options[:include_variants]
- return EMOJI_CHARS.map(&:unified) + self.all_with_variants.map {|c| c.variant}
+ extras = self.all_with_variants.map {|c| c.variant}
+ return normals + extras
  end
- EMOJI_CHARS.map(&:unified)
+ normals
  end
 
+ # Convert a native UTF-8 string glyph to its unified codepoint ID.
+ #
+ # This is a conversion operation, not a match, so it may produce unexpected
+ # results with different types of values.
+ #
+ # @param char [String] a single rendered emoji glyph encoded as a UTF-8 string
+ # @return [String] the unified ID
+ #
+ # @example
+ # >> EmojiData.char_to_unified("👾")
+ # => "1F47E"
  def self.char_to_unified(char)
- char.codepoints.to_a.map {|i| i.to_s(16).rjust(4,'0')}.join('-').upcase
+ char.codepoints.to_a.map { |i| i.to_s(16).rjust(4,'0')}.join('-').upcase
  end
 
- def self.unified_to_char(cp)
- EmojiChar::unified_to_char(cp)
+ # Convert a unified codepoint ID directly to its UTF-8 string representation.
+ #
+ # @param uid [String] the unified codepoint ID for an emoji
+ # @return [String] UTF-8 string rendering of the emoji character
+ #
+ # @example
+ # >> EmojiData.unified_to_char("1F47E")
+ # => "👾"
+ def self.unified_to_char(uid)
+ EmojiChar::unified_to_char(uid)
  end
 
- def self.find_by_unified(cp)
- EMOJICHAR_UNIFIED_MAP[cp.upcase]
+ # Finds a specific `EmojiChar` based on its unified codepoint ID.
+ #
+ # @param uid [String] the unified codepoint ID for an emoji
+ # @return [EmojiChar]
+ def self.from_unified(uid)
+ EMOJICHAR_UNIFIED_MAP[uid.upcase]
  end
 
- FBS_REGEXP = Regexp.new("(?:#{EmojiData.chars({include_variants: true}).join("|")})")
- def self.find_by_str(str)
+ # precompile regex pattern for fast matches in `.scan`
+ # needs to be defined after self.chars so not at top of file for now...
+ FBS_REGEXP = Regexp.new(
+ "(?:#{EmojiData.chars({include_variants: true}).join("|")})"
+ )
+ private_constant :FBS_REGEXP
+
+ # Scans a string for all encoded emoji characters contained within.
+ #
+ # @param str [String] the target string to search
+ # @return [Array<EmojiChar>] all emoji characters contained within the target
+ # string, in the order they appeared.
+ #
+ # @example
+ # >> EmojiData.scan("flying on my 🚀 to visit the 👾 people.")
+ # => [#<EmojiData::EmojiChar... @name="ROCKET", @unified="1F680", ...>,
+ # #<EmojiData::EmojiChar... @name="ALIEN MONSTER", @unified="1F47E", ...>]
+ def self.scan(str)
  matches = str.scan(FBS_REGEXP)
- matches.map { |m| EmojiData.find_by_unified(EmojiData.char_to_unified(m)) }
+ matches.map { |m| EmojiData.from_unified(EmojiData.char_to_unified(m)) }
  end
 
+ # Finds any `EmojiChar` that contains the given string in its official name.
+ #
+ # @param name [String]
+ # @return [Array<EmojiChar>]
  def self.find_by_name(name)
  self.find_by_value(:name, name.upcase)
  end
 
+ # Finds all `EmojiChar` that match the given string in any of their
+ # associated short name keywords.
+ #
+ # @param short_name [String]
+ # @return [Array<EmojiChar>]
  def self.find_by_short_name(short_name)
  self.find_by_value(:short_name, short_name.downcase)
  end
 
+ # Finds a specific `EmojiChar` based on one of its short name keywords.
+ #
+ # Must be an exact match.
+ #
+ # @param short_name [String]
+ # @return [EmojiChar]
+ def self.from_short_name(short_name)
+ EMOJICHAR_KEYWORD_MAP[short_name.downcase]
+ end
+
+ # alias old method names for legacy apps
+ class << self
+ alias_method :find_by_unified, :from_unified
+ alias_method :find_by_str, :scan
+ end
+
+
  protected
+
  def self.find_by_value(field,value)
  self.all.select { |char| char.send(field).include? value }
  end
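`char_to_unified` and `unified_to_char` are inverse string transforms: a glyph's codepoints become zero-padded uppercase hex joined with `-`, and the reverse splits on `-`, parses each part as hex and packs with `U*`. The sketch below reproduces those two expressions from the diff as standalone top-level methods (a simplification; in the gem they are module methods on `EmojiData` and `EmojiChar`) so the round trip, and the doc comment's warning that this is a conversion rather than a match, can be seen directly.

```ruby
# Glyph -> unified ID: each codepoint as hex, padded to at least 4 digits,
# joined with "-" and upper-cased.
def char_to_unified(char)
  char.codepoints.map { |i| i.to_s(16).rjust(4, '0') }.join('-').upcase
end

# Unified ID -> glyph: split on "-", parse each part as hex, and pack the
# integers back into a UTF-8 string.
def unified_to_char(uid)
  uid.split('-').map { |i| i.hex }.pack('U*')
end

puts char_to_unified("\u{1F47E}")                 # prints 1F47E
puts char_to_unified("\u{0023}\u{FE0F}\u{20E3}")  # prints 0023-FE0F-20E3
puts unified_to_char("1f680") == "\u{1F680}"      # prints true

# A conversion, not a match: any string converts, emoji or not.
puts char_to_unified("ab")                        # prints 0061-0062
```

Since `String#hex` accepts either case, lowercase IDs such as `1f680` convert just as well, while `char_to_unified` always emits upper case, which is the form the lookup maps are keyed on.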
@@ -1,13 +1,42 @@
  module EmojiData
 
+ # EmojiChar represents a single Emoji character and its associated metadata.
+ #
+ # @!attribute name
+ # @return [String] The standardized name used in the Unicode specification
+ # to represent this emoji character.
+ #
+ # @!attribute unified
+ # @return [String] The primary unified codepoint ID for the emoji character.
+ #
+ # @!attribute variations
+ # @return [Array<String>] A list of all variant codepoints that may also
+ # represent this emoji.
+ #
+ # @!attribute short_name
+ # @return [String] The canonical "short name" or keyword used in many
+ # systems to refer to this emoji. Often surrounded by `:colons:` in
+ # systems like GitHub & Campfire.
+ #
+ # @!attribute short_names
+ # @return [Array<String>] A full list of possible keywords for the emoji.
+ #
+ # @!attribute text
+ # @return [String] An alternate textual representation of the emoji, for
+ # example a smiley face emoji may be represented with an ASCII alternative.
+ # Most emoji do not have a text alternative. This is typically used when
+ # building an automatic translation from typed emoticons.
+ #
  class EmojiChar
+
  def initialize(emoji_hash)
  # work around inconsistency in emoji.json for now by just setting a blank
  # array for instance value, and let it get overriden in main
  # deserialization loop if variable is present.
  @variations = []
 
- # http://stackoverflow.com/questions/1615190/declaring-instance-variables-iterating-over-a-hash
+ # trick for declaring instance variables while iterating over a hash
+ # http://stackoverflow.com/questions/1615190/
  emoji_hash.each do |k,v|
  instance_variable_set("@#{k}",v)
  eigenclass = class<<self; self; end
@@ -15,51 +44,78 @@ module EmojiData
  end
  end
 
- # Returns a version of the character for rendering to screen.
+ # Renders an `EmojiChar` to its string glyph representation, suitable for
+ # printing to screen.
+ #
+ # @option opts [Boolean] :variant_encoding specify whether the variant
+ # encoding selector should be used to hint to rendering devices that
+ # "graphic" representation should be used. By default, we use this for all
+ # Emoji characters that contain a possible variant.
  #
- # By default this will now use the variant encoding if it exists.
- def char(options = {})
- options = {variant_encoding: true}.merge(options)
+ # @return [String] the emoji character rendered to a UTF-8 string
+ def render(opts = {})
+ options = {variant_encoding: true}.merge(opts)
  #decide whether to use the normal unified ID or the variant for encoding to str
  target = (self.variant? && options[:variant_encoding]) ? self.variant : @unified
  EmojiChar::unified_to_char(target)
  end
 
- # Return ALL known possible string encodings of the emoji char.
+ alias_method :to_s, :render
+ alias_method :char, :render
+
+ # Returns a list of all possible UTF-8 string renderings of an `EmojiChar`.
  #
- # Mostly useful for doing find operations when you need them all.
+ # E.g., normal, with variant selectors, etc. This is useful if you want to
+ # have all possible values to match against when searching for the emoji in
+ # a string representation.
+ #
+ # @return [Array<String>] all possible UTF-8 string renderings
  def chars
- results = [self.char({variant_encoding: false})]
+ results = [self.render({variant_encoding: false})]
  @variations.each do |variation|
  results << EmojiChar::unified_to_char(variation)
  end
  @chars ||= results
  end
 
- # Public: Is the character represented by a doublebyte unicode codepoint in unicode?
+ # Is the `EmojiChar` represented by a doublebyte codepoint in Unicode?
+ #
+ # @return [Boolean]
  def doublebyte?
- @unified.match(/-/)
+ @unified.include? "-"
  end
 
- # does the emojichar have an alternate variant encoding?
+ # Does the `EmojiChar` have an alternate Unicode variant encoding?
+ #
+ # @return [Boolean]
  def variant?
  @variations.length > 0
  end
 
- # return whatever is the most likely variant ID for the emojichar
- # for now, there can only be one, so just return first.
- # (in the future, there may be multiple variants, who knows!)
+ # Returns the most likely variant-encoding codepoint ID for an `EmojiChar`.
+ #
+ # For now we only know of one possible variant encoding for certain
+ # characters, but there could be others in the future.
+ #
+ # This is typically used to force Emoji rendering for characters that could
+ # be represented in standard font glyphs on certain operating systems.
+ #
+ # The resulting encoded string will be two codepoints, or three codepoints
+ # for doublebyte Emoji characters.
+ #
+ # @return [String, nil]
+ # The most likely variant-encoding codepoint ID.
+ # If there is no variant-encoding for a character, returns nil.
  def variant
  @variations.first
  end
 
- alias_method :to_s, :char
 
  protected
+
  def self.unified_to_char(cps)
  cps.split('-').map { |i| i.hex }.pack("U*")
  end
 
  end
-
  end
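`EmojiChar#render` picks the variant ID (the one carrying the `FE0F` selector) when a variant exists and `variant_encoding` is left at its default of `true`; `to_s` and the old `char` are now aliases of it. A minimal sketch of that decision, assuming a hypothetical `SketchChar` class built from just a unified ID and a variations list rather than from a row of `emoji.json`:

```ruby
# Hypothetical, trimmed-down stand-in for EmojiData::EmojiChar.
class SketchChar
  def initialize(unified, variations = [])
    @unified = unified
    @variations = variations
  end

  def variant?
    @variations.length > 0
  end

  # Most likely variant ID, or nil when there is none.
  def variant
    @variations.first
  end

  def doublebyte?
    @unified.include?("-")
  end

  # Prefer the variant ID unless the caller opts out; then convert the
  # chosen "-"-separated hex ID into a UTF-8 string.
  def render(opts = {})
    options = { variant_encoding: true }.merge(opts)
    target = (variant? && options[:variant_encoding]) ? variant : @unified
    target.split('-').map(&:hex).pack('U*')
  end

  # New canonical name plus the conventional and legacy names.
  alias_method :to_s, :render
  alias_method :char, :render
end

cloud   = SketchChar.new('2601', ['2601-FE0F'])
invader = SketchChar.new('1F47E')

p cloud.render == "\u{2601}\u{FE0F}"                    # => true
p cloud.render(variant_encoding: false) == "\u{2601}"   # => true
p invader.render == "\u{1F47E}"                         # => true (no variant)
p cloud.char == cloud.to_s                              # => true
```

Because `alias_method` copies the method at the point it runs, the aliases have to come after `render` is defined, which is why the diff moves `alias_method :to_s` up to sit directly below `render`.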
@@ -1,3 +1,4 @@
  module EmojiData
- VERSION = "0.1.0"
+ # Current version of the module, for bundling to rubygems.org
+ VERSION = "0.2.0.rc1"
  end
@@ -0,0 +1,70 @@
+ # encoding: UTF-8
+
+ require './lib/emoji_data'
+ require 'benchmark/ips'
+
+ suites = []
+
+ s0 = "I liek to eat cake oh so very much cake eating is nice!! #cake #food"
+ s1 = "🚀"
+ s2 = "flying on my 🚀 to visit the 👾 people."
+ s3 = "first a \u{0023}\u{FE0F}\u{20E3} then a 🚀"
+
+ suites << Benchmark.ips do |x|
+ x.config(:time => 1, :warmup => 0)
+ x.report("EmojiData.scan(s0)") { EmojiData.scan(s0) }
+ x.report("EmojiData.scan(s1)") { EmojiData.scan(s1) }
+ x.report("EmojiData.scan(s2)") { EmojiData.scan(s2) }
+ x.report("EmojiData.scan(s3)") { EmojiData.scan(s3) }
+ end
+
+
+ suites << Benchmark.ips do |x|
+ x.config(:time => 1, :warmup => 0)
+ x.report("EmojiData.all") { EmojiData.all() }
+ x.report("EmojiData.all_doublebyte") { EmojiData.all_doublebyte() }
+ x.report("EmojiData.all_with_variants") { EmojiData.all_with_variants() }
+ x.report("EmojiData.from_unified") { EmojiData.from_unified("1F680") }
+ x.report("EmojiData.chars") { EmojiData.chars() }
+ x.report("EmojiData.codepoints") { EmojiData.codepoints() }
+ x.report("EmojiData.find_by_name - many") { EmojiData.find_by_name("tree") }
+ x.report("EmojiData.find_by_name - none") { EmojiData.find_by_name("zzzz") }
+ x.report("EmojiData.find_by_short_name - many") { EmojiData.find_by_short_name("MOON") }
+ x.report("EmojiData.find_by_short_name - none") { EmojiData.find_by_short_name("zzzz") }
+ x.report("EmojiData.char_to_unified - single") { EmojiData.char_to_unified("🚀") }
+ x.report("EmojiData.char_to_unified - double") { EmojiData.char_to_unified("\u{2601}\u{FE0F}") }
+ x.report("EmojiData.unified_to_char - single") { EmojiData.unified_to_char("1F47E") }
+ x.report("EmojiData.unified_to_char - double") { EmojiData.unified_to_char("2764-fe0f") }
+ x.report("EmojiData.unified_to_char - triple") { EmojiData.unified_to_char("0030-FE0F-20E3") }
+ end
+
+
+ invader = EmojiData::EmojiChar.new({unified: '1F47E'})
+ usflag = EmojiData::EmojiChar.new({unified: '1F1FA-1F1F8'})
+ hourglass = EmojiData::EmojiChar.new({unified: '231B', variations: ['231B-FE0F']})
+ cloud = EmojiData::EmojiChar.new({unified: '2601', variations: ['2601-FE0F']})
+
+ suites << Benchmark.ips do |x|
+ x.config(:time => 1, :warmup => 0)
+ x.report("EmojiChar.render - single") { invader.render() }
+ x.report("EmojiChar.render - double") { usflag.render() }
+ x.report("EmojiChar.render - variant") { cloud.render({variant_encoding: true}) }
+ x.report("EmojiChar.chars") { cloud.chars() }
+ x.report("EmojiChar.doublebyte?") { invader.doublebyte?() }
+ x.report("EmojiChar.variant?") { invader.variant?() }
+ x.report("EmojiChar.variant") { invader.variant() }
+ end
+
+
+ def micros(hz)
+ 1_000_000 / hz
+ end
+
+ suites.each do |report|
+ results = report.entries.sort { |a,b| b.ips <=> a.ips }
+
+ print "\n"
+ results.each do |r|
+ printf "%-45s %10u %.2f µs/op\n", r.label, r.iterations, micros(r.ips)
+ end
+ end
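The benchmark script above reports microseconds per operation via the `benchmark-ips` gem. Where that gem isn't installed, a rough stand-in can be built on Ruby's monotonic clock (`Process.clock_gettime`). The sketch below is hypothetical throughout (the `micros_per_op` helper, the fake ID list and the labels are illustrative only) and times a linear `Array#include?` against a `Hash#key?` lookup, a comparison in the spirit of the precomputed lookup maps. It has no warmup and no statistics, so treat its numbers as indicative only.

```ruby
# Rough microseconds-per-operation timer using the monotonic clock.
# A sketch, not a replacement for benchmark-ips: no warmup, no statistics.
def micros_per_op(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  elapsed * 1_000_000 / iterations
end

# Hypothetical data: 1000 fake unified IDs, as a list and as a hash.
ids   = (0x1F300...(0x1F300 + 1000)).map { |cp| cp.to_s(16).upcase }
list  = ids.dup
table = ids.each_with_object({}) { |id, h| h[id] = true }
target = ids.last # worst case for the linear search

results = {
  "Array#include? (linear)" => micros_per_op(2_000) { list.include?(target) },
  "Hash#key? (hashed)"      => micros_per_op(2_000) { table.key?(target) }
}

# Fastest first, matching the benchmark script's descending-ips sort.
results.sort_by { |_, us| us }.each do |label, us|
  printf "%-30s %.3f us/op\n", label, us
end
```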
@@ -38,22 +38,28 @@ describe EmojiChar do
  end
  end
 
- describe "#char" do
+ describe "#render" do
  it "should render as happy shiny unicode" do
- @invader.char.should eq("👾")
+ @invader.render.should eq("👾")
  end
  it "should render as happy shiny unicode for doublebyte chars too" do
- @usflag.char.should eq("🇺🇸")
+ @usflag.render.should eq("🇺🇸")
  end
  it "should have a flag to output forced emoji variant char encoding if requested" do
- @cloud.char( {variant_encoding: false}).should eq("\u{2601}")
- @cloud.char( {variant_encoding: true}).should eq("\u{2601}\u{FE0F}")
- @invader.char( {variant_encoding: false}).should eq("\u{1F47E}")
- @invader.char( {variant_encoding: true}).should eq("\u{1F47E}")
+ @cloud.render( {variant_encoding: false}).should eq("\u{2601}")
+ @cloud.render( {variant_encoding: true}).should eq("\u{2601}\u{FE0F}")
+ @invader.render( {variant_encoding: false}).should eq("\u{1F47E}")
+ @invader.render( {variant_encoding: true}).should eq("\u{1F47E}")
  end
  it "should default to variant encoding for chars with a variant present" do
- @cloud.char.should eq("\u{2601}\u{FE0F}")
- @hourglass.char.should eq("\u{231B}\u{FE0F}")
+ @cloud.render.should eq("\u{2601}\u{FE0F}")
+ @hourglass.render.should eq("\u{231B}\u{FE0F}")
+ end
+ end
+
+ describe "#char - DEPRECATED" do
+ it "should maintain compatibility with old method name for .render" do
+ @cloud.char.should eq(@cloud.render)
  end
  end
 
@@ -56,12 +56,12 @@ describe EmojiData do
56
56
  end
57
57
  end
58
58
 
59
- describe ".find_by_str" do
59
+ describe ".scan" do
60
60
  before(:all) do
61
- @exact_results = EmojiData.find_by_str("🚀")
62
- @multi_results = EmojiData.find_by_str("flying on my 🚀 to visit the 👾 people.")
63
- @variant_results = EmojiData.find_by_str("\u{0023}\u{FE0F}\u{20E3}")
64
- @variant_multi = EmojiData.find_by_str("first a \u{0023}\u{FE0F}\u{20E3} then a 🚀")
61
+ @exact_results = EmojiData.scan("🚀")
62
+ @multi_results = EmojiData.scan("flying on my 🚀 to visit the 👾 people.")
63
+ @variant_results = EmojiData.scan("\u{0023}\u{FE0F}\u{20E3}")
64
+ @variant_multi = EmojiData.scan("first a \u{0023}\u{FE0F}\u{20E3} then a 🚀")
65
65
  end
66
66
  it "should find the proper EmojiChar object from a single string char" do
67
67
  @exact_results.should be_kind_of(Array)
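The `.scan` specs search free text for emoji, including multi-codepoint sequences such as `#` + U+FE0F + U+20E3. The sketch below shows one way such a scan could work. It builds a single regular expression from a small lookup table and tries longer sequences first. The table, its names, and `sketch_scan` are all hypothetical, and the gem may do this differently.

```ruby
# Hypothetical lookup table: rendered character string => name.
SKETCH_TABLE = {
  "\u{1F680}"              => "ROCKET",
  "\u{1F47E}"              => "ALIEN MONSTER",
  "\u{0023}\u{FE0F}\u{20E3}" => "HASH KEY",
  "\u{0023}\u{20E3}"       => "HASH KEY"
}

# Regexp.union escapes each string. Ruby tries alternatives left to right,
# so sorting longest first prefers full variant sequences over shorter ones.
SKETCH_REGEX = Regexp.union(SKETCH_TABLE.keys.sort_by { |s| -s.length })

# Return the names of every known emoji found in str, in order.
def sketch_scan(str)
  str.scan(SKETCH_REGEX).map { |match| SKETCH_TABLE[match] }
end
```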
@@ -89,22 +89,34 @@ describe EmojiData do
89
89
  end
90
90
  end
91
91
 
92
- describe ".find_by_unified" do
92
+ describe ".find_by_str - DEPRECATED" do
93
+ it "should maintain compatibility with old method name for .scan" do
94
+ EmojiData.find_by_str("\u{0023}\u{FE0F}\u{20E3}").should eq(EmojiData.scan("\u{0023}\u{FE0F}\u{20E3}"))
95
+ end
96
+ end
97
+
98
+ describe ".from_unified" do
93
99
  it "should find the proper EmojiChar object" do
94
- results = EmojiData.find_by_unified('1f680')
100
+ results = EmojiData.from_unified('1f680')
95
101
  results.should be_kind_of(EmojiChar)
96
102
  results.name.should eq('ROCKET')
97
103
  end
98
104
  it "should normalise capitalization for hex values" do
99
- EmojiData.find_by_unified('1f680').should_not be_nil
105
+ EmojiData.from_unified('1f680').should_not be_nil
100
106
  end
101
107
  it "should find via variant encoding ID format as well" do
102
- results = EmojiData.find_by_unified('2764-fe0f')
108
+ results = EmojiData.from_unified('2764-fe0f')
103
109
  results.should_not be_nil
104
110
  results.name.should eq('HEAVY BLACK HEART')
105
111
  end
106
112
  end
107
113
 
114
+ describe ".find_by_unified - DEPRECATED" do
115
+ it "should maintain compatibility with old method name for .from_unified" do
116
+ EmojiData.find_by_unified('1f680').should eq(EmojiData.from_unified('1f680'))
117
+ end
118
+ end
119
+
108
120
  describe ".find_by_name" do
109
121
  it "returns an array of results, upcasing input if needed" do
110
122
  EmojiData.find_by_name('tree').should be_kind_of(Array)
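The `.from_unified` specs require case-insensitive lookup and support for variant IDs such as `2764-fe0f`. The `DEPRECATED` specs require the old module method to return the same result. The sketch below covers both with a small hypothetical `SketchEmojiData` module. Its `deprecate_alias` helper is an illustrative name and is not part of the gem.

```ruby
module SketchEmojiData
  # Hypothetical table keyed by upper-case unified IDs; the variant-encoded
  # ID maps to the same entry as the base ID.
  BY_UNIFIED = {
    "1F680"     => "ROCKET",
    "2764"      => "HEAVY BLACK HEART",
    "2764-FE0F" => "HEAVY BLACK HEART"
  }

  # Upcase input so '1f680' and '1F680' resolve identically; nil if unknown.
  def self.from_unified(uid)
    BY_UNIFIED[uid.upcase]
  end

  # Define old_name as a module method that warns, then forwards to new_name.
  def self.deprecate_alias(old_name, new_name)
    define_singleton_method(old_name) do |*args|
      warn "[DEPRECATION] #{name}.#{old_name} is deprecated, use .#{new_name} instead"
      public_send(new_name, *args)
    end
  end

  deprecate_alias :find_by_unified, :from_unified
end
```

A helper like this keeps each rename to one line, which matters when several methods (`find_by_str`, `find_by_unified`) are renamed in the same release.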
@@ -129,6 +141,25 @@ describe EmojiData do
129
141
  end
130
142
  end
131
143
 
144
+ describe ".from_short_name" do
145
+ it "returns exact matches on a short name" do
146
+ results = EmojiData.from_short_name('scream')
147
+ results.should be_kind_of(EmojiChar)
148
+ results.name.should eq('FACE SCREAMING IN FEAR')
149
+ end
150
+ it "handles lowercasing input if required" do
151
+ EmojiData.from_short_name('SCREAM').should eq( EmojiData.from_short_name('scream') )
152
+ end
153
+ it "works on secondary keywords" do
154
+ primary = EmojiData.from_short_name('hankey')
155
+ EmojiData.from_short_name('poop').should eq(primary)
156
+ EmojiData.from_short_name('shit').should eq(primary)
157
+ end
158
+ it "returns nil if nothing matches" do
159
+ EmojiData.from_short_name('taco').should be_nil
160
+ end
161
+ end
162
+
132
163
  describe ".char_to_unified" do
133
164
  it "converts normal emoji to unified codepoint" do
134
165
  EmojiData.char_to_unified("👾").should eq('1F47E')
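The new `.from_short_name` specs expect four behaviours: an exact match on a short name, lowercasing of the input, lookup through secondary keywords (`poop`, `shit` resolving to the same char as `hankey`), and `nil` when nothing matches. A minimal sketch of that behaviour follows. It indexes every short name to its character once. The records and method name are hypothetical.

```ruby
# Hypothetical records: each char has a primary short name plus aliases.
SketchChar = Struct.new(:name, :short_names)

SKETCH_CHARS = [
  SketchChar.new("FACE SCREAMING IN FEAR", ["scream"]),
  SketchChar.new("PILE OF POO", ["hankey", "poop", "shit"])
]

# Map every short name, primary or secondary, to the same char object.
SHORT_NAME_INDEX = SKETCH_CHARS.each_with_object({}) do |char, index|
  char.short_names.each { |sn| index[sn] = char }
end

# Downcase the input before lookup; Hash#[] returns nil for no match.
def sketch_from_short_name(short_name)
  SHORT_NAME_INDEX[short_name.downcase]
end
```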
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: emoji_data
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0.rc1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Matthew Rothenberg
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-05-03 00:00:00.000000000 Z
11
+ date: 2014-09-05 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -42,16 +42,16 @@ dependencies:
42
42
  name: rspec
43
43
  requirement: !ruby/object:Gem::Requirement
44
44
  requirements:
45
- - - ">="
45
+ - - "~>"
46
46
  - !ruby/object:Gem::Version
47
- version: '0'
47
+ version: 2.14.1
48
48
  type: :development
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
- - - ">="
52
+ - - "~>"
53
53
  - !ruby/object:Gem::Version
54
- version: '0'
54
+ version: 2.14.1
55
55
  - !ruby/object:Gem::Dependency
56
56
  name: simplecov
57
57
  requirement: !ruby/object:Gem::Requirement
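The rspec requirement above changes from `>= 0` to the pessimistic `~> 2.14.1`. With three segments, `~>` allows 2.14.1 and later 2.14.x releases but rejects 2.15.0 and 3.x. The effect can be checked directly with RubyGems' `Gem::Requirement` and `Gem::Version`. The helper name `allowed?` is only for illustration.

```ruby
require "rubygems"

# New constraint: >= 2.14.1 and < 2.15.
RSPEC_REQ = Gem::Requirement.new("~> 2.14.1")
# Old constraint: any version at all.
OLD_REQ = Gem::Requirement.new(">= 0")

# True when the given version string satisfies the requirement.
def allowed?(req, version)
  req.satisfied_by?(Gem::Version.new(version))
end
```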
@@ -80,6 +80,34 @@ dependencies:
80
80
  - - "~>"
81
81
  - !ruby/object:Gem::Version
82
82
  version: 0.7.0
83
+ - !ruby/object:Gem::Dependency
84
+ name: benchmark-ips
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - "~>"
88
+ - !ruby/object:Gem::Version
89
+ version: 2.0.0
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - "~>"
95
+ - !ruby/object:Gem::Version
96
+ version: 2.0.0
97
+ - !ruby/object:Gem::Dependency
98
+ name: yard
99
+ requirement: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - "~>"
102
+ - !ruby/object:Gem::Version
103
+ version: 0.8.7.4
104
+ type: :development
105
+ prerelease: false
106
+ version_requirements: !ruby/object:Gem::Requirement
107
+ requirements:
108
+ - - "~>"
109
+ - !ruby/object:Gem::Version
110
+ version: 0.8.7.4
83
111
  description: Provides classes and helpers for dealing with emoji character data as
84
112
  unicode. Wraps a library of all known emoji characters and provides convenience
85
113
  methods.
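The dependency entries above are the serialised form of gemspec declarations. The gem's own `emoji_data.gemspec` is not part of this section. The sketch below is only an assumed reconstruction, showing `Gem::Specification` calls that would yield equivalent development-dependency metadata.

```ruby
require "rubygems"

# Assumed reconstruction; not the gem's actual gemspec file.
SPEC = Gem::Specification.new do |s|
  s.name    = "emoji_data"
  s.version = "0.2.0.rc1"
  # Development-only dependencies, matching the metadata hunks above.
  s.add_development_dependency "rspec", "~> 2.14.1"
  s.add_development_dependency "benchmark-ips", "~> 2.0.0"
  s.add_development_dependency "yard", "~> 0.8.7.4"
end
```

Development dependencies are installed for contributors (for example through Bundler's `gemspec` directive) but are not pulled in when the gem is installed as a runtime dependency.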
@@ -90,17 +118,21 @@ extensions: []
90
118
  extra_rdoc_files: []
91
119
  files:
92
120
  - ".coveralls.yml"
121
+ - ".editorconfig"
122
+ - ".gitattributes"
93
123
  - ".gitignore"
94
124
  - ".travis.yml"
125
+ - ".yardopts"
95
126
  - CHANGELOG.md
96
127
  - Gemfile
97
- - LICENSE.txt
128
+ - LICENSE
98
129
  - README.md
99
130
  - Rakefile
100
131
  - emoji_data.gemspec
101
132
  - lib/emoji_data.rb
102
133
  - lib/emoji_data/emoji_char.rb
103
134
  - lib/emoji_data/version.rb
135
+ - scripts/benchmark.rb
104
136
  - spec/emoji_char_spec.rb
105
137
  - spec/emoji_data_spec.rb
106
138
  - spec/spec_helper.rb
@@ -118,12 +150,12 @@ required_ruby_version: !ruby/object:Gem::Requirement
118
150
  requirements:
119
151
  - - ">="
120
152
  - !ruby/object:Gem::Version
121
- version: 1.9.2
153
+ version: 1.9.3
122
154
  required_rubygems_version: !ruby/object:Gem::Requirement
123
155
  requirements:
124
- - - ">="
156
+ - - ">"
125
157
  - !ruby/object:Gem::Version
126
- version: '0'
158
+ version: 1.3.1
127
159
  requirements: []
128
160
  rubyforge_project:
129
161
  rubygems_version: 2.2.2
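Two version changes sit in this hunk. First, `0.2.0.rc1` contains a letter, so RubyGems treats it as a prerelease. A prerelease sorts before `0.2.0`, and `gem install` does not pick it up unless `--pre` is passed. The new `required_rubygems_version: > 1.3.1` appears to be the constraint RubyGems adds automatically to prerelease gems. Second, the minimum Ruby moves from 1.9.2 to 1.9.3. The comparisons can be checked as follows.

```ruby
require "rubygems"

RC1   = Gem::Version.new("0.2.0.rc1")  # letter segment => prerelease
FINAL = Gem::Version.new("0.2.0")

# Raised minimum Ruby version from required_ruby_version above.
RUBY_REQ = Gem::Requirement.new(">= 1.9.3")
```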
@@ -134,3 +166,4 @@ test_files:
134
166
  - spec/emoji_char_spec.rb
135
167
  - spec/emoji_data_spec.rb
136
168
  - spec/spec_helper.rb
169
+ has_rdoc: