gemoji-parser 1.0.0 → 1.1.0
- checksums.yaml +4 -4
- data/README.md +84 -27
- data/gemoji-parser.gemspec +1 -2
- data/lib/gemoji-parser.rb +251 -23
- data/lib/gemoji-parser/version.rb +1 -1
- data/spec/emoji_parser_spec.rb +222 -0
- metadata +6 -6
- data/spec/emoji_helper_spec.rb +0 -88
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 3621544eb0c0dfe923ad4639f4ec51fefbf42cdc
|
4
|
+
data.tar.gz: 412e5b97b12b82fd2fcb1ae4a95546bb0eac8bff
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: dec9825da6d1d409f98c5afb6cce0363bc00c0ee34b5fa5f6b0ed3c8faca12a1228b7d20b10e75174ed36621756075ba6cb145b766acecd2bb0756fc12d57d5f
|
7
|
+
data.tar.gz: 67e7e574e303eda0b25f2fb1dca6227802afc86210827df428d9d5423b1cf98a92e0abba5788cc156a0f89a9d12281cb634d459832f3e3401af0b7cff555273d
|
data/README.md
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
# gemoji-parser
|
2
2
|
|
3
|
-
The missing helper methods for [GitHub's
|
3
|
+
The missing helper methods for [GitHub's gemoji](https://github.com/github/gemoji) gem. This utility provides a parsing API for the `Emoji` corelib (provided by *gemoji*). The parser handles transformations of emoji symbols between unicode (😃), token (`:smile:`), and emoticon (`:-D`) formats; and may perform arbitrary replacement of emoji symbols into custom display formats (such as image tags). Internally, highly-optimized regular expressions are generated and cached to maximize parsing efficiency.
|
4
4
|
|
5
5
|
## Installation
|
6
6
|
|
@@ -12,7 +12,7 @@ gem 'gemoji-parser'
|
|
12
12
|
|
13
13
|
And then execute:
|
14
14
|
|
15
|
-
$ bundle
|
15
|
+
$ bundle install
|
16
16
|
|
17
17
|
Or install it yourself as:
|
18
18
|
|
@@ -20,14 +20,13 @@ Or install it yourself as:
|
|
20
20
|
|
21
21
|
To run tests:
|
22
22
|
|
23
|
-
|
23
|
+
$ bundle exec rake spec
|
24
24
|
|
25
25
|
## Usage
|
26
26
|
|
27
|
-
|
28
27
|
### Tokenizing
|
29
28
|
|
30
|
-
|
29
|
+
The tokenizer methods perform basic conversions of unicode symbols into token symbols, and vice versa.
|
31
30
|
|
32
31
|
```ruby
|
33
32
|
EmojiParser.tokenize("Test 🙈 🙊 🙉")
|
@@ -37,56 +36,114 @@ EmojiParser.detokenize("Test :see_no_evil: :speak_no_evil: :hear_no_evil:")
|
|
37
36
|
# "Test 🙈 🙊 🙉"
|
38
37
|
```
|
39
38
|
|
40
|
-
###
|
39
|
+
### Symbol Parsing
|
41
40
|
|
42
|
-
|
41
|
+
Use the symbol parser methods for custom transformations. All symbol parsers yield [Emoji::Character](https://github.com/github/gemoji/blob/master/lib/emoji/character.rb) instances into the parsing block for custom formatting.
|
43
42
|
|
44
43
|
**Unicode symbols**
|
45
44
|
|
46
45
|
```ruby
|
47
|
-
EmojiParser.parse_unicode(
|
46
|
+
EmojiParser.parse_unicode("Test 🐠") do |emoji|
|
48
47
|
%Q(<img src="#{emoji.image_filename}" alt=":#{emoji.name}:">).html_safe
|
49
48
|
end
|
50
49
|
|
51
|
-
# 'Test <img src="unicode/
|
50
|
+
# 'Test <img src="unicode/1f420.png" alt=":tropical_fish:">'
|
52
51
|
```
|
53
52
|
|
54
53
|
**Token symbols**
|
55
54
|
|
56
55
|
```ruby
|
57
|
-
EmojiParser.parse_tokens(
|
56
|
+
EmojiParser.parse_tokens("Test :tropical_fish:") do |emoji|
|
58
57
|
%Q(<img src="#{emoji.image_filename}" alt=":#{emoji.name}:">).html_safe
|
59
58
|
end
|
60
59
|
|
61
|
-
# 'Test <img src="unicode/
|
60
|
+
# 'Test <img src="unicode/1f420.png" alt=":tropical_fish:">'
|
62
61
|
```
|
63
62
|
|
64
|
-
**
|
63
|
+
**Emoticon symbols**
|
65
64
|
|
66
65
|
```ruby
|
67
|
-
EmojiParser.
|
66
|
+
EmojiParser.parse_emoticons("Test ;-)") do |emoji|
|
67
|
+
%Q(<img src="#{emoji.image_filename}" alt=":#{emoji.name}:">).html_safe
|
68
|
+
end
|
68
69
|
|
69
|
-
# 'Test
|
70
|
+
# 'Test <img src="unicode/1f609.png" alt=":wink:">'
|
70
71
|
```
|
71
72
|
|
72
|
-
|
73
|
+
**All symbol types**
|
73
74
|
|
74
|
-
|
75
|
+
Use the `parse` method to target all symbol types with a single parsing pass. Specific symbol types may be excluded using options:
|
75
76
|
|
76
77
|
```ruby
|
77
|
-
|
78
|
-
|
79
|
-
|
78
|
+
EmojiParser.parse("Test 🐠 :scream: ;-)") { |emoji| "[#{emoji.name}]" }
|
79
|
+
# 'Test [tropical_fish] [scream] [wink]'
|
80
|
+
|
81
|
+
EmojiParser.parse("Test 🐠 :scream: ;-)", emoticons: false) do |emoji|
|
82
|
+
"[#{emoji.name}]"
|
83
|
+
end
|
84
|
+
# 'Test [tropical_fish] [scream] ;-)'
|
80
85
|
```
|
81
86
|
|
82
|
-
|
87
|
+
While the `parse` method is heavier to run than the discrete parsing methods for each symbol type (`parse_unicode`, `parse_tokens`, etc...), it has the advantage of avoiding multiple parsing passes. This is handy if you want parsed symbols to output new symbols in a different format, such as generating image tags that include a symbol in their alt text:
|
88
|
+
|
89
|
+
```ruby
|
90
|
+
EmojiParser.parse("Test 🐠 ;-)") do |emoji|
|
91
|
+
%Q(<img src="#{emoji.image_filename}" alt=":#{emoji.name}:">).html_safe
|
92
|
+
end
|
93
|
+
|
94
|
+
# 'Test <img src="unicode/1f420.png" alt=":tropical_fish:"> <img src="unicode/1f609.png" alt=":wink:">'
|
95
|
+
```
|
96
|
+
|
97
|
+
### Lookups & File Paths
|
98
|
+
|
99
|
+
Use the `find` method to derive [Emoji::Character](https://github.com/github/gemoji/blob/master/lib/emoji/character.rb) instances from any symbol format (unicode, token, emoticon):
|
100
|
+
|
101
|
+
```ruby
|
102
|
+
emoji = EmojiParser.find('🐠')
|
103
|
+
emoji = EmojiParser.find('see_no_evil')
|
104
|
+
emoji = EmojiParser.find(';-)')
|
105
|
+
```
|
106
|
+
|
107
|
+
Use the `image_path` helper to derive an image filepath from any symbol format (unicode, token, emoticon). You may optionally provide a custom path that overrides the *gemoji* default location (this is useful if you'd like to reference your images from a CDN):
|
83
108
|
|
84
|
-
|
109
|
+
```ruby
|
110
|
+
EmojiParser.image_path('tropical_fish')
|
111
|
+
# "unicode/1f420.png"
|
112
|
+
|
113
|
+
EmojiParser.image_path('tropical_fish', '//cdn.fu/emoji/')
|
114
|
+
# "//cdn.fu/emoji/1f420.png"
|
115
|
+
```
|
116
|
+
|
117
|
+
## Custom Symbols
|
118
|
+
|
119
|
+
**Emoji**
|
120
|
+
|
121
|
+
The parser plays nicely with custom emoji defined through the *gemoji* core. You just need to call `rehash!` once after adding new emoji symbols to regenerate the parser's regex cache:
|
122
|
+
|
123
|
+
```ruby
|
124
|
+
Emoji.create('boxing_kangaroo') # << WHY IS THIS NOT STANDARD?!
|
125
|
+
EmojiParser.rehash!
|
126
|
+
```
|
127
|
+
|
128
|
+
**Emoticons**
|
129
|
+
|
130
|
+
Emoticon patterns are defined through the parser, and are simply mapped to an emoji name that exists within the *gemoji* core (this can be a standard emoji, or a custom emoji that you have added). To see default emoticons, inspect the `EmojiParser.emoticons` hash. For custom emoticons:
|
131
|
+
|
132
|
+
```ruby
|
133
|
+
# Alias a standard emoji:
|
134
|
+
EmojiParser.emoticons[':@'] = :angry
|
135
|
+
|
136
|
+
# Create a custom emoji, and alias it:
|
137
|
+
Emoji.create('bill_clinton')
|
138
|
+
EmojiParser.emoticons['=:o]'] = :bill_clinton
|
139
|
+
|
140
|
+
# IMPORTANT:
|
141
|
+
# Rehash once after adding new symbols to Emoji core, or to the EmojiParser:
|
142
|
+
EmojiParser.rehash!
|
143
|
+
```
|
144
|
+
|
145
|
+
## Shoutout
|
85
146
|
|
86
|
-
|
147
|
+
Thanks to the GitHub team for the [gemoji](https://github.com/github/gemoji) gem, and my esteemed colleague Michael Lovitt for the fantastic [Rubular](http://rubular.com/) regex tool (it has been invaluable for this project).
|
87
148
|
|
88
|
-
|
89
|
-
2. Create your feature branch (`git checkout -b my-new-feature`)
|
90
|
-
3. Commit your changes (`git commit -am 'Add some feature'`)
|
91
|
-
4. Push to the branch (`git push origin my-new-feature`)
|
92
|
-
5. Create a new Pull Request
|
149
|
+
🙈 🙊 🙉
|
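The README's claim that a single `parse` pass avoids re-parsing its own output can be illustrated without the gem. This is a minimal sketch, not the gem's implementation: `render_symbols` and the small `names` table are hypothetical stand-ins for the gemoji lookup.

```ruby
# Replace every key of `names` found in `text` during a single gsub pass.
# `names` maps a symbol (unicode, token, or emoticon) to an emoji name.
# Text produced by the block is never rescanned by the same gsub call.
def render_symbols(text, names)
  any_symbol = Regexp.union(names.keys)
  text.gsub(any_symbol) { |match| yield names.fetch(match) }
end

# Hypothetical lookup table standing in for the gemoji data.
names = {
  "\u{1f420}" => "tropical_fish",
  ":scream:"  => "scream",
  ";-)"       => "wink"
}

# One pass: the token written into the alt text is left alone.
single_pass = render_symbols("Test \u{1f420} ;-)", names) do |name|
  %(<img alt=":#{name}:">)
end
puts single_pass

# Two chained passes: the second pass rewrites the alt text created by the first.
two_pass = "Test \u{1f420}"
  .gsub("\u{1f420}") { %(<img alt=":tropical_fish:">) }
  .gsub(/:(\w+):/) { "[#{$1}]" }
puts two_pass
```

The second result shows the failure mode that a combined pass sidesteps: the token inside the generated alt attribute is picked up by the later pass.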
data/gemoji-parser.gemspec
CHANGED
@@ -9,7 +9,7 @@ Gem::Specification.new do |s|
|
|
9
9
|
s.authors = ["Greg MacWilliam"]
|
10
10
|
s.email = ["greg.macwilliam@voxmedia.com"]
|
11
11
|
s.summary = %q{The missing helper methods for GitHub's Gemoji gem.}
|
12
|
-
s.description = %q{
|
12
|
+
s.description = %q{Expands GitHub Gemoji to parse unicode and token emoji symbols into custom formats.}
|
13
13
|
s.homepage = "https://github.com/gmac/gemoji-parser"
|
14
14
|
s.license = "MIT"
|
15
15
|
|
@@ -19,7 +19,6 @@ Gem::Specification.new do |s|
|
|
19
19
|
s.require_paths = ["lib"]
|
20
20
|
|
21
21
|
s.required_ruby_version = '> 1.9'
|
22
|
-
|
23
22
|
s.add_dependency "gemoji", ">= 2.1.0"
|
24
23
|
s.add_development_dependency "bundler", "~> 1.6"
|
25
24
|
s.add_development_dependency "rake", "~> 10.0"
|
data/lib/gemoji-parser.rb
CHANGED
@@ -4,46 +4,215 @@ require 'gemoji'
|
|
4
4
|
module EmojiParser
|
5
5
|
extend self
|
6
6
|
|
7
|
-
#
|
8
|
-
#
|
9
|
-
|
10
|
-
|
11
|
-
|
7
|
+
# Emoticons
|
8
|
+
# ---------
|
9
|
+
# The base emoticons set (below) is generated with "noseless" variants, ie: :-) and :)
|
10
|
+
# The generated `EmojiParser.emoticons` hash is formatted as:
|
11
|
+
# ---
|
12
|
+
# > {
|
13
|
+
# > ":-)" => :blush,
|
14
|
+
# > ":)" => :blush,
|
15
|
+
# > ":-D" => :smile,
|
16
|
+
# > ":D" => :smile,
|
17
|
+
# > }
|
18
|
+
#
|
19
|
+
# This base set is selected for commonality and high degrees of author intention.
|
20
|
+
# If you want more/different emoticons:
|
21
|
+
# - Please DO customize the `EmojiParser.emoticons` hash in your app runtime.
|
22
|
+
# - Please DO NOT customize this source code and issue a pull request.
|
23
|
+
#
|
24
|
+
# To add an emoticon:
|
25
|
+
# ---
|
26
|
+
# > EmojiParser.emoticons[':-$'] = :grimacing
|
27
|
+
# > EmojiParser.rehash!
|
28
|
+
#
|
29
|
+
# To remove an emoticon:
|
30
|
+
# ---
|
31
|
+
# > EmojiParser.emoticons.delete(':-$')
|
32
|
+
# > EmojiParser.rehash!
|
33
|
+
#
|
34
|
+
# NOTE: call `rehash!` after making changes to Emoji/emoticon sets.
|
35
|
+
# Rehashing updates the parser's regex cache with the latest icons.
|
36
|
+
#
|
37
|
+
def emoticons
|
38
|
+
return @emoticons if defined? @emoticons
|
39
|
+
@emoticons = {}
|
40
|
+
emoticons = {
|
41
|
+
angry: ">:-(",
|
42
|
+
blush: ":-)",
|
43
|
+
cry: ":'(",
|
44
|
+
confused: [":-\\", ":-/"],
|
45
|
+
disappointed: ":-(",
|
46
|
+
kiss: ":-*",
|
47
|
+
neutral_face: ":-|",
|
48
|
+
monkey_face: ":o)",
|
49
|
+
open_mouth: ":-o",
|
50
|
+
smiley: "=-)",
|
51
|
+
smile: ":-D",
|
52
|
+
stuck_out_tongue: [":-p", ":-P", ":-b"],
|
53
|
+
stuck_out_tongue_winking_eye: [";-p", ";-P", ";-b"],
|
54
|
+
wink: ";-)"
|
55
|
+
}
|
56
|
+
|
57
|
+
# Parse all named patterns into a flat hash table,
|
58
|
+
# where pattern is the key and its token is the value.
|
59
|
+
# all patterns are duplicated with the "noseless" variants, ie: :-) and :)
|
60
|
+
emoticons.each_pair do |name, patterns|
|
61
|
+
patterns = [patterns] unless patterns.is_a?(Array)
|
62
|
+
patterns.each do |pattern|
|
63
|
+
@emoticons[pattern] = name
|
64
|
+
@emoticons[pattern.sub(/(?<=:|;|=)-/, '')] = name
|
65
|
+
end
|
66
|
+
end
|
67
|
+
|
68
|
+
@emoticons
|
69
|
+
end
|
70
|
+
|
71
|
+
attr_writer :emoticons
|
72
|
+
|
73
|
+
# Rehashes all cached regular expressions.
|
74
|
+
# IMPORTANT: call this once after changing emoji characters or emoticon patterns.
|
75
|
+
def rehash!
|
76
|
+
unicode_regex(rehash: true)
|
77
|
+
token_regex(rehash: true)
|
78
|
+
emoticon_regex(rehash: true)
|
79
|
+
end
|
80
|
+
|
81
|
+
# Creates an optimized regular expression for matching unicode symbols.
|
82
|
+
# - Options: rehash:boolean
|
83
|
+
def unicode_regex(opts={})
|
84
|
+
return @unicode_regex if defined?(@unicode_regex) && !opts[:rehash]
|
85
|
+
pattern = []
|
12
86
|
|
13
87
|
Emoji.all.each do |emoji|
|
14
88
|
u = emoji.unicode_aliases.map do |str|
|
15
89
|
str.codepoints.map { |c| '\u{%s}' % c.to_s(16).rjust(4, '0') }.join('')
|
16
90
|
end
|
17
|
-
#
|
18
|
-
|
91
|
+
# Simple method: x10 slower!
|
92
|
+
# pattern.concat u.sort! { |a, b| b.length - a.length }
|
93
|
+
pattern << unicode_matcher(u) if u.any?
|
94
|
+
end
|
95
|
+
|
96
|
+
@unicode_pattern = pattern.join('|')
|
97
|
+
@unicode_regex = Regexp.new("(#{@unicode_pattern})")
|
98
|
+
end
|
99
|
+
|
100
|
+
# Creates a regular expression for matching token symbols.
|
101
|
+
# - Options: rehash:boolean (currently unused)
|
102
|
+
def token_regex(opts={})
|
103
|
+
return @token_regex if defined?(@token_regex)
|
104
|
+
@token_pattern = ':([\w+-]+):'
|
105
|
+
@token_regex = Regexp.new(@token_pattern)
|
106
|
+
end
|
107
|
+
|
108
|
+
# Creates an optimized regular expression for matching emoticon symbols.
|
109
|
+
# - Options: rehash:boolean
|
110
|
+
def emoticon_regex(opts={})
|
111
|
+
return @emoticon_regex if defined?(@emoticon_regex) && !opts[:rehash]
|
112
|
+
pattern = {}
|
113
|
+
|
114
|
+
emoticons.keys.each do |icon|
|
115
|
+
compact_icon = icon.gsub('-', '')
|
116
|
+
|
117
|
+
# Check to see if this icon has a compact version, ex: :-) versus :)
|
118
|
+
# One expression will match as many nose/noseless variants as possible.
|
119
|
+
if compact_icon != icon && emoticons[compact_icon]
|
120
|
+
compact_regex = Regexp.escape(icon).gsub('-', '-?')
|
121
|
+
|
122
|
+
# Keep this expression if it hasn't been defined yet,
|
123
|
+
# or if it's longer than a previously defined pattern.
|
124
|
+
if !pattern[compact_icon] || pattern[compact_icon].length < compact_regex.length
|
125
|
+
pattern[compact_icon] = compact_regex
|
126
|
+
end
|
127
|
+
elsif !pattern[icon]
|
128
|
+
pattern[icon] = Regexp.escape(icon)
|
129
|
+
end
|
130
|
+
end
|
131
|
+
|
132
|
+
@emoticon_pattern = "(?<=^|\\s)(?:#{ pattern.values.join('|') })(?=\\s|$)"
|
133
|
+
@emoticon_regex = Regexp.new("(#{@emoticon_pattern})")
|
134
|
+
end
|
135
|
+
|
136
|
+
# Generates a macro regex for matching one or more symbol sets.
|
137
|
+
# Regex uses various formats, based on symbol sets. Yields match as $1 OR $2
|
138
|
+
# T/EU: (token-$1)|(emoticon-unicode-$2)
|
139
|
+
# T/E or T/U: (token-$1)|(emoticon/unicode-$2)
|
140
|
+
# EU: (emoticon/unicode-$1)
|
141
|
+
# - Options: unicode:boolean, tokens:boolean, emoticons:boolean
|
142
|
+
def macro_regex(opts={})
|
143
|
+
unicode_regex if opts[:unicode]
|
144
|
+
token_regex if opts[:tokens]
|
145
|
+
emoticon_regex if opts[:emoticons]
|
146
|
+
pattern = []
|
147
|
+
|
148
|
+
if opts[:emoticons] && opts[:unicode]
|
149
|
+
pattern << "(?:#{ @emoticon_pattern })"
|
150
|
+
pattern << @unicode_pattern
|
151
|
+
else
|
152
|
+
pattern << @emoticon_pattern if opts[:emoticons]
|
153
|
+
pattern << @unicode_pattern if opts[:unicode]
|
154
|
+
end
|
155
|
+
|
156
|
+
pattern = pattern.any? ? "(#{ pattern.join('|') })" : ""
|
157
|
+
|
158
|
+
if opts[:tokens]
|
159
|
+
if pattern.empty?
|
160
|
+
pattern = @token_pattern
|
161
|
+
else
|
162
|
+
pattern = "(?:#{ @token_pattern })|#{ pattern }"
|
163
|
+
end
|
19
164
|
end
|
20
165
|
|
21
|
-
|
166
|
+
Regexp.new(pattern)
|
22
167
|
end
|
23
168
|
|
24
|
-
# Parses all unicode
|
25
|
-
#
|
169
|
+
# Parses all unicode symbols within a string.
|
170
|
+
# - Block: performs all symbol transformations.
|
26
171
|
def parse_unicode(text)
|
27
|
-
text.gsub(
|
172
|
+
text.gsub(unicode_regex) do |match|
|
28
173
|
emoji = Emoji.find_by_unicode($1)
|
29
174
|
block_given? && emoji ? yield(emoji) : match
|
30
175
|
end
|
31
176
|
end
|
32
177
|
|
33
|
-
# Parses all
|
34
|
-
#
|
178
|
+
# Parses all token symbols within a string.
|
179
|
+
# - Block: performs all symbol transformations.
|
35
180
|
def parse_tokens(text)
|
36
|
-
text.gsub(
|
37
|
-
emoji = Emoji.find_by_alias($1
|
181
|
+
text.gsub(token_regex) do |match|
|
182
|
+
emoji = Emoji.find_by_alias($1)
|
38
183
|
block_given? && emoji ? yield(emoji) : match
|
39
184
|
end
|
40
185
|
end
|
41
186
|
|
42
|
-
# Parses all
|
43
|
-
#
|
44
|
-
def
|
45
|
-
text
|
46
|
-
|
187
|
+
# Parses all emoticon symbols within a string.
|
188
|
+
# - Block: performs all symbol transformations.
|
189
|
+
def parse_emoticons(text)
|
190
|
+
text.gsub(emoticon_regex) do |match|
|
191
|
+
if emoticons.has_key?($1)
|
192
|
+
emoji = Emoji.find_by_alias(emoticons[$1].to_s)
|
193
|
+
block_given? && emoji ? yield(emoji) : match
|
194
|
+
else
|
195
|
+
match
|
196
|
+
end
|
197
|
+
end
|
198
|
+
end
|
199
|
+
|
200
|
+
# Parses all emoji unicode, tokens, and emoticons within a string.
|
201
|
+
# - Block: performs all symbol transformations.
|
202
|
+
# - Options: unicode:boolean, tokens:boolean, emoticons:boolean
|
203
|
+
def parse(text, opts={})
|
204
|
+
opts = { unicode: true, tokens: true, emoticons: true }.merge(opts)
|
205
|
+
if opts.one?
|
206
|
+
return parse_unicode(text) { |e| yield e } if opts[:unicode]
|
207
|
+
return parse_tokens(text) { |e| yield e } if opts[:tokens]
|
208
|
+
return parse_emoticons(text) { |e| yield e } if opts[:emoticons]
|
209
|
+
end
|
210
|
+
text.gsub(macro_regex(opts)) do |match|
|
211
|
+
a = defined?($1) ? $1 : nil
|
212
|
+
b = defined?($2) ? $2 : nil
|
213
|
+
emoji = find(a || b)
|
214
|
+
block_given? && emoji ? yield(emoji) : match
|
215
|
+
end
|
47
216
|
end
|
48
217
|
|
49
218
|
# Transforms all unicode emoji into token strings.
|
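The `emoticons` table and `emoticon_regex` added in the hunk above combine two ideas: each pattern gets a "noseless" twin, and matches must be bounded by whitespace or the string edges. The following is a simplified sketch under stated assumptions; `expand_emoticons`, `build_emoticon_regex`, and the three-entry base set are hypothetical, and it lists every variant explicitly rather than merging nose/noseless pairs into one `-?` expression as the gem does.

```ruby
# Flatten { name => pattern(s) } into { pattern => name }, adding a "noseless"
# copy of each pattern by dropping a hyphen that directly follows the eyes.
def expand_emoticons(base)
  base.each_with_object({}) do |(name, patterns), table|
    Array(patterns).each do |pattern|
      table[pattern] = name
      table[pattern.sub(/(?<=[:;=])-/, "")] = name
    end
  end
end

# Build a regex that matches any table key, but only when the emoticon is
# preceded by start-of-line/whitespace and followed by whitespace/end-of-line.
def build_emoticon_regex(table)
  alternatives = table.keys.sort_by { |key| -key.length }.map { |key| Regexp.escape(key) }
  Regexp.new("(?<=^|\\s)(?:#{alternatives.join('|')})(?=\\s|$)")
end

# Hypothetical base set, much smaller than the gem's defaults.
base  = { blush: ":-)", smile: ":-D", stuck_out_tongue: [":-p", ":-P"] }
table = expand_emoticons(base)
regex = build_emoticon_regex(table)

# ":-D" is wrapped in parentheses, so the boundary checks leave it untouched.
puts "ok :) (:-D) :P".gsub(regex) { |match| "[#{table[match]}]" }
```

The boundary lookarounds are what keep text such as `(:cold_sweat:)` or `(:-D)` from being treated as an emoticon in the middle of other punctuation.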
@@ -56,8 +225,67 @@ module EmojiParser
|
|
56
225
|
parse_tokens(text) { |emoji| emoji.raw }
|
57
226
|
end
|
58
227
|
|
59
|
-
#
|
60
|
-
|
61
|
-
|
228
|
+
# Finds an Emoji::Character instance for an unknown symbol type.
|
229
|
+
# - symbol: an <Emoji::Character>, or a unicode/token/emoticon string.
|
230
|
+
def find(symbol)
|
231
|
+
return symbol if (symbol.is_a?(Emoji::Character))
|
232
|
+
symbol = emoticons[symbol].to_s if emoticons.has_key?(symbol)
|
233
|
+
Emoji.find_by_alias(symbol) || Emoji.find_by_unicode(symbol) || nil
|
234
|
+
end
|
235
|
+
|
236
|
+
# Gets the image file reference for a symbol; optionally with a custom path.
|
237
|
+
# - symbol: an <Emoji::Character>, or a unicode/token/emoticon string.
|
238
|
+
# - path: a file path to sub into symbol's filename.
|
239
|
+
def image_path(symbol, path=nil)
|
240
|
+
emoji = find(symbol)
|
241
|
+
return nil unless emoji
|
242
|
+
return emoji.image_filename unless path
|
243
|
+
"#{ path.sub(/\/$/, '') }/#{ emoji.image_filename.split('/').pop }"
|
244
|
+
end
|
245
|
+
|
246
|
+
private
|
247
|
+
|
248
|
+
# Compiles an optimized unicode pattern for fast matching.
|
249
|
+
# Matchers use as small a base as possible, with added options. Ex:
|
250
|
+
# 1-char base \w option: \u{1f6a9}\u{fe0f}?
|
251
|
+
# 2-char base \w option: \u{1f1ef}\u{1f1f5}\u{fe0f}?
|
252
|
+
# 1-char base \w options: \u{0031}(?:\u{fe0f}\u{20e3}|\u{20e3}\u{fe0f})?
|
253
|
+
def unicode_matcher(patterns)
|
254
|
+
return patterns.first if patterns.length == 1
|
255
|
+
|
256
|
+
# Sort patterns, longest to shortest:
|
257
|
+
patterns.sort! { |a, b| b.length - a.length }
|
258
|
+
|
259
|
+
# Select a base pattern:
|
260
|
+
# this is the shortest prefix contained by all patterns.
|
261
|
+
base = patterns.last
|
262
|
+
|
263
|
+
if patterns.all? { |p| p.start_with?(base) }
|
264
|
+
base = patterns.pop
|
265
|
+
else
|
266
|
+
base = base.match(/\\u\{.+?\}/).to_s
|
267
|
+
base = nil unless patterns.all? { |p| p.start_with?(base) }
|
268
|
+
end
|
269
|
+
|
270
|
+
# Collect base options and/or alternate patterns:
|
271
|
+
opts = []
|
272
|
+
alts = []
|
273
|
+
patterns.each do |pattern|
|
274
|
+
if base && pattern.start_with?(base)
|
275
|
+
opts << pattern.sub(base, '')
|
276
|
+
else
|
277
|
+
alts << pattern
|
278
|
+
end
|
279
|
+
end
|
280
|
+
|
281
|
+
# Format base options:
|
282
|
+
if opts.length == 1
|
283
|
+
base += "#{ opts.first }?"
|
284
|
+
elsif opts.length > 1
|
285
|
+
base += "(?:#{ opts.join('|') })?"
|
286
|
+
end
|
287
|
+
|
288
|
+
alts << base if base
|
289
|
+
alts.join('|')
|
62
290
|
end
|
63
291
|
end
|
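The private `unicode_matcher` above compacts several unicode aliases of one emoji into a base pattern plus optional suffixes. A reduced sketch of that idea follows. It is not the gem's method: `escape_codepoints` and `compact_aliases` are hypothetical helpers, and this version only compacts when the shortest alias is a prefix of all the others, falling back to a plain longest-first alternation otherwise.

```ruby
# Escape each codepoint of a string as a \u{...} regex token.
def escape_codepoints(str)
  str.codepoints.map { |c| '\u{%s}' % c.to_s(16).rjust(4, '0') }.join
end

# Build one pattern source for a list of distinct unicode aliases.
# When the shortest alias prefixes every other alias, emit
# "base(?:suffix|suffix)?"; otherwise list the aliases longest first.
def compact_aliases(aliases)
  escaped = aliases.uniq.map { |a| escape_codepoints(a) }.sort_by { |e| -e.length }
  base = escaped.last
  return escaped.join('|') unless escaped.all? { |e| e.start_with?(base) }

  suffixes = escaped[0...-1].map { |e| e[base.length..-1] }
  return base if suffixes.empty?

  "#{base}(?:#{suffixes.join('|')})?"
end

# A flag with and without a trailing variation selector (U+FE0F).
flag = compact_aliases(["\u{1f6a9}", "\u{1f6a9}\u{fe0f}"])
puts flag

# Keycap aliases do not share the shortest alias as a prefix, so they are
# emitted as an alternation in this sketch.
keycap = compact_aliases(["1\u{20e3}", "1\u{fe0f}\u{20e3}"])
puts keycap
```

Wrapping the suffix group in `(?:...)?` keeps the optional marker applied to the whole suffix, which matters when a suffix spans more than one codepoint.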
data/spec/emoji_parser_spec.rb
ADDED
@@ -0,0 +1,222 @@
|
|
1
|
+
# coding: utf-8
|
2
|
+
require 'gemoji-parser'
|
3
|
+
|
4
|
+
describe EmojiParser do
|
5
|
+
let(:test_unicode) { 'Test 🙈 🙊 🙉 😰 :invalid: 🐠. :o)' }
|
6
|
+
let(:test_mixed) { 'Test 🙈 🙊 🙉 :cold_sweat: :invalid: :tropical_fish:. :o)' }
|
7
|
+
let(:test_tokens) { 'Test :see_no_evil: :speak_no_evil: :hear_no_evil: :cold_sweat: :invalid: :tropical_fish:. :o)' }
|
8
|
+
let(:test_emoticons) { ';-) Test (:cold_sweat:) :) :-D' }
|
9
|
+
let(:test_custom) { Emoji.create('custom') }
|
10
|
+
|
11
|
+
describe '#emoticons' do
|
12
|
+
it 'should provide a hash with emoticons and their tokens as key/value pairs.' do
|
13
|
+
expect(EmojiParser.emoticons[':o)']).to eq :monkey_face
|
14
|
+
end
|
15
|
+
end
|
16
|
+
|
17
|
+
describe '#unicode_regex' do
|
18
|
+
it 'generates once and remains cached.' do
|
19
|
+
first = EmojiParser.unicode_regex
|
20
|
+
second = EmojiParser.unicode_regex
|
21
|
+
expect(first).to be second
|
22
|
+
end
|
23
|
+
|
24
|
+
it 'regenerates when called with a :rehash option.' do
|
25
|
+
first = EmojiParser.unicode_regex
|
26
|
+
second = EmojiParser.unicode_regex(rehash: true)
|
27
|
+
expect(first).not_to be second
|
28
|
+
end
|
29
|
+
end
|
30
|
+
|
31
|
+
describe '#token_regex' do
|
32
|
+
it 'generates once and remains cached.' do
|
33
|
+
first = EmojiParser.token_regex
|
34
|
+
second = EmojiParser.token_regex
|
35
|
+
expect(first).to be second
|
36
|
+
end
|
37
|
+
end
|
38
|
+
|
39
|
+
describe '#emoticon_regex' do
|
40
|
+
it 'generates once and remains cached.' do
|
41
|
+
first = EmojiParser.emoticon_regex
|
42
|
+
second = EmojiParser.emoticon_regex
|
43
|
+
expect(first).to be second
|
44
|
+
end
|
45
|
+
|
46
|
+
it 'regenerates when called with a :rehash option.' do
|
47
|
+
first = EmojiParser.emoticon_regex
|
48
|
+
second = EmojiParser.emoticon_regex(rehash: true)
|
49
|
+
expect(first).not_to be second
|
50
|
+
end
|
51
|
+
end
|
52
|
+
|
53
|
+
describe '#parse_unicode' do
|
54
|
+
it 'successfully parses full Gemoji unicode set.' do
|
55
|
+
Emoji.all.each do |emoji|
|
56
|
+
emoji.unicode_aliases.each do |u|
|
57
|
+
parsed = EmojiParser.parse_unicode("Test #{u}") { |e| 'X' }
|
58
|
+
expect(parsed).to eq "Test X"
|
59
|
+
end
|
60
|
+
end
|
61
|
+
end
|
62
|
+
|
63
|
+
it 'replaces all valid unicode symbols via block transformation.' do
|
64
|
+
parsed = EmojiParser.parse_unicode(test_mixed) { |e| 'X' }
|
65
|
+
expect(parsed).to eq 'Test X X X :cold_sweat: :invalid: :tropical_fish:. :o)'
|
66
|
+
end
|
67
|
+
end
|
68
|
+
|
69
|
+
describe '#parse_tokens' do
|
70
|
+
it 'successfully parses full Gemoji name set.' do
|
71
|
+
Emoji.all.each do |emoji|
|
72
|
+
parsed = EmojiParser.parse_tokens("Test :#{emoji.name}:") { |e| 'X' }
|
73
|
+
expect(parsed).to eq "Test X"
|
74
|
+
end
|
75
|
+
end
|
76
|
+
|
77
|
+
it 'replaces all valid token symbols via block transformation.' do
|
78
|
+
parsed = EmojiParser.parse_tokens(test_tokens) { |e| 'X' }
|
79
|
+
expect(parsed).to eq 'Test X X X X :invalid: X. :o)'
|
80
|
+
end
|
81
|
+
end
|
82
|
+
|
83
|
+
describe '#parse_emoticons' do
|
84
|
+
it 'successfully parses full default emoticon set.' do
|
85
|
+
EmojiParser.emoticons.each_key do |emoticon|
|
86
|
+
parsed = EmojiParser.parse_emoticons("Test #{emoticon}") { |e| 'X' }
|
87
|
+
expect(parsed).to eq "Test X"
|
88
|
+
end
|
89
|
+
end
|
90
|
+
|
91
|
+
it 'replaces all valid emoticon symbols via block transformation.' do
|
92
|
+
parsed = EmojiParser.parse_emoticons(test_emoticons) { |e| 'X' }
|
93
|
+
expect(parsed).to eq 'X Test (:cold_sweat:) X X'
|
94
|
+
end
|
95
|
+
end
|
96
|
+
|
97
|
+
describe '#parse' do
|
98
|
+
it 'replaces valid symbols of all types via block transformation.' do
|
99
|
+
parsed = EmojiParser.parse(test_mixed) { |e| 'X' }
|
100
|
+
expect(parsed).to eq 'Test X X X X :invalid: X. X'
|
101
|
+
end
|
102
|
+
|
103
|
+
it 'replaces valid symbols of specified types (unicode, tokens).' do
|
104
|
+
parsed = EmojiParser.parse(test_mixed, emoticons: false) { |e| 'X' }
|
105
|
+
expect(parsed).to eq 'Test X X X X :invalid: X. :o)'
|
106
|
+
end
|
107
|
+
|
108
|
+
it 'replaces valid symbols of specified types (unicode, emoticons).' do
|
109
|
+
parsed = EmojiParser.parse(test_mixed, tokens: false) { |e| 'X' }
|
110
|
+
expect(parsed).to eq 'Test X X X :cold_sweat: :invalid: :tropical_fish:. X'
|
111
|
+
end
|
112
|
+
|
113
|
+
it 'replaces valid symbols of specified types (tokens, emoticons).' do
|
114
|
+
parsed = EmojiParser.parse(test_mixed, unicode: false) { |e| 'X' }
|
115
|
+
expect(parsed).to eq 'Test 🙈 🙊 🙉 X :invalid: X. X'
|
116
|
+
end
|
117
|
+
|
118
|
+
it 'allows symbols to safely insert other symbol types without getting re-parsed.' do
|
119
|
+
parsed = EmojiParser.parse('🙈 🙊 :hear_no_evil:') { |e| ":#{e.name}:" }
|
120
|
+
expect(parsed).to eq ':see_no_evil: :speak_no_evil: :hear_no_evil:'
|
121
|
+
end
|
122
|
+
end
|
123
|
+
|
124
|
+
describe '#tokenize' do
|
125
|
+
it 'successfully tokenizes full Gemoji unicode set.' do
|
126
|
+
Emoji.all.each do |emoji|
|
127
|
+
emoji.unicode_aliases.each do |u|
|
128
|
+
tokenized = EmojiParser.tokenize("Test #{u}")
|
129
|
+
expect(tokenized).to eq "Test :#{emoji.name}:"
|
130
|
+
end
|
131
|
+
end
|
132
|
+
end
|
133
|
+
|
134
|
+
it 'replaces all valid emoji unicode with their token equivalent.' do
|
135
|
+
tokenized = EmojiParser.tokenize(test_mixed)
|
136
|
+
expect(tokenized).to eq test_tokens
|
137
|
+
end
|
138
|
+
end
|
139
|
+
|
140
|
+
describe '#detokenize' do
|
141
|
+
it 'replaces all valid emoji tokens with their raw unicode equivalent.' do
|
142
|
+
tokenized = EmojiParser.detokenize(test_mixed)
|
143
|
+
expect(tokenized).to eq test_unicode
|
144
|
+
end
|
145
|
+
end
|
146
|
+
|
147
|
+
describe '#find' do
|
148
|
+
let (:the_unicode) { '🐵' }
|
149
|
+
let (:the_token) { 'monkey_face' }
|
150
|
+
let (:the_emoticon) { ':o)' }
|
151
|
+
let (:the_emoji) { Emoji.find_by_alias(the_token) }
|
152
|
+
|
153
|
+
it 'returns valid emoji characters.' do
|
154
|
+
expect(EmojiParser.find(the_emoji)).to eq the_emoji
|
155
|
+
end
|
156
|
+
|
157
|
+
it 'finds the proper emoji character for a unicode symbol.' do
|
158
|
+
expect(EmojiParser.find(the_unicode)).to eq the_emoji
|
159
|
+
end
|
160
|
+
|
161
|
+
it 'finds the proper emoji character for a token symbol.' do
|
162
|
+
expect(EmojiParser.find(the_token)).to eq the_emoji
|
163
|
+
end
|
164
|
+
|
165
|
+
it 'finds the proper emoji character for an emoticon symbol.' do
|
166
|
+
expect(EmojiParser.find(the_emoticon)).to eq the_emoji
|
167
|
+
end
|
168
|
+
end
|
169
|
+
|
170
|
+
describe '#image_path' do
|
171
|
+
let (:the_emoji) { Emoji.find_by_alias('smiley') }
|
172
|
+
let (:the_image) { '1f603.png' }
|
173
|
+
|
174
|
+
it 'gets the image filename by emoji character.' do
|
175
|
+
path = EmojiParser.image_path(the_emoji)
|
176
|
+
expect(path).to eq the_emoji.image_filename
|
177
|
+
end
|
178
|
+
|
179
|
+
it 'gets the image filename by unicode symbol.' do
|
180
|
+
path = EmojiParser.image_path(the_emoji.raw)
|
181
|
+
expect(path).to eq the_emoji.image_filename
|
182
|
+
end
|
183
|
+
|
184
|
+
it 'gets the image filename by token symbol.' do
|
185
|
+
path = EmojiParser.image_path(the_emoji.name)
|
186
|
+
expect(path).to eq the_emoji.image_filename
|
187
|
+
end
|
188
|
+
|
189
|
+
it 'gets the image filename by emoticon symbol.' do
|
190
|
+
path = EmojiParser.image_path('=)')
|
191
|
+
expect(path).to eq the_emoji.image_filename
|
192
|
+
end
|
193
|
+
|
194
|
+
it 'formats a Gemoji image path as a custom location (with trailing slash).' do
|
195
|
+
custom_path = '//fonts.test.com/emoji/'
|
196
|
+
path = EmojiParser.image_path(the_emoji, custom_path)
|
197
|
+
expect(path).to eq "#{ custom_path }#{ the_image }"
|
198
|
+
end
|
199
|
+
|
200
|
+
it 'formats a Gemoji image path to a custom location (no trailing slash).' do
|
201
|
+
custom_path = '//fonts.test.com/emoji'
|
202
|
+
path = EmojiParser.image_path(the_emoji, custom_path)
|
203
|
+
expect(path).to eq "#{ custom_path }/#{ the_image }"
|
204
|
+
end
|
205
|
+
end
|
206
|
+
|
207
|
+
describe 'custom emoji' do
|
208
|
+
it 'replaces tokens for custom Emoji.' do
|
209
|
+
Emoji.create('boxing_kangaroo')
|
210
|
+
parsed = EmojiParser.parse_tokens('Test :boxing_kangaroo:') { |e| 'X' }
|
211
|
+
expect(parsed).to eq 'Test X'
|
212
|
+
end
|
213
|
+
|
214
|
+
it 'replaces custom emoticons (requires rehashing the regex).' do
|
215
|
+
EmojiParser.emoticons['¯\\(°_o)/¯'] = :confused
|
216
|
+
EmojiParser.emoticon_regex(rehash: true)
|
217
|
+
|
218
|
+
parsed = EmojiParser.parse_emoticons('Test ¯\\(°_o)/¯') { |e| e.name }
|
219
|
+
expect(parsed).to eq 'Test confused'
|
220
|
+
end
|
221
|
+
end
|
222
|
+
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: gemoji-parser
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.
|
4
|
+
version: 1.1.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Greg MacWilliam
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2015-03-
|
11
|
+
date: 2015-03-21 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: gemoji
|
@@ -66,8 +66,8 @@ dependencies:
|
|
66
66
|
- - ">="
|
67
67
|
- !ruby/object:Gem::Version
|
68
68
|
version: '0'
|
69
|
-
description:
|
70
|
-
|
69
|
+
description: Expands GitHub Gemoji to parse unicode and token emoji symbols into custom
|
70
|
+
formats.
|
71
71
|
email:
|
72
72
|
- greg.macwilliam@voxmedia.com
|
73
73
|
executables: []
|
@@ -82,7 +82,7 @@ files:
|
|
82
82
|
- gemoji-parser.gemspec
|
83
83
|
- lib/gemoji-parser.rb
|
84
84
|
- lib/gemoji-parser/version.rb
|
85
|
-
- spec/
|
85
|
+
- spec/emoji_parser_spec.rb
|
86
86
|
homepage: https://github.com/gmac/gemoji-parser
|
87
87
|
licenses:
|
88
88
|
- MIT
|
@@ -108,4 +108,4 @@ signing_key:
|
|
108
108
|
specification_version: 4
|
109
109
|
summary: The missing helper methods for GitHub's Gemoji gem.
|
110
110
|
test_files:
|
111
|
-
- spec/
|
111
|
+
- spec/emoji_parser_spec.rb
|
data/spec/emoji_helper_spec.rb
DELETED
@@ -1,88 +0,0 @@
|
|
1
|
-
# coding: utf-8
|
2
|
-
require 'gemoji-parser'
|
3
|
-
|
4
|
-
describe EmojiParser do
|
5
|
-
let(:test_unicode) { 'Test 🙈 🙊 🙉 😰 :invalid: 🐠.' }
|
6
|
-
let(:test_mixed) { 'Test 🙈 🙊 🙉 :cold_sweat: :invalid: :tropical_fish:.' }
|
7
|
-
let(:test_tokens) { 'Test :see_no_evil: :speak_no_evil: :hear_no_evil: :cold_sweat: :invalid: :tropical_fish:.' }
|
8
|
-
|
9
|
-
describe '#emoji_regexp' do
|
10
|
-
it 'generates once and remains cached.' do
|
11
|
-
first = EmojiParser.emoji_regexp
|
12
|
-
second = EmojiParser.emoji_regexp
|
13
|
-
expect(first).to be second
|
14
|
-
end
|
15
|
-
|
16
|
-
it 'regenerates when called with a :rehash option.' do
|
17
|
-
first = EmojiParser.emoji_regexp
|
18
|
-
second = EmojiParser.emoji_regexp(rehash: true)
|
19
|
-
expect(first).not_to be second
|
20
|
-
end
|
21
|
-
end
|
22
|
-
|
23
|
-
describe '#parse_unicode' do
|
24
|
-
it 'replaces all valid emoji unicode via block transformation.' do
|
25
|
-
parsed = EmojiParser.parse_unicode(test_mixed) { |emoji| 'X' }
|
26
|
-
expect(parsed).to eq "Test X X X :cold_sweat: :invalid: :tropical_fish:."
|
27
|
-
end
|
28
|
-
end
|
29
|
-
|
30
|
-
describe '#parse_tokens' do
|
31
|
-
it 'replaces all valid emoji tokens via block transformation.' do
|
32
|
-
parsed = EmojiParser.parse_tokens(test_tokens) { |emoji| 'X' }
|
33
|
-
expect(parsed).to eq "Test X X X X :invalid: X."
|
34
|
-
end
|
35
|
-
end
|
36
|
-
|
37
|
-
describe '#parse_all' do
|
38
|
-
it 'replaces all valid emoji unicode and tokens via block transformation.' do
|
39
|
-
parsed = EmojiParser.parse_all(test_mixed) { |emoji| 'X' }
|
40
|
-
expect(parsed).to eq "Test X X X X :invalid: X."
|
41
|
-
end
|
42
|
-
end
|
43
|
-
|
44
|
-
describe '#tokenize' do
|
45
|
-
it 'successfully tokenizes all Gemoji unicode aliases.' do
|
46
|
-
Emoji.all.each do |emoji|
|
47
|
-
emoji.unicode_aliases.each do |u|
|
48
|
-
tokenized = EmojiParser.tokenize("Test #{u}")
|
49
|
-
expect(tokenized).to eq "Test :#{emoji.name}:"
|
50
|
-
end
|
51
|
-
end
|
52
|
-
end
|
53
|
-
|
54
|
-
it 'replaces all valid emoji unicodes with their token equivalent.' do
|
55
|
-
tokenized = EmojiParser.tokenize(test_mixed)
|
56
|
-
expect(tokenized).to eq test_tokens
|
57
|
-
end
|
58
|
-
end
|
59
|
-
|
60
|
-
describe '#detokenize' do
|
61
|
-
it 'replaces all valid emoji tokens with their raw unicode equivalent.' do
|
62
|
-
tokenized = EmojiParser.detokenize(test_mixed)
|
63
|
-
expect(tokenized).to eq test_unicode
|
64
|
-
end
|
65
|
-
end
|
66
|
-
|
67
|
-
describe '#filepath' do
|
68
|
-
let (:test_emoji) { Emoji.find_by_alias('de') }
|
69
|
-
let (:test_file) { '1f1e9-1f1ea.png' }
|
70
|
-
|
71
|
-
it 'formats a Gemoji image path as a root location by default.' do
|
72
|
-
path = EmojiParser.filepath(test_emoji)
|
73
|
-
expect(path).to eq "/#{test_file}"
|
74
|
-
end
|
75
|
-
|
76
|
-
it 'formats a Gemoji image path as a custom location (with trailing slash).' do
|
77
|
-
images_path = '//fonts.test.com/emoji/'
|
78
|
-
path = EmojiParser.filepath(test_emoji, images_path)
|
79
|
-
expect(path).to eq "#{images_path}#{test_file}"
|
80
|
-
end
|
81
|
-
|
82
|
-
it 'formats a Gemoji image path to a custom location (no trailing slash).' do
|
83
|
-
images_path = '//fonts.test.com/emoji'
|
84
|
-
path = EmojiParser.filepath(test_emoji, images_path)
|
85
|
-
expect(path).to eq "#{images_path}/#{test_file}"
|
86
|
-
end
|
87
|
-
end
|
88
|
-
end
|