unicode-emoji 0.9.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: a8a2374f40fac31061e825e42b16bbed3ad4455c
4
+ data.tar.gz: 567004277315d4a7a05dd7a86e07056e968e78ef
5
+ SHA512:
6
+ metadata.gz: 317e769c1426b946ff102670024dda19f8fbe0975af670a01ff79fc0e3c50571ccac503082dda8468b696dd9574ad206c39e20989f8b283993b7c3b48591c356
7
+ data.tar.gz: b2608a118f51a6c11d579696b4949ceef9827ee45c93609e3816c303199246942842150e240a9f33a013c2290ee1abd8e1b82960b2dcd93ffcc28c9c8debc4f8
@@ -0,0 +1,2 @@
1
+ Gemfile.lock
2
+ /pkg
@@ -0,0 +1,23 @@
1
+ sudo: false
2
+ language: ruby
3
+
4
+ rvm:
5
+ - ruby-head
6
+ - 2.4.1
7
+ - 2.3.4
8
+ - 2.2
9
+ - 2.1
10
+ - 2.0
11
+ - jruby-head
12
+ - jruby-9.1.8.0
13
+
14
+ cache:
15
+ - bundler
16
+
17
+ matrix:
18
+ allow_failures:
19
+ - rvm: jruby-head
20
+ - rvm: ruby-head
21
+ - rvm: 2.0
22
+ # fast_finish: true
23
+
@@ -0,0 +1,6 @@
1
+ ## CHANGELOG
2
+
3
+ ### 0.9.0
4
+
5
+ * Initial release (Emoji version 5.0)
6
+
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at opensource@janlelis.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gemspec
4
+
5
+ gem 'minitest'
6
+ gem 'rake'
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2017 Jan Lelis, mail@janlelis.de
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,68 @@
1
+ # Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](http://badge.fury.io/rb/unicode-emoji) [![[travis]](https://travis-ci.org/janlelis/unicode-emoji.svg)](https://travis-ci.org/janlelis/unicode-emoji)
2
+
3
+ A small Ruby library which provides Unicode Emoji data and regexes.
4
+
5
+ Emoji version: **5.0**
6
+
7
+ Supported Rubies: **2.4**, **2.3**, **2.2**, **2.1**
8
+
9
+ ## Gemfile
10
+
11
+ ```ruby
12
+ gem "unicode-emoji"
13
+ ```
14
+
15
+ ## Usage
16
+
17
+ ### Properties
18
+
19
+ Allows you to access the codepoint data from Unicode's [emoji-data.txt](http://unicode.org/Public/emoji/5.0/emoji-data.txt) file:
20
+
21
+ ```ruby
22
+ require "unicode/emoji"
23
+
24
+ Unicode::Emoji.properties "☝" # => ["Emoji", "Emoji_Modifier_Base"]
25
+ ```
26
+
27
+ ### Regex
28
+
29
+ Five Emoji regexes are included. They are compiled from various Unicode Emoji data files.
30
+
31
+ ```ruby
32
+ require "unicode/emoji"
33
+
34
+ string = "String which contains all kinds of emoji:
35
+
36
+ - Singleton Emoji: 😴
37
+ - Textual singleton Emoji with Emoji variation: ▶️
38
+ - Emoji with skin tone modifier: 🛌🏽
39
+ - Region flag: 🇵🇹
40
+ - Sub-Region flag: 🏴󠁧󠁢󠁳󠁣󠁴󠁿
41
+ - Keycap sequence: 2️⃣
42
+ - Sequence using ZWJ (zero width joiner): 🤾🏽‍♀️
43
+
44
+ "
45
+
46
+ string.scan(Unicode::Emoji::REGEX) # => ["😴", "▶️", "🛌🏽", "🇵🇹", "🏴󠁧󠁢󠁳󠁣󠁴󠁿", "2️⃣", "🤾🏽‍♀️"]
47
+ ```
48
+
49
+ Regex | Description | Example Matches | Example Non-Matches
50
+ ------------------------------|-------------|-----------------|--------------------
51
+ `Unicode::Emoji::REGEX` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of valid Emoji sequences, but restricts ZWJ and tag sequences to recommended sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
52
+ `Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of valid Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`
53
+ `Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
54
+ `Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digit 1) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
55
+ `Unicode::Emoji::REGEX_ANY` | Matches any Emoji-related codepoint (but no variation selectors or tags) | `😴`, `▶`, `🏻`, `🛌`, `🏽`, `🇵`, `🇹`, `2`, `🏴`, `🤾`, `♀`, `🤠`, `🤢` | -
56
+
57
+
58
+ ## Also See
59
+
60
+ - [Unicode® Technical Standard #51](http://www.unicode.org/reports/tr51/proposed.html)
61
+ - [Emoji data](http://unicode.org/Public/emoji/5.0/)
62
+ - [Emoji sequence names](https://github.com/janlelis/unicode-sequence_name)
63
+ - Part of [unicode-x](https://github.com/janlelis/unicode-x)
64
+
65
+ ## MIT
66
+
67
+ - Copyright (C) 2017 Jan Lelis <http://janlelis.com>. Released under the MIT license.
68
+ - Unicode data: http://www.unicode.org/copyright.html#Exhibit1
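The `REGEX_BASIC` and `REGEX_TEXT` rows of the README table hinge on the two variation selectors: U+FE0F requests emoji style and U+FE0E requests text style. Below is a minimal, self-contained sketch of that logic, modeled on `emoji_presentation_sequence` and `text_presentation_sequence` in `lib/unicode/emoji.rb`. It assumes a hypothetical two-codepoint data set (▶ as a text-default character, 😴 as an emoji-default character) in place of the real emoji-data.txt. It also omits the component lookahead the gem adds. The module name `PresentationSketch` is illustrative only.

```ruby
# Sketch of the variation-selector rules behind REGEX_BASIC / REGEX_TEXT,
# using a hypothetical two-codepoint data set instead of emoji-data.txt.
module PresentationSketch
  TEXT_DEFAULT  = [0x25B6]  # shown as text unless followed by U+FE0F
  EMOJI_DEFAULT = [0x1F634] # shown as emoji unless followed by U+FE0E
  EMOJI_VS = "\u{FE0F}"     # emoji variation selector
  TEXT_VS  = "\u{FE0E}"     # text variation selector

  # Turns a list of codepoints into a non-capturing regex alternation
  def self.alternation(ords)
    "(?:" + ords.map { |ord| Regexp.escape([ord].pack("U")) }.join("|") + ")"
  end

  # Emoji style: text-default char + FE0F, or emoji-default char
  # not followed by FE0E (a trailing FE0F is allowed but optional)
  EMOJI_STYLE = Regexp.compile(
    "(?:" +
      alternation(TEXT_DEFAULT) + EMOJI_VS +
      "|" +
      alternation(EMOJI_DEFAULT) + "(?!" + TEXT_VS + ")" + EMOJI_VS + "?" +
    ")"
  )

  # Text style: the mirror image of EMOJI_STYLE
  TEXT_STYLE = Regexp.compile(
    "(?:" +
      alternation(TEXT_DEFAULT) + "(?!" + EMOJI_VS + ")" + TEXT_VS + "?" +
      "|" +
      alternation(EMOJI_DEFAULT) + TEXT_VS +
    ")"
  )
end
```

With this data, `"▶"` matches only `TEXT_STYLE`, `"▶\u{FE0F}"` matches only `EMOJI_STYLE`, and the reverse holds for `"😴"` and `"😴\u{FE0E}"`, which mirrors the example columns of the table.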
@@ -0,0 +1,37 @@
1
+ # # #
2
+ # Get gemspec info
3
+
4
+ gemspec_file = Dir['*.gemspec'].first
5
+ gemspec = eval File.read(gemspec_file), binding, gemspec_file
6
+ info = "#{gemspec.name} | #{gemspec.version} | " \
7
+ "#{gemspec.runtime_dependencies.size} dependencies | " \
8
+ "#{gemspec.files.size} files"
9
+
10
+ # # #
11
+ # Gem build and install task
12
+
13
+ desc info
14
+ task :gem do
15
+ puts info + "\n\n"
16
+ print " "; sh "gem build #{gemspec_file}"
17
+ FileUtils.mkdir_p 'pkg'
18
+ FileUtils.mv "#{gemspec.name}-#{gemspec.version}.gem", 'pkg'
19
+ puts; sh %{gem install --no-document pkg/#{gemspec.name}-#{gemspec.version}.gem}
20
+ end
21
+
22
+ # # #
23
+ # Start an IRB session with the gem loaded
24
+
25
+ desc "#{gemspec.name} | IRB"
26
+ task :irb do
27
+ sh "irb -I ./lib -r #{gemspec.name.gsub '-','/'}"
28
+ end
29
+
30
+ # # #
31
+ # Run Specs
32
+
33
+ desc "#{gemspec.name} | Spec"
34
+ task :spec do
35
+ sh "for file in spec/*.rb; do ruby $file; done"
36
+ end
37
+ task default: :spec
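In the `spec` task above, the shell `for` loop exits with the status of its last `ruby` invocation only. With the single spec file shipped here that is harmless, but if more files were added, a failure in any file except the last would not fail the task. One possible alternative is sketched below with a hypothetical helper, `failing_spec_files` (not part of the gem). It runs each file in its own interpreter via `system` and collects every failure.

```ruby
require "rbconfig"

# Hypothetical helper (not part of the gem): runs every Ruby file matching
# +pattern+ in its own interpreter process and returns the files whose
# process did not exit with status 0.
def failing_spec_files(pattern)
  Dir.glob(pattern).sort.reject do |file|
    # system returns true on exit status 0, false on a non-zero status,
    # and nil if the interpreter could not be started at all
    system(RbConfig.ruby, file)
  end
end

# Possible use in the Rakefile's spec task:
#
#   task :spec do
#     failed = failing_spec_files("spec/*.rb")
#     abort "Failing spec files: #{failed.join(', ')}" unless failed.empty?
#   end
```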
Binary file
@@ -0,0 +1,153 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "emoji/constants"
4
+ require_relative "emoji/index"
5
+
6
+ module Unicode
7
+ module Emoji
8
+ PROPERTY_NAMES = {
9
+ B: "Emoji_Modifier_Base",
10
+ M: "Emoji_Modifier",
11
+ C: "Emoji_Component",
12
+ P: "Emoji_Presentation",
13
+ }
14
+
15
+ EMOJI_VARIATION_SELECTOR = 0xFE0F
16
+ TEXT_VARIATION_SELECTOR = 0xFE0E
17
+ EMOJI_TAG_BASE_FLAG = 0x1F3F4
18
+ CANCEL_TAG = 0xE007F
19
+ EMOJI_KEYCAP_SUFFIX = 0x20E3
20
+ ZWJ = 0x200D
21
+
22
+ EMOJI_CHAR = INDEX[:PROPERTIES].keys.freeze
23
+ EMOJI_PRESENTATION = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:P) }.keys.freeze
24
+ TEXT_PRESENTATION = INDEX[:PROPERTIES].select{ |ord, props| !props.include?(:P) }.keys.freeze
25
+ EMOJI_COMPONENT = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:C) }.keys.freeze
26
+ EMOJI_MODIFIER_BASES = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:B) }.keys.freeze
27
+ EMOJI_MODIFIERS = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:M) }.keys.freeze
28
+ EMOJI_KEYCAPS = INDEX[:KEYCAPS].freeze
29
+ VALID_REGION_FLAGS = INDEX[:FLAGS].freeze
30
+ VALID_SUBDIVISIONS = INDEX[:SD].freeze
31
+ RECOMMENDED_SUBDIVISION_FLAGS = INDEX[:TAGS].freeze
32
+ RECOMMENDED_ZWJ_SEQUENCES = INDEX[:ZWJ].freeze
33
+
34
+ pack = ->(ord){ Regexp.escape(Array(ord).pack("U*")) }
35
+ join = -> (*strings){ "(?:" + strings.join("|") + ")" }
36
+ pack_and_join = ->(ords){ join[*ords.map{ |ord| pack[ord] }] }
37
+
38
+ emoji_character = \
39
+ pack_and_join[EMOJI_CHAR]
40
+
41
+ emoji_presentation_sequence = \
42
+ join[
43
+ pack_and_join[TEXT_PRESENTATION] + pack[EMOJI_VARIATION_SELECTOR],
44
+ pack_and_join[EMOJI_PRESENTATION] + "(?!" + pack[TEXT_VARIATION_SELECTOR] + ")" + pack[EMOJI_VARIATION_SELECTOR] + "?",
45
+ ]
46
+
47
+ text_presentation_sequence = \
48
+ join[
49
+ pack_and_join[TEXT_PRESENTATION]+ "(?!" + pack[EMOJI_VARIATION_SELECTOR] + ")" + pack[TEXT_VARIATION_SELECTOR] + "?",
50
+ pack_and_join[EMOJI_PRESENTATION] + pack[TEXT_VARIATION_SELECTOR]
51
+ ]
52
+
53
+ emoji_component = \
54
+ pack_and_join[EMOJI_COMPONENT]
55
+
56
+ emoji_modifier_sequence = \
57
+ pack_and_join[EMOJI_MODIFIER_BASES] + pack_and_join[EMOJI_MODIFIERS]
58
+
59
+ emoji_keycap_sequence = \
60
+ pack_and_join[EMOJI_KEYCAPS] + pack[[EMOJI_VARIATION_SELECTOR, EMOJI_KEYCAP_SUFFIX]]
61
+
62
+ emoji_valid_region_sequence = \
63
+ pack_and_join[VALID_REGION_FLAGS]
64
+
65
+ emoji_valid_tag_sequence = \
66
+ "(?:" +
67
+ pack[EMOJI_TAG_BASE_FLAG] +
68
+ "(?:" + VALID_SUBDIVISIONS.map{ |sd| Regexp.escape(sd.tr("\u{20}-\u{7E}", "\u{E0020}-\u{E007E}"))}.join("|") + ")" +
69
+ pack[CANCEL_TAG] +
70
+ ")"
71
+
72
+ emoji_zwj_element = \
73
+ join[
74
+ emoji_modifier_sequence,
75
+ emoji_presentation_sequence,
76
+ emoji_character,
77
+ ]
78
+
79
+ # Matches basic singleton emoji and all kinds of sequences, but restricts ZWJ and tag sequences to recommended sequences
80
+ REGEX = Regexp.compile(
81
+ pack_and_join[RECOMMENDED_ZWJ_SEQUENCES] +
82
+ ?| + pack_and_join[RECOMMENDED_SUBDIVISION_FLAGS] +
83
+ ?| + emoji_modifier_sequence +
84
+ ?| + "(?!" + emoji_component + ")" + emoji_presentation_sequence +
85
+ ?| + emoji_keycap_sequence +
86
+ ?| + emoji_valid_region_sequence +
87
+ ""
88
+ )
89
+
90
+ # Matches basic singleton emoji and all kinds of valid sequences
91
+ REGEX_VALID = Regexp.compile(
92
+ # EMOJI_TAGS.map{ |base, spec| "(?:" + pack[base] + "[" + pack[spec] + "]+" + pack[CANCEL_TAG] + ")" }.join("|") +
93
+ emoji_valid_tag_sequence +
94
+ ?| + "(?:" + "(?:" + emoji_zwj_element + pack[ZWJ] + "){1,3}" + emoji_zwj_element + ")" +
95
+ ?| + emoji_modifier_sequence +
96
+ ?| + "(?!" + emoji_component + ")" + emoji_presentation_sequence +
97
+ ?| + emoji_keycap_sequence +
98
+ ?| + emoji_valid_region_sequence +
99
+ ""
100
+ )
101
+
102
+ # Matches only basic single, non-textual emoji
103
+ # Ignores "components" like modifiers or simple digits
104
+ REGEX_BASIC = Regexp.compile(
105
+ "(?!" + emoji_component + ")" + emoji_presentation_sequence
106
+ )
107
+
108
+ # Matches only basic single, textual emoji
109
+ # Ignores "components" like modifiers or simple digits
110
+ REGEX_TEXT = Regexp.compile(
111
+ "(?!" + emoji_component + ")" + text_presentation_sequence
112
+ )
113
+
114
+ # Matches any emoji-related codepoint
115
+ REGEX_ANY = Regexp.compile(
116
+ emoji_character
117
+ )
118
+
119
+ def self.properties(char)
120
+ ord = get_codepoint_value(char)
121
+ props = INDEX[:PROPERTIES][ord]
122
+
123
+ if props
124
+ ["Emoji"] + props.map{ |prop| PROPERTY_NAMES[prop] }
125
+ else
126
+ # nothing
127
+ end
128
+ end
129
+
130
+ def self.get_codepoint_value(char)
131
+ ord = nil
132
+
133
+ if char.valid_encoding?
134
+ ord = char.ord
135
+ elsif char.encoding.name == "UTF-8"
136
+ begin
137
+ ord = char.unpack("U*")[0]
138
+ rescue ArgumentError
139
+ end
140
+ end
141
+
142
+ if ord
143
+ ord
144
+ else
145
+ raise(ArgumentError, "Unicode::Emoji must be given a valid string")
146
+ end
147
+ end
148
+
149
+ class << self
150
+ private :get_codepoint_value
151
+ end
152
+ end
153
+ end
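`lib/unicode/emoji.rb` builds all five regexes from three small lambdas. `pack` turns one codepoint (or an array of codepoints) into an escaped string, `join` wraps strings in a non-capturing alternation, and `pack_and_join` applies both to a list. The sketch below reuses those lambdas and the constant values from the file to assemble keycap, subdivision-flag, and ZWJ patterns. The data sets are hypothetical subsets standing in for `INDEX[:KEYCAPS]`, `INDEX[:SD]`, and the emoji character list, whose real values come from the binary `data/emoji.marshal.gz`.

```ruby
# Combinators as defined in lib/unicode/emoji.rb
pack          = ->(ord) { Regexp.escape(Array(ord).pack("U*")) }
join          = ->(*strings) { "(?:" + strings.join("|") + ")" }
pack_and_join = ->(ords) { join[*ords.map { |ord| pack[ord] }] }

# Constant values as defined in lib/unicode/emoji.rb
emoji_variation_selector = 0xFE0F
emoji_keycap_suffix      = 0x20E3
emoji_tag_base_flag      = 0x1F3F4 # WAVING BLACK FLAG
cancel_tag               = 0xE007F
zwj                      = 0x200D

# Hypothetical subsets; the gem reads the real lists from data/emoji.marshal.gz
keycap_bases   = "0123456789#*".codepoints
subdivisions   = ["gbeng", "gbsct", "gbwls"]
zwj_candidates = [0x1F920, 0x1F922] # COWBOY HAT FACE, NAUSEATED FACE

# Keycap sequence: base character + U+FE0F + U+20E3
keycap_regex = Regexp.compile(
  pack_and_join[keycap_bases] + pack[[emoji_variation_selector, emoji_keycap_suffix]]
)

# Subdivision flag: black flag + ASCII code shifted into the tag block
# (U+E0020..U+E007E) + cancel tag
tag_codes = subdivisions.map { |sd| Regexp.escape(sd.tr("\u{20}-\u{7E}", "\u{E0020}-\u{E007E}")) }
tag_regex = Regexp.compile(pack[emoji_tag_base_flag] + join[*tag_codes] + pack[cancel_tag])

# ZWJ sequence: two to four elements separated by U+200D, as in REGEX_VALID
zwj_element = pack_and_join[zwj_candidates]
zwj_regex   = Regexp.compile("(?:" + zwj_element + pack[zwj] + "){1,3}" + zwj_element)
```

Because `pack` runs every codepoint through `Regexp.escape`, keycap bases such as `#` and `*` are safe to splice into the pattern without special handling.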
@@ -0,0 +1,11 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Unicode
4
+ module Emoji
5
+ VERSION = "0.9.0".freeze
6
+ EMOJI_VERSION = "5.0".freeze
7
+ DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + '/../../../data/').freeze
8
+ INDEX_FILENAME = (DATA_DIRECTORY + '/emoji.marshal.gz').freeze
9
+ end
10
+ end
11
+
@@ -0,0 +1,9 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'constants'
4
+
5
+ module Unicode
6
+ module Emoji
7
+ INDEX = Marshal.load(Gem.gunzip(File.binread(INDEX_FILENAME)))
8
+ end
9
+ end
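The data file appears only as "Binary file" in this diff. `index.rb` expects it to be a gzipped `Marshal` dump of a Hash whose keys (`:PROPERTIES`, `:KEYCAPS`, `:FLAGS`, `:SD`, `:TAGS`, `:ZWJ`) are read in `emoji.rb`. The round trip below is a sketch with a hypothetical miniature index covering two of those keys. It uses `Zlib.gzip`/`Zlib.gunzip`, whereas `index.rb` itself calls `Gem.gunzip`.

```ruby
require "zlib"
require "tmpdir"

# Hypothetical miniature index, shaped like what emoji.rb reads from INDEX:
# :PROPERTIES maps a codepoint to its extra property symbols (see
# PROPERTY_NAMES), :KEYCAPS lists keycap base codepoints
index = {
  PROPERTIES: { 0x1F634 => [:P], 0x2660 => [], 0x261D => [:B] },
  KEYCAPS: "0123456789#*".codepoints,
}

loaded = Dir.mktmpdir do |dir|
  path = File.join(dir, "emoji.marshal.gz")

  # Writing side: Marshal-serialize the hash, then gzip the bytes
  File.binwrite(path, Zlib.gzip(Marshal.dump(index)))

  # Reading side, analogous to index.rb; the block's value becomes `loaded`
  Marshal.load(Zlib.gunzip(File.binread(path)))
end
```

`Marshal.load` should only ever be fed trusted data. That holds here because the file ships inside the gem itself.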
@@ -0,0 +1,295 @@
1
+ require_relative "../lib/unicode/emoji"
2
+ require "minitest/autorun"
3
+
4
+ describe Unicode::Emoji do
5
+ describe ".properties" do
6
+ it "will return an Array with Emoji properties if the codepoint has some" do
7
+ assert_equal ["Emoji", "Emoji_Presentation"], Unicode::Emoji.properties("😴")
8
+ assert_equal ["Emoji"], Unicode::Emoji.properties("♠")
9
+ end
10
+
11
+ it "will return nil if the codepoint has no Emoji properties" do
12
+ assert_nil Unicode::Emoji.properties("A")
13
+ end
14
+ end
15
+
16
+ describe "REGEX" do
17
+ it "matches most singleton emoji codepoints" do
18
+ "😴 sleeping face" =~ Unicode::Emoji::REGEX
19
+ assert_equal "😴", $&
20
+ end
21
+
22
+ it "matches singleton emoji in combination with emoji variation selector" do
23
+ "😴\u{FE0F} sleeping face" =~ Unicode::Emoji::REGEX
24
+ assert_equal "😴\u{FE0F}", $&
25
+ end
26
+
27
+ it "does not match singleton emoji when in combination with text variation selector" do
28
+ "😴\u{FE0E} sleeping face" =~ Unicode::Emoji::REGEX
29
+ assert_nil $&
30
+ end
31
+
32
+ it "does not match textual singleton emoji" do
33
+ "▶ play button" =~ Unicode::Emoji::REGEX
34
+ assert_nil $&
35
+ end
36
+
37
+ it "does match textual singleton emoji in combination with emoji variation selector" do
38
+ "▶\u{FE0F} play button" =~ Unicode::Emoji::REGEX
39
+ assert_equal "▶\u{FE0F}", $&
40
+ end
41
+
42
+ it "does not match singleton 'component' emoji codepoints" do
43
+ "🏻 light skin tone" =~ Unicode::Emoji::REGEX
44
+ assert_nil $&
45
+ end
46
+
47
+ it "does match modified emoji if modifier base emoji is used" do
48
+ "🛌🏽 person in bed: medium skin tone" =~ Unicode::Emoji::REGEX
49
+ assert_equal "🛌🏽", $&
50
+ end
51
+
52
+ it "does not match modified emoji if no modifier base emoji is used" do
53
+ "🌵🏽 cactus" =~ Unicode::Emoji::REGEX
54
+ assert_equal "🌵", $&
55
+ end
56
+
57
+ it "does match valid region flags" do
58
+ "🇵🇹 Portugal" =~ Unicode::Emoji::REGEX
59
+ assert_equal "🇵🇹", $&
60
+ end
61
+
62
+ it "does not match invalid region flags" do
63
+ "🇵🇵 PP Land" =~ Unicode::Emoji::REGEX
64
+ assert_nil $&
65
+ end
66
+
67
+ it "does match emoji keycap sequences" do
68
+ "2️⃣ keycap: 2" =~ Unicode::Emoji::REGEX
69
+ assert_equal "2️⃣", $&
70
+ end
71
+
72
+ it "does match recommended tag sequences" do
73
+ "🏴󠁧󠁢󠁳󠁣󠁴󠁿 Scotland" =~ Unicode::Emoji::REGEX
74
+ assert_equal "🏴󠁧󠁢󠁳󠁣󠁴󠁿", $&
75
+ end
76
+
77
+ it "does not match valid tag sequences which are not recommended" do
78
+ "🏴󠁧󠁢󠁡󠁧󠁢󠁿 GB AGB" =~ Unicode::Emoji::REGEX
79
+ assert_equal "🏴", $& # only base flag is matched
80
+ end
81
+
82
+ it "does match recommended zwj sequences" do
83
+ "🤾🏽‍♀️ woman playing handball: medium skin tone" =~ Unicode::Emoji::REGEX
84
+ assert_equal "🤾🏽‍♀️", $&
85
+ end
86
+
87
+ it "does not match valid zwj sequences which are not recommended" do
88
+ "🤠‍🤢 vomiting cowboy" =~ Unicode::Emoji::REGEX
89
+ assert_equal "🤠", $&
90
+ end
91
+ end
92
+
93
+ describe "REGEX_VALID" do
94
+ it "matches most singleton emoji codepoints" do
95
+ "😴 sleeping face" =~ Unicode::Emoji::REGEX_VALID
96
+ assert_equal "😴", $&
97
+ end
98
+
99
+ it "matches singleton emoji in combination with emoji variation selector" do
100
+ "😴\u{FE0F} sleeping face" =~ Unicode::Emoji::REGEX_VALID
101
+ assert_equal "😴\u{FE0F}", $&
102
+ end
103
+
104
+ it "does not match singleton emoji when in combination with text variation selector" do
105
+ "😴\u{FE0E} sleeping face" =~ Unicode::Emoji::REGEX_VALID
106
+ assert_nil $&
107
+ end
108
+
109
+ it "does not match textual singleton emoji" do
110
+ "▶ play button" =~ Unicode::Emoji::REGEX_VALID
111
+ assert_nil $&
112
+ end
113
+
114
+ it "does match textual singleton emoji in combination with emoji variation selector" do
115
+ "▶\u{FE0F} play button" =~ Unicode::Emoji::REGEX_VALID
116
+ assert_equal "▶\u{FE0F}", $&
117
+ end
118
+
119
+ it "does not match singleton 'component' emoji codepoints" do
120
+ "🏻 light skin tone" =~ Unicode::Emoji::REGEX_VALID
121
+ assert_nil $&
122
+ end
123
+
124
+ it "does match modified emoji if modifier base emoji is used" do
125
+ "🛌🏽 person in bed: medium skin tone" =~ Unicode::Emoji::REGEX_VALID
126
+ assert_equal "🛌🏽", $&
127
+ end
128
+
129
+ it "does not match modified emoji if no modifier base emoji is used" do
130
+ "🌵🏽 cactus" =~ Unicode::Emoji::REGEX_VALID
131
+ assert_equal "🌵", $&
132
+ end
133
+
134
+ it "does match valid region flags" do
135
+ "🇵🇹 Portugal" =~ Unicode::Emoji::REGEX_VALID
136
+ assert_equal "🇵🇹", $&
137
+ end
138
+
139
+ it "does not match invalid region flags" do
140
+ "🇵🇵 PP Land" =~ Unicode::Emoji::REGEX_VALID
141
+ assert_nil $&
142
+ end
143
+
144
+ it "does match emoji keycap sequences" do
145
+ "2️⃣ keycap: 2" =~ Unicode::Emoji::REGEX_VALID
146
+ assert_equal "2️⃣", $&
147
+ end
148
+
149
+ it "does match recommended tag sequences" do
150
+ "🏴󠁧󠁢󠁳󠁣󠁴󠁿 Scotland" =~ Unicode::Emoji::REGEX_VALID
151
+ assert_equal "🏴󠁧󠁢󠁳󠁣󠁴󠁿", $&
152
+ end
153
+
154
+ it "does match valid tag sequences, even though they are not recommended" do
155
+ "🏴󠁧󠁢󠁡󠁧󠁢󠁿 GB AGB" =~ Unicode::Emoji::REGEX_VALID
156
+ assert_equal "🏴󠁧󠁢󠁡󠁧󠁢󠁿", $&
157
+ end
158
+
159
+ it "does not match invalid tag sequences" do
160
+ "🏴󠁧󠁢󠁡󠁡󠁡󠁿 GB AAA" =~ Unicode::Emoji::REGEX_VALID
161
+ assert_equal "🏴", $&
162
+ end
163
+
164
+ it "does match recommended zwj sequences" do
165
+ "🤾🏽‍♀️ woman playing handball: medium skin tone" =~ Unicode::Emoji::REGEX_VALID
166
+ assert_equal "🤾🏽‍♀️", $&
167
+ end
168
+
169
+ it "does match valid zwj sequences, even though they are not recommended" do
170
+ "🤠‍🤢 vomiting cowboy" =~ Unicode::Emoji::REGEX_VALID
171
+ assert_equal "🤠‍🤢", $&
172
+ end
173
+ end
174
+
175
+ describe "REGEX_BASIC" do
176
+ it "matches most singleton emoji codepoints" do
177
+ "😴 sleeping face" =~ Unicode::Emoji::REGEX_BASIC
178
+ assert_equal "😴", $&
179
+ end
180
+
181
+ it "matches singleton emoji in combination with emoji variation selector" do
182
+ "😴\u{FE0F} sleeping face" =~ Unicode::Emoji::REGEX_BASIC
183
+ assert_equal "😴\u{FE0F}", $&
184
+ end
185
+
186
+ it "does not match singleton emoji when in combination with text variation selector" do
187
+ "😴\u{FE0E} sleeping face" =~ Unicode::Emoji::REGEX_BASIC
188
+ assert_nil $&
189
+ end
190
+
191
+ it "does not match textual singleton emoji" do
192
+ "▶ play button" =~ Unicode::Emoji::REGEX_BASIC
193
+ assert_nil $&
194
+ end
195
+
196
+ it "does match textual singleton emoji in combination with emoji variation selector" do
197
+ "▶\u{FE0F} play button" =~ Unicode::Emoji::REGEX_BASIC
198
+ assert_equal "▶\u{FE0F}", $&
199
+ end
200
+
201
+ it "does not match singleton 'component' emoji codepoints" do
202
+ "🏻 light skin tone" =~ Unicode::Emoji::REGEX_BASIC
203
+ assert_nil $&
204
+ end
205
+
206
+ it "does not match modified emoji" do
207
+ "🛌🏽 person in bed: medium skin tone" =~ Unicode::Emoji::REGEX_BASIC
208
+ assert_equal "🛌", $&
209
+ end
210
+
211
+ it "does not match region flags" do
212
+ "🇵🇹 Portugal" =~ Unicode::Emoji::REGEX_BASIC
213
+ assert_nil $&
214
+ end
215
+
216
+ it "does not match emoji keycap sequences" do
217
+ "2️⃣ keycap: 2" =~ Unicode::Emoji::REGEX_BASIC
218
+ assert_nil $&
219
+ end
220
+
221
+ it "does not match tag sequences" do
222
+ "🏴󠁧󠁢󠁳󠁣󠁴󠁿 Scotland" =~ Unicode::Emoji::REGEX_BASIC
223
+ assert_equal "🏴", $& # only base flag is matched
224
+ end
225
+
226
+ it "does not match zwj sequences" do
227
+ "🤾🏽‍♀️ woman playing handball: medium skin tone" =~ Unicode::Emoji::REGEX_BASIC
228
+ assert_equal "🤾", $&
229
+ end
230
+ end
231
+
232
+ describe "REGEX_TEXT" do
233
+ it "does not match singleton emoji codepoints with emoji presentation and no variation selector" do
234
+ "😴 sleeping face" =~ Unicode::Emoji::REGEX_TEXT
235
+ assert_nil $&
236
+ end
237
+
238
+ it "does not match singleton emoji in combination with emoji variation selector" do
239
+ "😴\u{FE0F} sleeping face" =~ Unicode::Emoji::REGEX_TEXT
240
+ assert_nil $&
241
+ end
242
+
243
+ it "does match singleton emoji in combination with text variation selector" do
244
+ "😴\u{FE0E} sleeping face" =~ Unicode::Emoji::REGEX_TEXT
245
+ assert_equal "😴\u{FE0E}", $&
246
+ end
247
+
248
+ it "does match textual singleton emoji" do
249
+ "▶ play button" =~ Unicode::Emoji::REGEX_TEXT
250
+ assert_equal "▶", $&
251
+ end
252
+
253
+ it "does not match textual singleton emoji in combination with emoji variation selector" do
254
+ "▶\u{FE0F} play button" =~ Unicode::Emoji::REGEX_TEXT
255
+ assert_nil $&
256
+ end
257
+
258
+ it "does not match singleton 'component' emoji codepoints" do
259
+ "🏻 light skin tone" =~ Unicode::Emoji::REGEX_TEXT
260
+ assert_nil $&
261
+ end
262
+
263
+ it "does not match modified emoji" do
264
+ "🛌🏽 person in bed: medium skin tone" =~ Unicode::Emoji::REGEX_TEXT
265
+ assert_nil $&
266
+ end
267
+
268
+ it "does not match region flags" do
269
+ "🇵🇹 Portugal" =~ Unicode::Emoji::REGEX_TEXT
270
+ assert_nil $&
271
+ end
272
+
273
+ it "does not match emoji keycap sequences" do
274
+ "2️⃣ keycap: 2" =~ Unicode::Emoji::REGEX_TEXT
275
+ assert_nil $&
276
+ end
277
+
278
+ it "does not match tag sequences" do
279
+ "🏴󠁧󠁢󠁳󠁣󠁴󠁿 Scotland" =~ Unicode::Emoji::REGEX_TEXT
280
+ assert_nil $&
281
+ end
282
+
283
+ it "does not match zwj sequences" do
284
+ "🤾🏽‍♀️ woman playing handball: medium skin tone" =~ Unicode::Emoji::REGEX_TEXT
285
+ assert_nil $&
286
+ end
287
+ end
288
+
289
+ describe "REGEX_ANY" do
290
+ it "returns any emoji-related codepoint (but no variation selectors or tags)" do
291
+ matches = "1 string 😴\u{FE0F} sleeping face with 🇵 and modifier 🏾, also 🏴󠁧󠁢󠁳󠁣󠁴󠁿 Scotland".scan(Unicode::Emoji::REGEX_ANY)
292
+ assert_equal ["1", "😴", "🇵", "🏾", "🏴"], matches
293
+ end
294
+ end
295
+ end
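The `.properties` specs above pin down the lookup contract. Every indexed codepoint reports `"Emoji"` followed by the long names of its extra properties, and unindexed codepoints yield `nil`. The following is a simplified stand-alone sketch of that contract, for example as a starting point for a test double. It assumes a hypothetical three-entry table in place of `INDEX[:PROPERTIES]`. It also leaves out the encoding fallback in `get_codepoint_value`. `emoji_properties` and `MINI_PROPERTIES` are illustrative names, not gem API.

```ruby
# Property abbreviations, as defined in PROPERTY_NAMES in lib/unicode/emoji.rb
PROPERTY_NAMES = {
  B: "Emoji_Modifier_Base",
  M: "Emoji_Modifier",
  C: "Emoji_Component",
  P: "Emoji_Presentation",
}.freeze

# Hypothetical stand-in for INDEX[:PROPERTIES]: codepoint => extra properties
MINI_PROPERTIES = {
  0x1F634 => [:P], # SLEEPING FACE
  0x2660  => [],   # BLACK SPADE SUIT
  0x261D  => [:B], # WHITE UP POINTING INDEX
}.freeze

# Simplified version of Unicode::Emoji.properties. Only the first character
# of +char+ is looked at; nil is returned for codepoints not in the table.
def emoji_properties(char)
  raise ArgumentError, "must be given a valid string" unless char.valid_encoding?

  props = MINI_PROPERTIES[char.ord]
  props && ["Emoji"] + props.map { |prop| PROPERTY_NAMES[prop] }
end
```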
@@ -0,0 +1,21 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ require File.dirname(__FILE__) + "/lib/unicode/emoji/constants"
4
+
5
+ Gem::Specification.new do |gem|
6
+ gem.name = "unicode-emoji"
7
+ gem.version = Unicode::Emoji::VERSION
8
+ gem.summary = "Retrieve Emoji data about Unicode codepoints."
9
+ gem.description = "[Emoji #{Unicode::Emoji::EMOJI_VERSION}] Retrieve emoji data about Unicode codepoints. Also contains a regex to match emoji."
10
+ gem.authors = ["Jan Lelis"]
11
+ gem.email = ["mail@janlelis.de"]
12
+ gem.homepage = "https://github.com/janlelis/unicode-emoji"
13
+ gem.license = "MIT"
14
+
15
+ gem.files = Dir["{**/}{.*,*}"].select{ |path| File.file?(path) && path !~ /^pkg/ }
16
+ gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
17
+ gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
18
+ gem.require_paths = ["lib"]
19
+
20
+ gem.required_ruby_version = "~> 2.0"
21
+ end
metadata ADDED
@@ -0,0 +1,61 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: unicode-emoji
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.9.0
5
+ platform: ruby
6
+ authors:
7
+ - Jan Lelis
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2017-04-08 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description: "[Emoji 5.0] Retrieve emoji data about Unicode codepoints. Also contains
14
+ a regex to match emoji."
15
+ email:
16
+ - mail@janlelis.de
17
+ executables: []
18
+ extensions: []
19
+ extra_rdoc_files: []
20
+ files:
21
+ - ".gitignore"
22
+ - ".travis.yml"
23
+ - CHANGELOG.md
24
+ - CODE_OF_CONDUCT.md
25
+ - Gemfile
26
+ - Gemfile.lock
27
+ - MIT-LICENSE.txt
28
+ - README.md
29
+ - Rakefile
30
+ - data/emoji.marshal.gz
31
+ - lib/unicode/emoji.rb
32
+ - lib/unicode/emoji/constants.rb
33
+ - lib/unicode/emoji/index.rb
34
+ - spec/unicode_emoji_spec.rb
35
+ - unicode-emoji.gemspec
36
+ homepage: https://github.com/janlelis/unicode-emoji
37
+ licenses:
38
+ - MIT
39
+ metadata: {}
40
+ post_install_message:
41
+ rdoc_options: []
42
+ require_paths:
43
+ - lib
44
+ required_ruby_version: !ruby/object:Gem::Requirement
45
+ requirements:
46
+ - - "~>"
47
+ - !ruby/object:Gem::Version
48
+ version: '2.0'
49
+ required_rubygems_version: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - ">="
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ requirements: []
55
+ rubyforge_project:
56
+ rubygems_version: 2.6.8
57
+ signing_key:
58
+ specification_version: 4
59
+ summary: Retrieve Emoji data about Unicode codepoints.
60
+ test_files:
61
+ - spec/unicode_emoji_spec.rb