unicode-emoji 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: a8a2374f40fac31061e825e42b16bbed3ad4455c
4
+ data.tar.gz: 567004277315d4a7a05dd7a86e07056e968e78ef
5
+ SHA512:
6
+ metadata.gz: 317e769c1426b946ff102670024dda19f8fbe0975af670a01ff79fc0e3c50571ccac503082dda8468b696dd9574ad206c39e20989f8b283993b7c3b48591c356
7
+ data.tar.gz: b2608a118f51a6c11d579696b4949ceef9827ee45c93609e3816c303199246942842150e240a9f33a013c2290ee1abd8e1b82960b2dcd93ffcc28c9c8debc4f8
@@ -0,0 +1,2 @@
1
+ Gemfile.lock
2
+ /pkg
@@ -0,0 +1,23 @@
1
+ sudo: false
2
+ language: ruby
3
+
4
+ rvm:
5
+ - ruby-head
6
+ - 2.4.1
7
+ - 2.3.4
8
+ - 2.2
9
+ - 2.1
10
+ - 2.0
11
+ - jruby-head
12
+ - jruby-9.1.8.0
13
+
14
+ cache:
15
+ - bundler
16
+
17
+ matrix:
18
+ allow_failures:
19
+ - rvm: jruby-head
20
+ - rvm: ruby-head
21
+ - rvm: 2.0
22
+ # fast_finish: true
23
+
@@ -0,0 +1,6 @@
1
+ ## CHANGELOG
2
+
3
+ ### 0.9.0
4
+
5
+ * Initial release (Emoji version 5.0)
6
+
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at opensource@janlelis.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gemspec
4
+
5
+ gem 'minitest'
6
+ gem 'rake'
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2017 Jan Lelis, mail@janlelis.de
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,68 @@
1
+ # Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](http://badge.fury.io/rb/unicode-emoji) [![[travis]](https://travis-ci.org/janlelis/unicode-emoji.svg)](https://travis-ci.org/janlelis/unicode-emoji)
2
+
3
+ A small Ruby library which provides Unicode Emoji data and regexes.
4
+
5
+ Emoji version: **5.0**
6
+
7
+ Supported Rubies: **2.4**, **2.3**, **2.2**, **2.1**
8
+
9
+ ## Gemfile
10
+
11
+ ```ruby
12
+ gem "unicode-emoji"
13
+ ```
14
+
15
+ ## Usage
16
+
17
+ ### Properties
18
+
19
+ Allows you to access the codepoint data from Unicode's [emoji-data.txt](http://unicode.org/Public/emoji/5.0/emoji-data.txt) file:
20
+
21
+ ```ruby
22
+ require "unicode/emoji"
23
+
24
+ Unicode::Emoji.properties "☝" # => ["Emoji", "Emoji_Modifier_Base"]
25
+ ```
26
+
27
+ ### Regex
28
+
29
+ Five Emoji regexes are included, each compiled from Unicode's Emoji data files.
30
+
31
+ ```ruby
32
+ require "unicode/emoji"
33
+
34
+ string = "String which contains all kinds of emoji:
35
+
36
+ - Singleton Emoji: 😴
37
+ - Textual singleton Emoji with Emoji variation: ▶️
38
+ - Emoji with skin tone modifier: 🛌🏽
39
+ - Region flag: 🇵🇹
40
+ - Sub-Region flag: 🏴󠁧󠁢󠁳󠁣󠁴󠁿
41
+ - Keycap sequence: 2️⃣
42
+ - Sequence using ZWJ (zero width joiner): 🤾🏽‍♀️
43
+
44
+ "
45
+
46
+ string.scan(Unicode::Emoji::REGEX) # => ["😴", "▶️", "🛌🏽", "🇵🇹", "🏴󠁧󠁢󠁳󠁣󠁴󠁿", "2️⃣", "🤾🏽‍♀️"]
47
+ ```
48
+
49
+ Regex | Description | Example Matches | Example Non-Matches
50
+ ------------------------------|-------------|-----------------|--------------------
51
+ `Unicode::Emoji::REGEX`        | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of valid Emoji sequences, but restricts ZWJ and TAG sequences to recommended sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
52
+ `Unicode::Emoji::REGEX_VALID`  | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of valid Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`
53
+ `Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
54
+ `Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digit 1) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
55
+ `Unicode::Emoji::REGEX_ANY` | Matches any Emoji-related codepoint (but no variation selectors or tags) | `😴`, `▶`, `🏻`, `🛌`, `🏽`, `🇵`, `🇹`, `2`, `🏴`, `🤾`, `♀`, `🤠`, `🤢` | -
56
+
57
+
58
+ ## Also See
59
+
60
+ - [Unicode® Technical Standard #51](http://www.unicode.org/reports/tr51/proposed.html)
61
+ - [Emoji data](http://unicode.org/Public/emoji/5.0/)
62
+ - [Emoji sequence names](https://github.com/janlelis/unicode-sequence_name)
63
+ - Part of [unicode-x](https://github.com/janlelis/unicode-x)
64
+
65
+ ## MIT
66
+
67
+ - Copyright (C) 2017 Jan Lelis <http://janlelis.com>. Released under the MIT license.
68
+ - Unicode data: http://www.unicode.org/copyright.html#Exhibit1
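The README's region flag (🇵🇹) and keycap (2️⃣) examples are plain codepoint sequences.

Here is a rough sketch of how they are assembled. The helpers `region_flag` and `keycap` are hypothetical and not part of the unicode-emoji API. A region flag maps each letter of a two-letter code onto the Regional Indicator Symbols starting at U+1F1E6. A keycap sequence is a base character followed by U+FE0F and U+20E3. That second rule matches the `emoji_keycap_sequence` pattern in emoji.rb.

```ruby
# Hypothetical helpers (not part of unicode-emoji) that build two of the
# sequence types listed in the README from plain codepoints.

REGIONAL_INDICATOR_A       = 0x1F1E6 # REGIONAL INDICATOR SYMBOL LETTER A
EMOJI_VARIATION_SELECTOR   = 0xFE0F
COMBINING_ENCLOSING_KEYCAP = 0x20E3

# Maps each letter of a two-letter region code onto the Regional Indicator block.
# No check against the list of *valid* regions is done here.
def region_flag(code)
  raise ArgumentError, "expected two ASCII letters" unless code.match?(/\A[A-Za-z]{2}\z/)
  code.upcase.each_char.map { |c| REGIONAL_INDICATOR_A + (c.ord - "A".ord) }.pack("U*")
end

# Keycap sequence: base character (0-9, # or *), U+FE0F, then U+20E3.
def keycap(char)
  raise ArgumentError, "expected 0-9, # or *" unless char.match?(/\A[0-9#*]\z/)
  [char.ord, EMOJI_VARIATION_SELECTOR, COMBINING_ENCLOSING_KEYCAP].pack("U*")
end

puts region_flag("PT") # prints 🇵🇹
puts keycap("2")       # prints 2️⃣
```

`region_flag` only performs the letter mapping. For example, `region_flag("PP")` happily yields the 🇵🇵 sequence. `Unicode::Emoji::REGEX` deliberately does not match that sequence, because PP is not among the gem's valid region flags.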
@@ -0,0 +1,37 @@
1
+ # # #
2
+ # Get gemspec info
3
+
4
+ gemspec_file = Dir['*.gemspec'].first
5
+ gemspec = eval File.read(gemspec_file), binding, gemspec_file
6
+ info = "#{gemspec.name} | #{gemspec.version} | " \
7
+ "#{gemspec.runtime_dependencies.size} dependencies | " \
8
+ "#{gemspec.files.size} files"
9
+
10
+ # # #
11
+ # Gem build and install task
12
+
13
+ desc info
14
+ task :gem do
15
+ puts info + "\n\n"
16
+ print " "; sh "gem build #{gemspec_file}"
17
+ FileUtils.mkdir_p 'pkg'
18
+ FileUtils.mv "#{gemspec.name}-#{gemspec.version}.gem", 'pkg'
19
+ puts; sh %{gem install --no-document pkg/#{gemspec.name}-#{gemspec.version}.gem}
20
+ end
21
+
22
+ # # #
23
+ # Start an IRB session with the gem loaded
24
+
25
+ desc "#{gemspec.name} | IRB"
26
+ task :irb do
27
+ sh "irb -I ./lib -r #{gemspec.name.gsub '-','/'}"
28
+ end
29
+
30
+ # # #
31
+ # Run Specs
32
+
33
+ desc "#{gemspec.name} | Spec"
34
+ task :spec do
35
+ sh "for file in spec/*.rb; do ruby $file; done"
36
+ end
37
+ task default: :spec
Binary file
@@ -0,0 +1,153 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "emoji/constants"
4
+ require_relative "emoji/index"
5
+
6
+ module Unicode
7
+ module Emoji
8
+ PROPERTY_NAMES = {
9
+ B: "Emoji_Modifier_Base",
10
+ M: "Emoji_Modifier",
11
+ C: "Emoji_Component",
12
+ P: "Emoji_Presentation",
13
+ }
14
+
15
+ EMOJI_VARIATION_SELECTOR = 0xFE0F
16
+ TEXT_VARIATION_SELECTOR = 0xFE0E
17
+ EMOJI_TAG_BASE_FLAG = 0x1F3F4
18
+ CANCEL_TAG = 0xE007F
19
+ EMOJI_KEYCAP_SUFFIX = 0x20E3
20
+ ZWJ = 0x200D
21
+
22
+ EMOJI_CHAR = INDEX[:PROPERTIES].keys.freeze
23
+ EMOJI_PRESENTATION = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:P) }.keys.freeze
24
+ TEXT_PRESENTATION = INDEX[:PROPERTIES].select{ |ord, props| !props.include?(:P) }.keys.freeze
25
+ EMOJI_COMPONENT = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:C) }.keys.freeze
26
+ EMOJI_MODIFIER_BASES = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:B) }.keys.freeze
27
+ EMOJI_MODIFIERS = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:M) }.keys.freeze
28
+ EMOJI_KEYCAPS = INDEX[:KEYCAPS].freeze
29
+ VALID_REGION_FLAGS = INDEX[:FLAGS].freeze
30
+ VALID_SUBDIVISIONS = INDEX[:SD].freeze
31
+ RECOMMENDED_SUBDIVISION_FLAGS = INDEX[:TAGS].freeze
32
+ RECOMMENDED_ZWJ_SEQUENCES = INDEX[:ZWJ].freeze
33
+
34
+ pack = ->(ord){ Regexp.escape(Array(ord).pack("U*")) }
35
+ join = -> (*strings){ "(?:" + strings.join("|") + ")" }
36
+ pack_and_join = ->(ords){ join[*ords.map{ |ord| pack[ord] }] }
37
+
38
+ emoji_character = \
39
+ pack_and_join[EMOJI_CHAR]
40
+
41
+ emoji_presentation_sequence = \
42
+ join[
43
+ pack_and_join[TEXT_PRESENTATION] + pack[EMOJI_VARIATION_SELECTOR],
44
+ pack_and_join[EMOJI_PRESENTATION] + "(?!" + pack[TEXT_VARIATION_SELECTOR] + ")" + pack[EMOJI_VARIATION_SELECTOR] + "?",
45
+ ]
46
+
47
+ text_presentation_sequence = \
48
+ join[
49
+ pack_and_join[TEXT_PRESENTATION] + "(?!" + pack[EMOJI_VARIATION_SELECTOR] + ")" + pack[TEXT_VARIATION_SELECTOR] + "?",
50
+ pack_and_join[EMOJI_PRESENTATION] + pack[TEXT_VARIATION_SELECTOR]
51
+ ]
52
+
53
+ emoji_component = \
54
+ pack_and_join[EMOJI_COMPONENT]
55
+
56
+ emoji_modifier_sequence = \
57
+ pack_and_join[EMOJI_MODIFIER_BASES] + pack_and_join[EMOJI_MODIFIERS]
58
+
59
+ emoji_keycap_sequence = \
60
+ pack_and_join[EMOJI_KEYCAPS] + pack[[EMOJI_VARIATION_SELECTOR, EMOJI_KEYCAP_SUFFIX]]
61
+
62
+ emoji_valid_region_sequence = \
63
+ pack_and_join[VALID_REGION_FLAGS]
64
+
65
+ emoji_valid_tag_sequence = \
66
+ "(?:" +
67
+ pack[EMOJI_TAG_BASE_FLAG] +
68
+ "(?:" + VALID_SUBDIVISIONS.map{ |sd| Regexp.escape(sd.tr("\u{20}-\u{7E}", "\u{E0020}-\u{E007E}"))}.join("|") + ")" +
69
+ pack[CANCEL_TAG] +
70
+ ")"
71
+
72
+ emoji_zwj_element = \
73
+ join[
74
+ emoji_modifier_sequence,
75
+ emoji_presentation_sequence,
76
+ emoji_character,
77
+ ]
78
+
79
+ # Matches basic singleton emoji and all kinds of sequences, but restricts ZWJ and tag sequences to recommended sequences
80
+ REGEX = Regexp.compile(
81
+ pack_and_join[RECOMMENDED_ZWJ_SEQUENCES] +
82
+ ?| + pack_and_join[RECOMMENDED_SUBDIVISION_FLAGS] +
83
+ ?| + emoji_modifier_sequence +
84
+ ?| + "(?!" + emoji_component + ")" + emoji_presentation_sequence +
85
+ ?| + emoji_keycap_sequence +
86
+ ?| + emoji_valid_region_sequence +
87
+ ""
88
+ )
89
+
90
+ # Matches basic singleton emoji and all kinds of valid sequences
91
+ REGEX_VALID = Regexp.compile(
92
+ # EMOJI_TAGS.map{ |base, spec| "(?:" + pack[base] + "[" + pack[spec] + "]+" + pack[CANCEL_TAG] + ")" }.join("|") +
93
+ emoji_valid_tag_sequence +
94
+ ?| + "(?:" + "(?:" + emoji_zwj_element + pack[ZWJ] + "){1,3}" + emoji_zwj_element + ")" +
95
+ ?| + emoji_modifier_sequence +
96
+ ?| + "(?!" + emoji_component + ")" + emoji_presentation_sequence +
97
+ ?| + emoji_keycap_sequence +
98
+ ?| + emoji_valid_region_sequence +
99
+ ""
100
+ )
101
+
102
+ # Matches only basic single, non-textual emoji
103
+ # Ignores "components" like modifiers or simple digits
104
+ REGEX_BASIC = Regexp.compile(
105
+ "(?!" + emoji_component + ")" + emoji_presentation_sequence
106
+ )
107
+
108
+ # Matches only basic single, textual emoji
109
+ # Ignores "components" like modifiers or simple digits
110
+ REGEX_TEXT = Regexp.compile(
111
+ "(?!" + emoji_component + ")" + text_presentation_sequence
112
+ )
113
+
114
+ # Matches any emoji-related codepoint
115
+ REGEX_ANY = Regexp.compile(
116
+ emoji_character
117
+ )
118
+
119
+ def self.properties(char)
120
+ ord = get_codepoint_value(char)
121
+ props = INDEX[:PROPERTIES][ord]
122
+
123
+ if props
124
+ ["Emoji"] + props.map{ |prop| PROPERTY_NAMES[prop] }
125
+ else
126
+ nil # codepoint has no Emoji properties
127
+ end
128
+ end
129
+
130
+ def self.get_codepoint_value(char)
131
+ ord = nil
132
+
133
+ if char.valid_encoding?
134
+ ord = char.ord
135
+ elsif char.encoding.name == "UTF-8"
136
+ begin
137
+ ord = char.unpack("U*")[0]
138
+ rescue ArgumentError
139
+ end
140
+ end
141
+
142
+ if ord
143
+ ord
144
+ else
145
+ raise(ArgumentError, "Unicode::Emoji must be given a valid string")
146
+ end
147
+ end
148
+
149
+ class << self
150
+ private :get_codepoint_value
151
+ end
152
+ end
153
+ end
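The presentation logic in `emoji_presentation_sequence` and `text_presentation_sequence` is the core of `REGEX_BASIC` and `REGEX_TEXT`.

The following standalone sketch rebuilds both patterns with the same `pack`/`join` lambdas. It differs from the gem in three assumptions:

* It runs over a hypothetical two-codepoint subset instead of the full lists derived from `INDEX`.
* It omits the `(?!emoji_component)` lookahead that the gem prepends.
* The constant names `BASIC`, `TEXT`, `EVS` and `TVS` are illustrative only.

```ruby
# Sketch mirroring emoji.rb's presentation sequences over a tiny, hypothetical
# subset of codepoints (the gem uses the full lists from its data index).
EMOJI_PRESENTATION = [0x1F634] # 😴 SLEEPING FACE, emoji presentation by default
TEXT_PRESENTATION  = [0x25B6]  # ▶ PLAY BUTTON, text presentation by default
EVS = 0xFE0F # emoji variation selector
TVS = 0xFE0E # text variation selector

pack          = ->(ord) { Regexp.escape(Array(ord).pack("U*")) }
join          = ->(*parts) { "(?:" + parts.join("|") + ")" }
pack_and_join = ->(ords) { join[*ords.map { |ord| pack[ord] }] }

# Emoji style: a text-default character needs U+FE0F; an emoji-default character
# may carry an optional U+FE0F but must not be followed by U+FE0E.
emoji_style = join[
  pack_and_join[TEXT_PRESENTATION] + pack[EVS],
  pack_and_join[EMOJI_PRESENTATION] + "(?!" + pack[TVS] + ")" + pack[EVS] + "?"
]

# Text style: the mirror image of the rules above.
text_style = join[
  pack_and_join[TEXT_PRESENTATION] + "(?!" + pack[EVS] + ")" + pack[TVS] + "?",
  pack_and_join[EMOJI_PRESENTATION] + pack[TVS]
]

BASIC = Regexp.compile(emoji_style)
TEXT  = Regexp.compile(text_style)

"😴 sleeping face"[BASIC]      # => "😴"
"😴\u{FE0E} sleeping face"[BASIC] # => nil
"▶ play button"[TEXT]          # => "▶"
"▶\u{FE0F} play button"[TEXT]  # => nil
```

For these two characters, the results line up with the `REGEX_BASIC` and `REGEX_TEXT` specs for 😴 and ▶. The negative lookaheads do the key work. They stop a character that is explicitly marked for the other presentation from matching without its selector.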
@@ -0,0 +1,11 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Unicode
4
+ module Emoji
5
+ VERSION = "0.9.0".freeze
6
+ EMOJI_VERSION = "5.0".freeze
7
+ DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + '/../../../data/').freeze
8
+ INDEX_FILENAME = (DATA_DIRECTORY + '/emoji.marshal.gz').freeze
9
+ end
10
+ end
11
+
@@ -0,0 +1,9 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative 'constants'
4
+
5
+ module Unicode
6
+ module Emoji
7
+ INDEX = Marshal.load(Gem.gunzip(File.binread(INDEX_FILENAME)))
8
+ end
9
+ end
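index.rb loads the whole Emoji index with `Marshal.load(Gem.gunzip(File.binread(INDEX_FILENAME)))`. The round trip behind that line can be sketched with the standard library alone, under a few assumptions:

* The miniature index below is hypothetical. Its keys follow those read in emoji.rb (`:PROPERTIES`, `:KEYCAPS`), but the real file is generated from Unicode's data and is not reproduced here.
* `Zlib.gzip`/`Zlib.gunzip` (Ruby 2.4+) stand in for `Gem.gunzip`.
* An in-memory string replaces the file.

```ruby
require "zlib"

# Hypothetical miniature index; key names follow emoji.rb, contents are illustrative.
index = {
  PROPERTIES: {
    0x1F634 => [:P], # 😴 Emoji_Presentation
    0x261D  => [:B], # ☝ Emoji_Modifier_Base
  },
  KEYCAPS: [0x23, 0x2A, *(0x30..0x39)], # "#", "*" and the digits 0-9
}

# Writing side (e.g. a data build step): serialize with Marshal, then gzip.
compressed = Zlib.gzip(Marshal.dump(index))

# Reading side, shaped like index.rb: gunzip, then Marshal.load.
loaded = Marshal.load(Zlib.gunzip(compressed))

p loaded[:PROPERTIES][0x1F634] # => [:P]
```

Loading a prebuilt Marshal dump likely keeps `require` fast, since no Unicode text files have to be parsed at load time. `Marshal.load` can instantiate arbitrary objects, so this pattern is only appropriate for trusted data, such as a file shipped inside the gem itself.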
@@ -0,0 +1,295 @@
1
+ require_relative "../lib/unicode/emoji"
2
+ require "minitest/autorun"
3
+
4
+ describe Unicode::Emoji do
5
+ describe ".properties" do
6
+ it "will return an Array with Emoji properties if the codepoint has some" do
7
+ assert_equal ["Emoji", "Emoji_Presentation"], Unicode::Emoji.properties("😴")
8
+ assert_equal ["Emoji"], Unicode::Emoji.properties("♠")
9
+ end
10
+
11
+ it "will return nil if the codepoint has no Emoji properties" do
12
+ assert_nil Unicode::Emoji.properties("A")
13
+ end
14
+ end
15
+
16
+ describe "REGEX" do
17
+ it "matches most singleton emoji codepoints" do
18
+ "😴 sleeping face" =~ Unicode::Emoji::REGEX
19
+ assert_equal "😴", $&
20
+ end
21
+
22
+ it "matches singleton emoji in combination with emoji variation selector" do
23
+ "😴\u{FE0F} sleeping face" =~ Unicode::Emoji::REGEX
24
+ assert_equal "😴\u{FE0F}", $&
25
+ end
26
+
27
+ it "does not match singleton emoji when in combination with text variation selector" do
28
+ "😴\u{FE0E} sleeping face" =~ Unicode::Emoji::REGEX
29
+ assert_nil $&
30
+ end
31
+
32
+ it "does not match textual singleton emoji" do
33
+ "▶ play button" =~ Unicode::Emoji::REGEX
34
+ assert_nil $&
35
+ end
36
+
37
+ it "does match textual singleton emoji in combination with emoji variation selector" do
38
+ "▶\u{FE0F} play button" =~ Unicode::Emoji::REGEX
39
+ assert_equal "▶\u{FE0F}", $&
40
+ end
41
+
42
+ it "does not match singleton 'component' emoji codepoints" do
43
+ "🏻 light skin tone" =~ Unicode::Emoji::REGEX
44
+ assert_nil $&
45
+ end
46
+
47
+ it "does match modified emoji if modifier base emoji is used" do
48
+ "🛌🏽 person in bed: medium skin tone" =~ Unicode::Emoji::REGEX
49
+ assert_equal "🛌🏽", $&
50
+ end
51
+
52
+ it "does not match modified emoji if no modifier base emoji is used" do
53
+ "🌵🏽 cactus" =~ Unicode::Emoji::REGEX
54
+ assert_equal "🌵", $&
55
+ end
56
+
57
+ it "does match valid region flags" do
58
+ "🇵🇹 Portugal" =~ Unicode::Emoji::REGEX
59
+ assert_equal "🇵🇹", $&
60
+ end
61
+
62
+ it "does not match invalid region flags" do
63
+ "🇵🇵 PP Land" =~ Unicode::Emoji::REGEX
64
+ assert_nil $&
65
+ end
66
+
67
+ it "does match emoji keycap sequences" do
68
+ "2️⃣ keycap: 2" =~ Unicode::Emoji::REGEX
69
+ assert_equal "2️⃣", $&
70
+ end
71
+
72
+ it "does match recommended tag sequences" do
73
+ "🏴󠁧󠁢󠁳󠁣󠁴󠁿 Scotland" =~ Unicode::Emoji::REGEX
74
+ assert_equal "🏴󠁧󠁢󠁳󠁣󠁴󠁿", $&
75
+ end
76
+
77
+ it "does not match valid tag sequences which are not recommended" do
78
+ "🏴󠁧󠁢󠁡󠁧󠁢󠁿 GB AGB" =~ Unicode::Emoji::REGEX
79
+ assert_equal "🏴", $& # only base flag is matched
80
+ end
81
+
82
+ it "does match recommended zwj sequences" do
83
+ "🤾🏽‍♀️ woman playing handball: medium skin tone" =~ Unicode::Emoji::REGEX
84
+ assert_equal "🤾🏽‍♀️", $&
85
+ end
86
+
87
+ it "does not match valid zwj sequences which are not recommended" do
88
+ "🤠‍🤢 vomiting cowboy" =~ Unicode::Emoji::REGEX
89
+ assert_equal "🤠", $&
90
+ end
91
+ end
92
+
93
+ describe "REGEX_VALID" do
94
+ it "matches most singleton emoji codepoints" do
95
+ "😴 sleeping face" =~ Unicode::Emoji::REGEX_VALID
96
+ assert_equal "😴", $&
97
+ end
98
+
99
+ it "matches singleton emoji in combination with emoji variation selector" do
100
+ "😴\u{FE0F} sleeping face" =~ Unicode::Emoji::REGEX_VALID
101
+ assert_equal "😴\u{FE0F}", $&
102
+ end
103
+
104
+ it "does not match singleton emoji when in combination with text variation selector" do
105
+ "😴\u{FE0E} sleeping face" =~ Unicode::Emoji::REGEX_VALID
106
+ assert_nil $&
107
+ end
108
+
109
+ it "does not match textual singleton emoji" do
110
+ "▶ play button" =~ Unicode::Emoji::REGEX_VALID
111
+ assert_nil $&
112
+ end
113
+
114
+ it "does match textual singleton emoji in combination with emoji variation selector" do
115
+ "▶\u{FE0F} play button" =~ Unicode::Emoji::REGEX_VALID
116
+ assert_equal "▶\u{FE0F}", $&
117
+ end
118
+
119
+ it "does not match singleton 'component' emoji codepoints" do
120
+ "🏻 light skin tone" =~ Unicode::Emoji::REGEX_VALID
121
+ assert_nil $&
122
+ end
123
+
124
+ it "does match modified emoji if modifier base emoji is used" do
125
+ "🛌🏽 person in bed: medium skin tone" =~ Unicode::Emoji::REGEX_VALID
126
+ assert_equal "🛌🏽", $&
127
+ end
128
+
129
+ it "does not match modified emoji if no modifier base emoji is used" do
130
+ "🌵🏽 cactus" =~ Unicode::Emoji::REGEX_VALID
131
+ assert_equal "🌵", $&
132
+ end
133
+
134
+ it "does match valid region flags" do
135
+ "🇵🇹 Portugal" =~ Unicode::Emoji::REGEX_VALID
136
+ assert_equal "🇵🇹", $&
137
+ end
138
+
139
+ it "does not match invalid region flags" do
140
+ "🇵🇵 PP Land" =~ Unicode::Emoji::REGEX_VALID
141
+ assert_nil $&
142
+ end
143
+
144
+ it "does match emoji keycap sequences" do
145
+ "2️⃣ keycap: 2" =~ Unicode::Emoji::REGEX_VALID
146
+ assert_equal "2️⃣", $&
147
+ end
148
+
149
+ it "does match recommended tag sequences" do
150
+ "🏴󠁧󠁢󠁳󠁣󠁴󠁿 Scotland" =~ Unicode::Emoji::REGEX_VALID
151
+ assert_equal "🏴󠁧󠁢󠁳󠁣󠁴󠁿", $&
152
+ end
153
+
154
+ it "does match valid tag sequences, even though they are not recommended" do
155
+ "🏴󠁧󠁢󠁡󠁧󠁢󠁿 GB AGB" =~ Unicode::Emoji::REGEX_VALID
156
+ assert_equal "🏴󠁧󠁢󠁡󠁧󠁢󠁿", $&
157
+ end
158
+
159
+ it "does not match invalid tag sequences" do
160
+ "🏴󠁧󠁢󠁡󠁡󠁡󠁿 GB AAA" =~ Unicode::Emoji::REGEX_VALID
161
+ assert_equal "🏴", $&
162
+ end
163
+
164
+ it "does match recommended zwj sequences" do
165
+ "🤾🏽‍♀️ woman playing handball: medium skin tone" =~ Unicode::Emoji::REGEX_VALID
166
+ assert_equal "🤾🏽‍♀️", $&
167
+ end
168
+
169
+ it "does match valid zwj sequences, even though they are not recommended" do
170
+ "🤠‍🤢 vomiting cowboy" =~ Unicode::Emoji::REGEX_VALID
171
+ assert_equal "🤠‍🤢", $&
172
+ end
173
+ end
174
+
175
+ describe "REGEX_BASIC" do
176
+ it "matches most singleton emoji codepoints" do
177
+ "😴 sleeping face" =~ Unicode::Emoji::REGEX_BASIC
178
+ assert_equal "😴", $&
179
+ end
180
+
181
+ it "matches singleton emoji in combination with emoji variation selector" do
182
+ "😴\u{FE0F} sleeping face" =~ Unicode::Emoji::REGEX_BASIC
183
+ assert_equal "😴\u{FE0F}", $&
184
+ end
185
+
186
+ it "does not match singleton emoji when in combination with text variation selector" do
187
+ "😴\u{FE0E} sleeping face" =~ Unicode::Emoji::REGEX_BASIC
188
+ assert_nil $&
189
+ end
190
+
191
+ it "does not match textual singleton emoji" do
192
+ "▶ play button" =~ Unicode::Emoji::REGEX_BASIC
193
+ assert_nil $&
194
+ end
195
+
196
+ it "does match textual singleton emoji in combination with emoji variation selector" do
197
+ "▶\u{FE0F} play button" =~ Unicode::Emoji::REGEX_BASIC
198
+ assert_equal "▶\u{FE0F}", $&
199
+ end
200
+
201
+ it "does not match singleton 'component' emoji codepoints" do
202
+ "🏻 light skin tone" =~ Unicode::Emoji::REGEX_BASIC
203
+ assert_nil $&
204
+ end
205
+
206
+ it "does not match modified emoji" do
207
+ "🛌🏽 person in bed: medium skin tone" =~ Unicode::Emoji::REGEX_BASIC
208
+ assert_equal "🛌", $&
209
+ end
210
+
211
+ it "does not match region flags" do
212
+ "🇵🇹 Portugal" =~ Unicode::Emoji::REGEX_BASIC
213
+ assert_nil $&
214
+ end
215
+
216
+ it "does not match emoji keycap sequences" do
217
+ "2️⃣ keycap: 2" =~ Unicode::Emoji::REGEX_BASIC
218
+ assert_nil $&
219
+ end
220
+
221
+ it "does not match tag sequences" do
222
+ "🏴󠁧󠁢󠁳󠁣󠁴󠁿 Scotland" =~ Unicode::Emoji::REGEX_BASIC
223
+ assert_equal "🏴", $& # only base flag is matched
224
+ end
225
+
226
+ it "does not match zwj sequences" do
227
+ "🤾🏽‍♀️ woman playing handball: medium skin tone" =~ Unicode::Emoji::REGEX_BASIC
228
+ assert_equal "🤾", $&
229
+ end
230
+ end
231
+
232
+ describe "REGEX_TEXT" do
233
+ it "does not match singleton emoji codepoints with emoji presentation and no variation selector" do
234
+ "😴 sleeping face" =~ Unicode::Emoji::REGEX_TEXT
235
+ assert_nil $&
236
+ end
237
+
238
+ it "does not match singleton emoji in combination with emoji variation selector" do
239
+ "😴\u{FE0F} sleeping face" =~ Unicode::Emoji::REGEX_TEXT
240
+ assert_nil $&
241
+ end
242
+
243
+ it "does match singleton emoji in combination with text variation selector" do
244
+ "😴\u{FE0E} sleeping face" =~ Unicode::Emoji::REGEX_TEXT
245
+ assert_equal "😴\u{FE0E}", $&
246
+ end
247
+
248
+ it "does match textual singleton emoji" do
249
+ "▶ play button" =~ Unicode::Emoji::REGEX_TEXT
250
+ assert_equal "▶", $&
251
+ end
252
+
253
+ it "does not match textual singleton emoji in combination with emoji variation selector" do
254
+ "▶\u{FE0F} play button" =~ Unicode::Emoji::REGEX_TEXT
255
+ assert_nil $&
256
+ end
257
+
258
+ it "does not match singleton 'component' emoji codepoints" do
259
+ "🏻 light skin tone" =~ Unicode::Emoji::REGEX_TEXT
260
+ assert_nil $&
261
+ end
262
+
263
+ it "does not match modified emoji" do
264
+ "🛌🏽 person in bed: medium skin tone" =~ Unicode::Emoji::REGEX_TEXT
265
+ assert_nil $&
266
+ end
267
+
268
+ it "does not match region flags" do
269
+ "🇵🇹 Portugal" =~ Unicode::Emoji::REGEX_TEXT
270
+ assert_nil $&
271
+ end
272
+
273
+ it "does not match emoji keycap sequences" do
274
+ "2️⃣ keycap: 2" =~ Unicode::Emoji::REGEX_TEXT
275
+ assert_nil $&
276
+ end
277
+
278
+ it "does not match tag sequences" do
279
+ "🏴󠁧󠁢󠁳󠁣󠁴󠁿 Scotland" =~ Unicode::Emoji::REGEX_TEXT
280
+ assert_nil $&
281
+ end
282
+
283
+ it "does not match zwj sequences" do
284
+ "🤾🏽‍♀️ woman playing handball: medium skin tone" =~ Unicode::Emoji::REGEX_TEXT
285
+ assert_nil $&
286
+ end
287
+ end
288
+
289
+ describe "REGEX_ANY" do
290
+ it "returns any emoji-related codepoint (but no variation selectors or tags)" do
291
+ matches = "1 string 😴\u{FE0F} sleeping face with 🇵 and modifier 🏾, also 🏴󠁧󠁢󠁳󠁣󠁴󠁿 Scotland".scan(Unicode::Emoji::REGEX_ANY)
292
+ assert_equal ["1", "😴", "🇵", "🏾", "🏴"], matches
293
+ end
294
+ end
295
+ end
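The Scotland and "GB AAA" tag sequence specs rely on the encoding that `emoji_valid_tag_sequence` uses in emoji.rb. Each printable ASCII character of a subdivision code is shifted into the TAG block with `String#tr`. The resulting tags sit between U+1F3F4 (`EMOJI_TAG_BASE_FLAG`) and `CANCEL_TAG`.

A small sketch of that construction follows. The `subdivision_flag` helper is hypothetical and not part of the gem.

```ruby
BLACK_FLAG = 0x1F3F4 # EMOJI_TAG_BASE_FLAG in emoji.rb
CANCEL_TAG = 0xE007F

# Hypothetical helper: shift each printable ASCII character (U+0020..U+007E)
# into the TAG block (U+E0020..U+E007E), as the String#tr call in emoji.rb does,
# then wrap the tags between the black flag and CANCEL TAG.
def subdivision_flag(code)
  tags = code.tr("\u{20}-\u{7E}", "\u{E0020}-\u{E007E}")
  [BLACK_FLAG].pack("U") + tags + [CANCEL_TAG].pack("U")
end

scotland = subdivision_flag("gbsct")
puts scotland.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
# prints U+1F3F4 U+E0067 U+E0062 U+E0073 U+E0063 U+E0074 U+E007F
```

`subdivision_flag("gbaaa")` builds a well-formed but invalid sequence. Because "gbaaa" is not among `VALID_SUBDIVISIONS`, the spec expects `REGEX_VALID` to match only the leading 🏴 in that case.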
@@ -0,0 +1,21 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ require File.dirname(__FILE__) + "/lib/unicode/emoji/constants"
4
+
5
+ Gem::Specification.new do |gem|
6
+ gem.name = "unicode-emoji"
7
+ gem.version = Unicode::Emoji::VERSION
8
+ gem.summary = "Retrieve Emoji data about Unicode codepoints."
9
+ gem.description = "[Emoji #{Unicode::Emoji::EMOJI_VERSION}] Retrieve emoji data about Unicode codepoints. Also contains a regex to match emoji."
10
+ gem.authors = ["Jan Lelis"]
11
+ gem.email = ["mail@janlelis.de"]
12
+ gem.homepage = "https://github.com/janlelis/unicode-emoji"
13
+ gem.license = "MIT"
14
+
15
+ gem.files = Dir["{**/}{.*,*}"].select{ |path| File.file?(path) && path !~ /^pkg/ }
16
+ gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
17
+ gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
18
+ gem.require_paths = ["lib"]
19
+
20
+ gem.required_ruby_version = "~> 2.0"
21
+ end
metadata ADDED
@@ -0,0 +1,61 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: unicode-emoji
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.9.0
5
+ platform: ruby
6
+ authors:
7
+ - Jan Lelis
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2017-04-08 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description: "[Emoji 5.0] Retrieve emoji data about Unicode codepoints. Also contains
14
+ a regex to match emoji."
15
+ email:
16
+ - mail@janlelis.de
17
+ executables: []
18
+ extensions: []
19
+ extra_rdoc_files: []
20
+ files:
21
+ - ".gitignore"
22
+ - ".travis.yml"
23
+ - CHANGELOG.md
24
+ - CODE_OF_CONDUCT.md
25
+ - Gemfile
26
+ - Gemfile.lock
27
+ - MIT-LICENSE.txt
28
+ - README.md
29
+ - Rakefile
30
+ - data/emoji.marshal.gz
31
+ - lib/unicode/emoji.rb
32
+ - lib/unicode/emoji/constants.rb
33
+ - lib/unicode/emoji/index.rb
34
+ - spec/unicode_emoji_spec.rb
35
+ - unicode-emoji.gemspec
36
+ homepage: https://github.com/janlelis/unicode-emoji
37
+ licenses:
38
+ - MIT
39
+ metadata: {}
40
+ post_install_message:
41
+ rdoc_options: []
42
+ require_paths:
43
+ - lib
44
+ required_ruby_version: !ruby/object:Gem::Requirement
45
+ requirements:
46
+ - - "~>"
47
+ - !ruby/object:Gem::Version
48
+ version: '2.0'
49
+ required_rubygems_version: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - ">="
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ requirements: []
55
+ rubyforge_project:
56
+ rubygems_version: 2.6.8
57
+ signing_key:
58
+ specification_version: 4
59
+ summary: Retrieve Emoji data about Unicode codepoints.
60
+ test_files:
61
+ - spec/unicode_emoji_spec.rb