unicode-emoji 1.1.0 → 2.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
- SHA256:
- metadata.gz: c6cdca65a25735347a97f077a43c00b1b30c84fa2aa7b688b7072b1af8624d1f
- data.tar.gz: e083a936a9f360ca0348fe6a3f76a12aa2fbf94826cbf30fd31db1abc23d9bc7
+ SHA1:
+ metadata.gz: 32bec9a0f826ab808cf77b3bf69e8248de000d99
+ data.tar.gz: 6c8a53dc8874ab6bf508aad2a914eded9a7a4889
  SHA512:
- metadata.gz: d408fec5b09dd66db4ea61fb476cd5c74570d1cd9f43451732619ea303292afeb28e7d65d56eb012676ecbd2bda61e4e03912f3fae66bb26f51abde5ff85ba6b
- data.tar.gz: 815d375a3de1d1cbedb64ced2530ee2c9c02a52223ced9bff0273231f8de09a135b7043f6159701c76af60046234b353ec14a5ae136aad36f998560e96988841
+ metadata.gz: c5ebaf7c4c6a66331af9c0f927f8f41079aaf89d3389c4ba84533c1f64fbd2b4456657971e05c437987568c6853f844f1084986721cede6da8979696e4efffd6
+ data.tar.gz: 3fc8af7fc6bdcaac8ac14ec148c4322a8d26a1e627895ec54a3a038a85062c9f00d37317208cdbe876cdc31603e83943eac569ee937bccfd7e44df15d239ac19
data/.travis.yml CHANGED
@@ -2,12 +2,13 @@ sudo: false
  language: ruby

  rvm:
+ - 2.6.1
+ - 2.5.3
+ - 2.4.5
+ - 2.3.8
  - ruby-head
- - 2.5.1
- - 2.4.4
- - 2.3.7
  - jruby-head
- - jruby-9.1.16.0
+ - jruby-9.2.6.0

  matrix:
  allow_failures:
data/CHANGELOG.md CHANGED
@@ -1,5 +1,14 @@
  ## CHANGELOG

+ ### 2.0.0
+
+ - Emoji 12.0 data (including valid subdivisions)
+ - Introduce new `REGEX_WELL_FORMED` to be able to match for invalid tag and region sequences
+ - Introduce new `*_INCLUDE_TEXT` regexes which include matching for textual presentation emoji
+ - Refactoring: Update Emoji matching to latest standard while keeping naming close to standard
+ - Issue warning when using `#list` method to retrieve outdated category
+ - Change matching for ZWJ sequences: Do not limit sequence to a maximum of 3 ZWJs
+
  ### 1.1.0

  - Emoji 11.0
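
The last 2.0.0 entry (no upper limit on ZWJs) can be illustrated with a simplified sketch. It does not use the gem: `ELEMENT` is a loose stand-in for the gem's real ZWJ element pattern, and the names `CAPPED` and `UNCAPPED` are hypothetical.

```ruby
ZWJ = "\u{200D}"

# Loose stand-in for a ZWJ element: any single character that is neither
# the joiner itself nor whitespace (the gem only allows emoji here).
ELEMENT = "[^#{ZWJ}\\s]"

# 1.x-style pattern: at most three joiners per sequence
CAPPED = Regexp.new("(?:#{ELEMENT}#{ZWJ}){1,3}#{ELEMENT}")
# 2.0-style pattern: any number of joiners
UNCAPPED = Regexp.new("(?:#{ELEMENT}#{ZWJ})+#{ELEMENT}")

# Five emoji joined by four ZWJs (nine codepoints in total)
chain = ["\u{1F468}", "\u{1F469}", "\u{1F467}", "\u{1F466}", "\u{1F476}"].join(ZWJ)

capped_match   = chain[CAPPED]   # stops after the fourth emoji
uncapped_match = chain[UNCAPPED] # covers the whole chain

puts capped_match.length
puts uncapped_match.length
```

With the capped pattern the trailing joiner and fifth emoji are left unmatched, which is the behavior the changelog entry removes.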
data/MIT-LICENSE.txt CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2017, 2018 Jan Lelis, mail@janlelis.de
+ Copyright (c) 2017-2019 Jan Lelis, mail@janlelis.de

  Permission is hereby granted, free of charge, to any person obtaining
  a copy of this software and associated documentation files (the
data/README.md CHANGED
@@ -1,12 +1,12 @@
- # Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](http://badge.fury.io/rb/unicode-emoji) [![[travis]](https://travis-ci.org/janlelis/unicode-emoji.svg)](https://travis-ci.org/janlelis/unicode-emoji)
+ # Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](https://badge.fury.io/rb/unicode-emoji) [![[travis]](https://travis-ci.org/janlelis/unicode-emoji.svg)](https://travis-ci.org/janlelis/unicode-emoji)

  A small Ruby library which provides Unicode Emoji data and regexes.

  Also includes a categorized list of recommended Emoji.

- Emoji version: **11.0**
+ Emoji version: **12.0** (February 2019)

- Supported Rubies: **2.5**, **2.4**, **2.3**
+ Supported Rubies: **2.6**, **2.5**, **2.4**, **2.3**

  If you are stuck on an older Ruby version, checkout the latest [0.9 version](https://rubygems.org/gems/unicode-emoji/versions/0.9.3) of this gem.

@@ -20,7 +20,7 @@ gem "unicode-emoji"

  ### Regex

- Five Emoji regexes are included, which are compiled out of various Emoji Unicode data.
+ The gem includes a bunch of Emoji regexes, which are compiled out of various Emoji Unicode data sources.

  ```ruby
  require "unicode/emoji"
@@ -40,16 +40,64 @@ string = "String which contains all kinds of emoji:
40
40
  string.scan(Unicode::Emoji::REGEX) # => ["😴", "▢️", "πŸ›ŒπŸ½", "πŸ‡΅πŸ‡Ή", "🏴󠁧󠁒󠁳󠁣󠁴󠁿", "2️⃣", "πŸ€ΎπŸ½β€β™€οΈ"]
41
41
  ```
42
42
 
43
+ #### Main Regexes
44
+
45
+ Matches (non-textual) Emoji of all kinds:
46
+
43
47
  Regex | Description | Example Matches | Example Non-Matches
44
48
  ------------------------------|-------------|-----------------|--------------------
45
- `Unicode::Emoji::REGEX` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of valid Emoji sequences, but restrict ZWJ and TAG sequences to recommended sequences | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `πŸ€ΎπŸ½β€β™€οΈ` | `😴︎`, `β–Ά`, `🏻`, `πŸ‡΅πŸ‡΅`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ β€πŸ€’`
46
- `Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of valid Emoji sequences | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’` | `😴︎`, `β–Ά`, `🏻`, `πŸ‡΅πŸ‡΅`
47
- `Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences | `😴`, `▢️` | `😴︎`, `β–Ά`, `🏻`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `πŸ‡΅πŸ‡΅`,`2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’`
48
- `Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digit 1) | `😴︎`, `β–Ά` | `😴`, `▢️`, `🏻`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `πŸ‡΅πŸ‡΅`,`2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’`
49
- `Unicode::Emoji::REGEX_ANY` | Matches any Emoji-related codepoint (but no variation selectors or tags) | `😴`, `β–Ά`, `🏻`, `πŸ›Œ`, `🏽`, `πŸ‡΅`, `πŸ‡Ή`, `2`, `🏴`, `🀾`, `♀`, `🀠`, `🀒` | -
49
+ `Unicode::Emoji::REGEX` | **Use this if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of *recommended* Emoji sequences | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `πŸ€ΎπŸ½β€β™€οΈ` | `😴︎`, `β–Ά`, `🏻`, `πŸ‡΅πŸ‡΅`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ β€πŸ€’`
50
+ `Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of *valid* Emoji sequences | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’` | `😴︎`, `β–Ά`, `🏻`, `πŸ‡΅πŸ‡΅`
51
+ `Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of *well-formed* Emoji sequences | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’`, `πŸ‡΅πŸ‡΅` | `😴︎`, `β–Ά`, `🏻`
52
+
53
+ ##### Picking the Right Emoji Regex
54
+
55
+ - Usually you just want `REGEX` (RGI set)
56
+ - If you want broader matching (e.g. more sub-regions), choose `REGEX_VALID`
57
+ - If you even want to match for invalid sequences, too, use `REGEX_WELL_FORMED`
58
+
59
+ Please see [the standard](http://www.unicode.org/reports/tr51/#Emoji_Sets) for details.
60
+
61
+ Property | `REGEX` (RGI / Recommended) | `REGEX_VALID` (Valid) | `REGEX_WELL_FORMED` (Well-formed)
62
+ ---------|-----------------------------|-----------------------|----------------------------------
63
+ Region "πŸ‡΅πŸ‡Ή" | Yes | Yes | Yes
64
+ Region "πŸ‡΅πŸ‡΅" | No | No | Yes
65
+ Tag Sequence "🏴󠁧󠁒󠁳󠁣󠁴󠁿" | Yes | Yes | Yes
66
+ Tag Sequence "🏴󠁧󠁒󠁑󠁧󠁒󠁿" | No | Yes | Yes
67
+ Tag Sequence "😴󠁧󠁒󠁑󠁑󠁑󠁿" | No | No | Yes
68
+ ZWJ Sequence "πŸ€ΎπŸ½β€β™€οΈ" | Yes | Yes | Yes
69
+ ZWJ Sequence "πŸ€ β€πŸ€’" | No | Yes | Yes
50
70
 
51
71
  More info about valid vs. recommended Emoji in this [blog article on Emojipedia](http://blog.emojipedia.org/unicode-behind-the-curtain/).
52
72
 
73
+ #### Singleton Regexes
74
+
75
+ Matches only simple one-codepoint (+ optional variation selector) Emoji:
76
+
77
+ Regex | Description | Example Matches | Example Non-Matches
78
+ ------------------------------|-------------|-----------------|--------------------
79
+ `Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `😴`, `▢️` | `😴︎`, `β–Ά`, `🏻`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `πŸ‡΅πŸ‡΅`,`2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’`
80
+ `Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digit 1) | `😴︎`, `β–Ά` | `😴`, `▢️`, `🏻`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `πŸ‡΅πŸ‡΅`,`2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’`
81
+
82
+ #### Include Textual Emoji
83
+
84
+ By default, textual Emoji (emoji characters with text variation selector or those that have a default text presentation) will not be included in the default regexes. However, if you wish to match for them too, you can include them in your regex by appending the `_INCLUDE_TEXT` suffix:
85
+
86
+ Regex | Description | Example Matches | Example Non-Matches
87
+ ------------------------------|-------------|-----------------|--------------------
88
+ `Unicode::Emoji::REGEX_INCLUDE_TEXT` | `REGEX` + `REGEX_TEXT` | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `😴︎`, `β–Ά` | `🏻`, `πŸ‡΅πŸ‡΅`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ β€πŸ€’`
89
+ `Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’`, `😴︎`, `β–Ά` | `🏻`, `πŸ‡΅πŸ‡΅`
90
+ `Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▢️`, `πŸ›ŒπŸ½`, `πŸ‡΅πŸ‡Ή`, `2️⃣`, `🏴󠁧󠁒󠁳󠁣󠁴󠁿`, `🏴󠁧󠁒󠁑󠁧󠁒󠁿`, `πŸ€ΎπŸ½β€β™€οΈ`, `πŸ€ β€πŸ€’`, `πŸ‡΅πŸ‡΅`, `😴︎`, `β–Ά` | `🏻`
91
+
92
+ #### Partial Regexes
93
+
94
+ Matches potential Emoji parts (often, this is not what you want):
95
+
96
+ Regex | Description | Example Matches | Example Non-Matches
97
+ ------------------------------|-------------|-----------------|--------------------
98
+ `Unicode::Emoji::REGEX_ANY` | Matches any Emoji-related codepoint (but no variation selectors, tags, or zero-width joiners). Please note that this will match Emoji parts rather than complete Emoji, for example, single digits! | `😴`, `▶`, `🏻`, `🛌`, `🏽`, `🇵`, `🇹`, `2`, `🏴`, `🤾`, `♀`, `🤠`, `🤢` | -
99
+
100
+
53
101
  ### List
54
102
 
55
103
  Use `Unicode::Emoji::LIST` or the list method to get a grouped (and ordered) list of Emoji:
@@ -65,6 +113,8 @@ Unicode::Emoji.list("Food & Drink", "food-asian")
65
113
  => ["🍱", "🍘", "πŸ™", "🍚", "πŸ›", "🍜", "🍝", "🍠", "🍒", "🍣", "🍀", "πŸ₯", "🍑", "\u{1F95F}", "\u{1F960}", "\u{1F961}"]
66
114
  ```
67
115
 
116
+ Please note that categories might change with future versions of the Emoji standard. This gem will issue warnings when attempting to retrieve old categories using the `#list` method.
117
+
68
118
  A markdown file with all recommended Emoji can be found [in this gist](https://gist.github.com/janlelis/72f9be1f0ecca07372c64cf13894b801).
69
119
 
70
120
  ### Properties
@@ -87,5 +137,5 @@ Unicode::Emoji.properties "☝" # => ["Emoji", "Emoji_Modifier_Base"]

  ## MIT

- - Copyright (C) 2017, 2018 Jan Lelis <http://janlelis.com>. Released under the MIT license.
+ - Copyright (C) 2017-2019 Jan Lelis <http://janlelis.com>. Released under the MIT license.
  - Unicode data: http://www.unicode.org/copyright.html#Exhibit1
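
The difference between the valid and well-formed tiers described in the README changes is easiest to see with flags. The sketch below is self-contained and does not use the gem: `VALID_FLAGS` is a two-entry stand-in for the gem's full region list, and `to_flag`, `valid_flag`, and `well_formed_flag` are hypothetical names.

```ruby
# Regional indicator symbols A..Z occupy U+1F1E6..U+1F1FF
REGIONAL_INDICATORS = (0x1F1E6..0x1F1FF).to_a

# Turn a two-letter region code into its flag string
to_flag = ->(code) { code.each_char.map { |c| c.ord - "A".ord + 0x1F1E6 }.pack("U*") }

# Stand-in allow-list; the real gem ships the complete set of valid regions
VALID_FLAGS = %w[PT DE].map(&to_flag)

indicator        = "[" + REGIONAL_INDICATORS.pack("U*") + "]"
valid_flag       = Regexp.union(VALID_FLAGS)         # only listed pairs
well_formed_flag = Regexp.new(indicator + indicator) # any pair of indicators

portugal = to_flag["PT"]
pp_land  = to_flag["PP"] # shaped like a flag, but not an assigned region

puts portugal.match?(valid_flag)
puts pp_land.match?(valid_flag)
puts pp_land.match?(well_formed_flag)
```

An allow-list keeps unassigned pairs out, while the pair-of-indicators pattern accepts anything with the right shape, which is the trade-off between `REGEX_VALID` and `REGEX_WELL_FORMED` for region flags.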
Binary file
data/lib/unicode/emoji.rb CHANGED
@@ -18,8 +18,10 @@ module Unicode
  TEXT_VARIATION_SELECTOR = 0xFE0E
  EMOJI_TAG_BASE_FLAG = 0x1F3F4
  CANCEL_TAG = 0xE007F
+ TAGS = [*0xE0020..0xE007E]
  EMOJI_KEYCAP_SUFFIX = 0x20E3
  ZWJ = 0x200D
+ REGIONAL_INDICATORS = [*0x1F1E6..0x1F1FF]

  EMOJI_CHAR = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:E) }.keys.freeze
  EMOJI_PRESENTATION = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:P) }.keys.freeze
@@ -36,6 +38,10 @@ module Unicode
  RECOMMENDED_ZWJ_SEQUENCES = INDEX[:ZWJ].freeze

  LIST = INDEX[:LIST].freeze.each_value(&:freeze)
+ LIST_REMOVED_KEYS = [
+ "Smileys & People",
+ "Component",
+ ]

  pack = ->(ord){ Regexp.escape(Array(ord).pack("U*")) }
  join = -> (*strings){ "(?:" + strings.join("|") + ")" }
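
The two helper lambdas above are the building blocks for every pattern in this file. As a rough illustration, the sketch below reuses them unchanged to assemble a tiny keycap pattern; the three-key selection and the `keycap` name are illustrative assumptions, not the gem's actual keycap list.

```ruby
pack = ->(ord) { Regexp.escape(Array(ord).pack("U*")) }
join = ->(*strings) { "(?:" + strings.join("|") + ")" }

# "*" (0x2A) is a regex metacharacter, so pack escapes it
star = pack[0x2A]

# An array of codepoints becomes one escaped multi-character string:
# emoji variation selector followed by the combining keycap
tail = pack[[0xFE0F, 0x20E3]]

# Hypothetical mini keycap pattern: "#", "*" or "2" followed by the tail
keycap = Regexp.new(join[pack[0x23], star, pack[0x32]] + tail)

puts star
puts "2\u{FE0F}\u{20E3}".match?(keycap)
puts "2".match?(keycap)
```

Escaping inside `pack` is what makes it safe to feed arbitrary codepoints (including `*` and `#`) into the alternation built by `join`.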
@@ -61,6 +67,9 @@ module Unicode
  emoji_presentation + "(?!" + pack[TEXT_VARIATION_SELECTOR] + ")" + pack[EMOJI_VARIATION_SELECTOR] + "?",
  ]

+ non_component_emoji_presentation_sequence = \
+ "(?!" + emoji_component + ")" + emoji_presentation_sequence
+
  text_presentation_sequence = \
  join[
  pack_and_join[TEXT_PRESENTATION]+ "(?!" + join[emoji_modifier, pack[EMOJI_VARIATION_SELECTOR]] + ")" + pack[TEXT_VARIATION_SELECTOR] + "?",
@@ -73,9 +82,36 @@ module Unicode
  emoji_keycap_sequence = \
  pack_and_join[EMOJI_KEYCAPS] + pack[[EMOJI_VARIATION_SELECTOR, EMOJI_KEYCAP_SUFFIX]]

- emoji_valid_region_sequence = \
+ emoji_valid_flag_sequence = \
  pack_and_join[VALID_REGION_FLAGS]

+ emoji_well_formed_flag_sequence = \
+ "(?:" +
+ pack_and_join[REGIONAL_INDICATORS] +
+ pack_and_join[REGIONAL_INDICATORS] +
+ ")"
+
+ emoji_valid_core_sequence = \
+ join[
+ # emoji_character,
+ emoji_keycap_sequence,
+ emoji_modifier_sequence,
+ non_component_emoji_presentation_sequence,
+ emoji_valid_flag_sequence,
+ ]
+
+ emoji_well_formed_core_sequence = \
+ join[
+ # emoji_character,
+ emoji_keycap_sequence,
+ emoji_modifier_sequence,
+ non_component_emoji_presentation_sequence,
+ emoji_well_formed_flag_sequence,
+ ]
+
+ emoji_rgi_tag_sequence = \
+ pack_and_join[RECOMMENDED_SUBDIVISION_FLAGS]
+
  emoji_valid_tag_sequence = \
  "(?:" +
  pack[EMOJI_TAG_BASE_FLAG] +
@@ -83,35 +119,60 @@ module Unicode
  pack[CANCEL_TAG] +
  ")"

- emoji_zwj_element = \
+ emoji_well_formed_tag_sequence = \
+ "(?:" +
+ join[
+ non_component_emoji_presentation_sequence,
+ emoji_modifier_sequence,
+ ] +
+ pack_and_join[TAGS] + "+" +
+ pack[CANCEL_TAG] +
+ ")"
+
+ emoji_rgi_zwj_sequence = \
+ pack_and_join[RECOMMENDED_ZWJ_SEQUENCES]
+
+ emoji_valid_zwj_element = \
  join[
  emoji_modifier_sequence,
  emoji_presentation_sequence,
  emoji_character,
  ]

- # Matches basic singleton emoji and all kind of sequences, but restrict zwj and tag sequences to known sequences
- REGEX = Regexp.compile(
- pack_and_join[RECOMMENDED_ZWJ_SEQUENCES] +
- ?| + pack_and_join[RECOMMENDED_SUBDIVISION_FLAGS] +
- ?| + emoji_modifier_sequence +
- ?| + "(?!" + emoji_component + ")" + emoji_presentation_sequence +
- ?| + emoji_keycap_sequence +
- ?| + emoji_valid_region_sequence +
- ""
- )
+ emoji_valid_zwj_sequence = \
+ "(?:" +
+ "(?:" + emoji_valid_zwj_element + pack[ZWJ] + ")+" + emoji_valid_zwj_element +
+ ")"
+
+ emoji_rgi_sequence = \
+ join[
+ emoji_rgi_zwj_sequence,
+ emoji_rgi_tag_sequence,
+ emoji_valid_core_sequence,
+ ]
+
+ emoji_valid_sequence = \
+ join[
+ emoji_valid_zwj_sequence,
+ emoji_valid_tag_sequence,
+ emoji_valid_core_sequence,
+ ]
+
+ emoji_well_formed_sequence = \
+ join[
+ emoji_valid_zwj_sequence,
+ emoji_well_formed_tag_sequence,
+ emoji_well_formed_core_sequence,
+ ]
+
+ # Matches basic singleton emoji and all kind of sequences, but restrict zwj and tag sequences to known sequences (rgi)
+ REGEX = Regexp.compile(emoji_rgi_sequence)

  # Matches basic singleton emoji and all kind of valid sequences
- REGEX_VALID = Regexp.compile(
- # EMOJI_TAGS.map{ |base, spec| "(?:" + pack[base] + "[" + pack[spec] + "]+" + pack[CANCEL_TAG] + ")" }.join("|") +
- emoji_valid_tag_sequence +
- ?| + "(?:" + "(?:" + emoji_zwj_element + pack[ZWJ] + "){1,3}" + emoji_zwj_element + ")" +
- ?| + emoji_modifier_sequence +
- ?| + "(?!" + emoji_component + ")" + emoji_presentation_sequence +
- ?| + emoji_keycap_sequence +
- ?| + emoji_valid_region_sequence +
- ""
- )
+ REGEX_VALID = Regexp.compile(emoji_valid_sequence)
+
+ # Matches basic singleton emoji and all kind of sequences
+ REGEX_WELL_FORMED = Regexp.compile(emoji_well_formed_sequence)

  # Matches only basic single, non-textual emoji
  # Ignores "components" like modifiers or simple digits
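
The new `emoji_well_formed_tag_sequence` in this hunk has the shape "base emoji, one or more tag characters, cancel tag". A minimal sketch of that shape follows. It is independent of the gem: the two-character `base` class is a stand-in for the gem's much broader base pattern, and `to_tags` is a hypothetical helper.

```ruby
# Stand-in bases: U+1F3F4 (black flag) and U+1F634 (sleeping face)
base = "[\u{1F3F4}\u{1F634}]"
# Tag characters mirror printable ASCII at an offset of 0xE0000
tag = "[\u{E0020}-\u{E007E}]"
cancel_tag = "\u{E007F}"

well_formed_tag = Regexp.new(base + tag + "+" + cancel_tag)

# Map plain ASCII to the corresponding tag characters
to_tags = ->(ascii) { ascii.each_char.map { |c| c.ord + 0xE0000 }.pack("U*") }

scotland     = "\u{1F3F4}" + to_tags["gbsct"] + cancel_tag
odd_base     = "\u{1F634}" + to_tags["gbaaa"] + cancel_tag
unterminated = "\u{1F3F4}" + to_tags["gb"]

puts scotland.match?(well_formed_tag)
puts odd_base.match?(well_formed_tag)
puts unterminated.match?(well_formed_tag)
```

A well-formed pattern only checks structure, so a sequence on an unusual base with an unassigned subdivision still matches, while one without the cancel tag does not.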
@@ -125,11 +186,16 @@ module Unicode
  "(?!" + emoji_component + ")" + text_presentation_sequence
  )

- # Matches any emoji-related codepoint
+ # Matches any emoji-related codepoint - Use with caution (returns partial matches)
  REGEX_ANY = Regexp.compile(
  emoji_character
  )

+ # Combined REGEXes which also match for TEXTUAL emoji
+ REGEX_INCLUDE_TEXT = Regexp.union(REGEX, REGEX_TEXT)
+ REGEX_VALID_INCLUDE_TEXT = Regexp.union(REGEX_VALID, REGEX_TEXT)
+ REGEX_WELL_FORMED_INCLUDE_TEXT = Regexp.union(REGEX_WELL_FORMED, REGEX_TEXT)
+
  def self.properties(char)
  ord = get_codepoint_value(char)
  props = INDEX[:PROPERTIES][ord]
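
The `*_INCLUDE_TEXT` constants in this hunk are plain `Regexp.union` combinations. The sketch below shows the idea on a single character. `emoji_only` and `text_only` are hypothetical one-character stand-ins for the gem's generated patterns, assuming the same negative lookahead on the text variation selector that the emoji presentation pattern uses.

```ruby
EMOJI_VS = "\u{FE0F}" # requests emoji presentation
TEXT_VS  = "\u{FE0E}" # requests text presentation

# U+1F634 (sleeping face) defaults to emoji presentation
emoji_only = /\u{1F634}(?!#{TEXT_VS})#{EMOJI_VS}?/
text_only  = /\u{1F634}#{TEXT_VS}/

# Regexp.union builds an alternation of both patterns
include_text = Regexp.union(emoji_only, text_only)

plain   = "\u{1F634}"
textual = "\u{1F634}#{TEXT_VS}"

puts plain.match?(emoji_only)
puts textual.match?(emoji_only)
puts textual.match?(include_text)
```

Keeping the textual variant in a separate pattern and combining on demand lets the default regexes stay strict while the union covers both presentations.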
@@ -143,6 +209,9 @@ module Unicode

  def self.list(key = nil, sub_key = nil)
  return LIST unless key || sub_key
+ if LIST_REMOVED_KEYS.include?(key)
+ $stderr.puts "Warning(unicode-emoji): The category of #{key} does not exist anymore"
+ end
  LIST.dig(*[key, sub_key].compact)
  end
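
The warning added to `list` can be tried without the gem. The sketch below copies the method body from this hunk into a top-level method and runs it against a hypothetical one-category `LIST`; the real list is loaded from the gem's data file.

```ruby
# Hypothetical miniature of the gem's grouped list
LIST = {
  "Smileys & Emotion" => { "face-glasses" => ["\u{1F60E}"] },
}.freeze

LIST_REMOVED_KEYS = ["Smileys & People", "Component"].freeze

def list(key = nil, sub_key = nil)
  return LIST unless key || sub_key
  if LIST_REMOVED_KEYS.include?(key)
    $stderr.puts "Warning(unicode-emoji): The category of #{key} does not exist anymore"
  end
  # dig returns nil as soon as a key is missing, so removed categories yield nil
  LIST.dig(*[key, sub_key].compact)
end

current = list("Smileys & Emotion", "face-glasses")
removed = list("Smileys & People", "face-positive") # warns on stderr

puts current.inspect
puts removed.inspect
```

Because the method only warns and then falls through to `dig`, callers using a removed category get `nil` rather than an exception.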
@@ -2,12 +2,12 @@

  module Unicode
  module Emoji
- VERSION = "1.1.0".freeze
- EMOJI_VERSION = "11.0".freeze
+ VERSION = "2.0.0".freeze
+ EMOJI_VERSION = "12.0".freeze
  DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + '/../../../data/').freeze
  INDEX_FILENAME = (DATA_DIRECTORY + '/emoji.marshal.gz').freeze

- ENABLE_NATIVE_EMOJI_UNICODE_PROPERTIES = false
+ ENABLE_NATIVE_EMOJI_UNICODE_PROPERTIES = false # As of Ruby 2.6.1, Emoji version 11 is included
  end
  end

@@ -158,7 +158,7 @@ describe Unicode::Emoji do
158
158
 
159
159
  it "does not match invalid tag sequences" do
160
160
  "🏴󠁧󠁒󠁑󠁑󠁑󠁿 GB AAA" =~ Unicode::Emoji::REGEX_VALID
161
- assert_equal "🏴", $&
161
+ assert_equal "🏴", $& # only base flag is matched
162
162
  end
163
163
 
164
164
  it "matches recommended zwj sequences" do
@@ -172,6 +172,88 @@ describe Unicode::Emoji do
172
172
  end
173
173
  end
174
174
 
175
+ describe "REGEX_WELL_FORMED" do
176
+ it "matches most singleton emoji codepoints" do
177
+ "😴 sleeping face" =~ Unicode::Emoji::REGEX_WELL_FORMED
178
+ assert_equal "😴", $&
179
+ end
180
+
181
+ it "matches singleton emoji in combination with emoji variation selector" do
182
+ "😴\u{FE0F} sleeping face" =~ Unicode::Emoji::REGEX_WELL_FORMED
183
+ assert_equal "😴\u{FE0F}", $&
184
+ end
185
+
186
+ it "does not match singleton emoji when in combination with text variation selector" do
187
+ "😴\u{FE0E} sleeping face" =~ Unicode::Emoji::REGEX_WELL_FORMED
188
+ assert_nil $&
189
+ end
190
+
191
+ it "does not match textual singleton emoji" do
192
+ "β–Ά play button" =~ Unicode::Emoji::REGEX_WELL_FORMED
193
+ assert_nil $&
194
+ end
195
+
196
+ it "matches textual singleton emoji in combination with emoji variation selector" do
197
+ "β–Ά\u{FE0F} play button" =~ Unicode::Emoji::REGEX_WELL_FORMED
198
+ assert_equal "β–Ά\u{FE0F}", $&
199
+ end
200
+
201
+ it "does not match singleton 'component' emoji codepoints" do
202
+ "🏻 light skin tone" =~ Unicode::Emoji::REGEX_WELL_FORMED
203
+ assert_nil $&
204
+ end
205
+
206
+ it "matches modified emoji if modifier base emoji is used" do
207
+ "πŸ›ŒπŸ½ person in bed: medium skin tone" =~ Unicode::Emoji::REGEX_WELL_FORMED
208
+ assert_equal "πŸ›ŒπŸ½", $&
209
+ end
210
+
211
+ it "does not match modified emoji if no modifier base emoji is used" do
212
+ "🌡🏽 cactus" =~ Unicode::Emoji::REGEX_WELL_FORMED
213
+ assert_equal "🌡", $&
214
+ end
215
+
216
+ it "matches valid region flags" do
217
+ "πŸ‡΅πŸ‡Ή Portugal" =~ Unicode::Emoji::REGEX_WELL_FORMED
218
+ assert_equal "πŸ‡΅πŸ‡Ή", $&
219
+ end
220
+
221
+ it "does match invalid region flags" do
222
+ "πŸ‡΅πŸ‡΅ PP Land" =~ Unicode::Emoji::REGEX_WELL_FORMED
223
+ assert_equal "πŸ‡΅πŸ‡΅", $&
224
+ end
225
+
226
+ it "matches emoji keycap sequences" do
227
+ "2️⃣ keycap: 2" =~ Unicode::Emoji::REGEX_WELL_FORMED
228
+ assert_equal "2️⃣", $&
229
+ end
230
+
231
+ it "matches recommended tag sequences" do
232
+ "🏴󠁧󠁒󠁳󠁣󠁴󠁿 Scotland" =~ Unicode::Emoji::REGEX_WELL_FORMED
233
+ assert_equal "🏴󠁧󠁒󠁳󠁣󠁴󠁿", $&
234
+ end
235
+
236
+ it "matches valid tag sequences, even though they are not recommended" do
237
+ "🏴󠁧󠁒󠁑󠁧󠁒󠁿 GB AGB" =~ Unicode::Emoji::REGEX_WELL_FORMED
238
+ assert_equal "🏴󠁧󠁒󠁑󠁧󠁒󠁿", $&
239
+ end
240
+
241
+ it "does match invalid tag sequences" do
242
+ "😴󠁧󠁒󠁑󠁑󠁑󠁿 GB AAA" =~ Unicode::Emoji::REGEX_WELL_FORMED
243
+ assert_equal "😴󠁧󠁒󠁑󠁑󠁑󠁿", $&
244
+ end
245
+
246
+ it "matches recommended zwj sequences" do
247
+ "πŸ€ΎπŸ½β€β™€οΈ woman playing handball: medium skin tone" =~ Unicode::Emoji::REGEX_WELL_FORMED
248
+ assert_equal "πŸ€ΎπŸ½β€β™€οΈ", $&
249
+ end
250
+
251
+ it "matches valid zwj sequences, even though they are not recommended" do
252
+ "πŸ€ β€πŸ€’ vomiting cowboy" =~ Unicode::Emoji::REGEX_WELL_FORMED
253
+ assert_equal "πŸ€ β€πŸ€’", $&
254
+ end
255
+ end
256
+
175
257
  describe "REGEX_BASIC" do
176
258
  it "matches most singleton emoji codepoints" do
177
259
  "😴 sleeping face" =~ Unicode::Emoji::REGEX_BASIC
@@ -300,15 +382,21 @@ describe Unicode::Emoji do

  describe ".list" do
  it "returns a grouped list of emoji" do
- assert_includes Unicode::Emoji.list.keys, "Smileys & People"
+ assert_includes Unicode::Emoji.list.keys, "Smileys & Emotion"
  end

  it "sub-groups the list of emoji" do
- assert_includes Unicode::Emoji.list("Smileys & People").keys, "face-positive"
+ assert_includes Unicode::Emoji.list("Smileys & Emotion").keys, "face-glasses"
  end

  it "has emoji in sub-groups" do
- assert_includes Unicode::Emoji.list("Smileys & People", "face-positive"), "😎"
+ assert_includes Unicode::Emoji.list("Smileys & Emotion", "face-glasses"), "😎"
+ end
+
+ it "issues a warning if attempting to retrieve old category" do
+ assert_output nil, "Warning(unicode-emoji): The category of Smileys & People does not exist anymore\n" do
+ assert_nil Unicode::Emoji.list("Smileys & People", "face-positive")
+ end
  end
  end
  end
metadata CHANGED
@@ -1,16 +1,16 @@
  --- !ruby/object:Gem::Specification
  name: unicode-emoji
  version: !ruby/object:Gem::Version
- version: 1.1.0
+ version: 2.0.0
  platform: ruby
  authors:
  - Jan Lelis
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-06-05 00:00:00.000000000 Z
+ date: 2019-02-19 00:00:00.000000000 Z
  dependencies: []
- description: "[Emoji 11.0] Retrieve emoji data about Unicode codepoints. Also contains
+ description: "[Emoji 12.0] Retrieve emoji data about Unicode codepoints. Also contains
  a regex to match emoji."
  email:
  - mail@janlelis.de
@@ -53,7 +53,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.7.6
+ rubygems_version: 2.5.1
  signing_key:
  specification_version: 4
  summary: Retrieve Emoji data about Unicode codepoints.