unicode-emoji 1.1.0 → 2.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA256:
3
- metadata.gz: c6cdca65a25735347a97f077a43c00b1b30c84fa2aa7b688b7072b1af8624d1f
4
- data.tar.gz: e083a936a9f360ca0348fe6a3f76a12aa2fbf94826cbf30fd31db1abc23d9bc7
2
+ SHA1:
3
+ metadata.gz: 32bec9a0f826ab808cf77b3bf69e8248de000d99
4
+ data.tar.gz: 6c8a53dc8874ab6bf508aad2a914eded9a7a4889
5
5
  SHA512:
6
- metadata.gz: d408fec5b09dd66db4ea61fb476cd5c74570d1cd9f43451732619ea303292afeb28e7d65d56eb012676ecbd2bda61e4e03912f3fae66bb26f51abde5ff85ba6b
7
- data.tar.gz: 815d375a3de1d1cbedb64ced2530ee2c9c02a52223ced9bff0273231f8de09a135b7043f6159701c76af60046234b353ec14a5ae136aad36f998560e96988841
6
+ metadata.gz: c5ebaf7c4c6a66331af9c0f927f8f41079aaf89d3389c4ba84533c1f64fbd2b4456657971e05c437987568c6853f844f1084986721cede6da8979696e4efffd6
7
+ data.tar.gz: 3fc8af7fc6bdcaac8ac14ec148c4322a8d26a1e627895ec54a3a038a85062c9f00d37317208cdbe876cdc31603e83943eac569ee937bccfd7e44df15d239ac19
data/.travis.yml CHANGED
@@ -2,12 +2,13 @@ sudo: false
2
2
  language: ruby
3
3
 
4
4
  rvm:
5
+ - 2.6.1
6
+ - 2.5.3
7
+ - 2.4.5
8
+ - 2.3.8
5
9
  - ruby-head
6
- - 2.5.1
7
- - 2.4.4
8
- - 2.3.7
9
10
  - jruby-head
10
- - jruby-9.1.16.0
11
+ - jruby-9.2.6.0
11
12
 
12
13
  matrix:
13
14
  allow_failures:
data/CHANGELOG.md CHANGED
@@ -1,5 +1,14 @@
1
1
  ## CHANGELOG
2
2
 
3
+ ### 2.0.0
4
+
5
+ - Emoji 12.0 data (including valid subdivisions)
6
+ - Introduce new `REGEX_WELL_FORMED` to be able to match for invalid tag and region sequences
7
+ - Introduce new `*_INCLUDE_TEXT` regexes which include matching for textual presentation emoji
8
+ - Refactoring: Update Emoji matching to latest standard while keeping naming close to standard
9
+ - Issue warning when using `#list` method to retrieve outdated category
10
+ - Change matching for ZWJ sequences: Do not limit sequence to a maximum of 3 ZWJs
11
+
3
12
  ### 1.1.0
4
13
 
5
14
  - Emoji 11.0
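The last 2.0.0 entry (no limit of 3 ZWJs) is easiest to see on a toy pattern. The sketch below does not load the gem: `element` is a hypothetical stand-in for the gem's much larger ZWJ element pattern, and the two regexes only mirror the `{1,3}` and `+` quantifiers that appear in this diff.

```ruby
# Toy illustration of the ZWJ quantifier change (not the gem's real patterns).
zwj = "\u{200D}"

# Hypothetical stand-in for a ZWJ element: any codepoint in the Emoticons block.
element = "[\u{1F600}-\u{1F64F}]"

# 1.1.0 style: at most three "element + ZWJ" pairs before the final element.
limited = Regexp.compile("(?:#{element}#{zwj}){1,3}#{element}")

# 2.0.0 style: any number of "element + ZWJ" pairs.
unlimited = Regexp.compile("(?:#{element}#{zwj})+#{element}")

# Five elements joined by four ZWJs (nine codepoints in total).
sequence = (["\u{1F634}"] * 5).join(zwj)

limited_match = sequence[limited]
unlimited_match = sequence[unlimited]

puts limited_match.length   # => 7 (four elements, three ZWJs)
puts unlimited_match.length # => 9 (the whole sequence)
```

With the limited quantifier, a longer sequence is cut off after the fourth element; the unlimited form consumes it as one match.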
data/MIT-LICENSE.txt CHANGED
@@ -1,4 +1,4 @@
1
- Copyright (c) 2017, 2018 Jan Lelis, mail@janlelis.de
1
+ Copyright (c) 2017-2019 Jan Lelis, mail@janlelis.de
2
2
 
3
3
  Permission is hereby granted, free of charge, to any person obtaining
4
4
  a copy of this software and associated documentation files (the
data/README.md CHANGED
@@ -1,12 +1,12 @@
1
- # Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](http://badge.fury.io/rb/unicode-emoji) [![[travis]](https://travis-ci.org/janlelis/unicode-emoji.svg)](https://travis-ci.org/janlelis/unicode-emoji)
1
+ # Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](https://badge.fury.io/rb/unicode-emoji) [![[travis]](https://travis-ci.org/janlelis/unicode-emoji.svg)](https://travis-ci.org/janlelis/unicode-emoji)
2
2
 
3
3
  A small Ruby library which provides Unicode Emoji data and regexes.
4
4
 
5
5
  Also includes a categorized list of recommended Emoji.
6
6
 
7
- Emoji version: **11.0**
7
+ Emoji version: **12.0** (February 2018)
8
8
 
9
- Supported Rubies: **2.5**, **2.4**, **2.3**
9
+ Supported Rubies: **2.6**, **2.5**, **2.4**, **2.3**
10
10
 
11
11
  If you are stuck on an older Ruby version, checkout the latest [0.9 version](https://rubygems.org/gems/unicode-emoji/versions/0.9.3) of this gem.
12
12
 
@@ -20,7 +20,7 @@ gem "unicode-emoji"
20
20
 
21
21
  ### Regex
22
22
 
23
- Five Emoji regexes are included, which are compiled out of various Emoji Unicode data.
23
+ The gem includes a bunch of Emoji regexes, which are compiled out of various Emoji Unicode data sources.
24
24
 
25
25
  ```ruby
26
26
  require "unicode/emoji"
@@ -40,16 +40,64 @@ string = "String which contains all kinds of emoji:
40
40
  string.scan(Unicode::Emoji::REGEX) # => ["😴", "▶️", "🛌🏽", "🇵🇹", "🏴󠁧󠁢󠁳󠁣󠁴󠁿", "2️⃣", "🤾🏽‍♀️"]
41
41
  ```
42
42
 
43
+ #### Main Regexes
44
+
45
+ Matches (non-textual) Emoji of all kinds:
46
+
43
47
  Regex | Description | Example Matches | Example Non-Matches
44
48
  ------------------------------|-------------|-----------------|--------------------
45
- `Unicode::Emoji::REGEX` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of valid Emoji sequences, but restrict ZWJ and TAG sequences to recommended sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
46
- `Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of valid Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`
47
- `Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
48
- `Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digit 1) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
49
- `Unicode::Emoji::REGEX_ANY` | Matches any Emoji-related codepoint (but no variation selectors or tags) | `😴`, `▶`, `🏻`, `🛌`, `🏽`, `🇵`, `🇹`, `2`, `🏴`, `🤾`, `♀`, `🤠`, `🤢` | -
49
+ `Unicode::Emoji::REGEX` | **Use this if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of *recommended* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
50
+ `Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of *valid* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`
51
+ `Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of *well-formed* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `🇵🇵` | `😴︎`, `▶`, `🏻`
52
+
53
+ ##### Picking the Right Emoji Regex
54
+
55
+ - Usually you just want `REGEX` (RGI set)
56
+ - If you want broader matching (e.g. more sub-regions), choose `REGEX_VALID`
57
+ - If you even want to match for invalid sequences, too, use `REGEX_WELL_FORMED`
58
+
59
+ Please see [the standard](http://www.unicode.org/reports/tr51/#Emoji_Sets) for details.
60
+
61
+ Property | `REGEX` (RGI / Recommended) | `REGEX_VALID` (Valid) | `REGEX_WELL_FORMED` (Well-formed)
62
+ ---------|-----------------------------|-----------------------|----------------------------------
63
+ Region "🇵🇹" | Yes | Yes | Yes
64
+ Region "🇵🇵" | No | No | Yes
65
+ Tag Sequence "🏴󠁧󠁢󠁳󠁣󠁴󠁿" | Yes | Yes | Yes
66
+ Tag Sequence "🏴󠁧󠁢󠁡󠁧󠁢󠁿" | No | Yes | Yes
67
+ Tag Sequence "😴󠁧󠁢󠁡󠁡󠁡󠁿" | No | No | Yes
68
+ ZWJ Sequence "🤾🏽‍♀️" | Yes | Yes | Yes
69
+ ZWJ Sequence "🤠‍🤢" | No | Yes | Yes
50
70
 
51
71
  More info about valid vs. recommended Emoji in this [blog article on Emojipedia](http://blog.emojipedia.org/unicode-behind-the-curtain/).
52
72
 
73
+ #### Singleton Regexes
74
+
75
+ Matches only simple one-codepoint (+ optional variation selector) Emoji:
76
+
77
+ Regex | Description | Example Matches | Example Non-Matches
78
+ ------------------------------|-------------|-----------------|--------------------
79
+ `Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
80
+ `Unicode::Emoji::REGEX_TEXT` | Matches only textual singleton Emoji (except for singleton components, like digit 1) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
81
+
82
+ #### Include Textual Emoji
83
+
84
+ By default, textual Emoji (emoji characters with text variation selector or those that have a default text presentation) will not be included in the default regexes. However, if you wish to match for them too, you can include them in your regex by appending the `_INCLUDE_TEXT` suffix:
85
+
86
+ Regex | Description | Example Matches | Example Non-Matches
87
+ ------------------------------|-------------|-----------------|--------------------
88
+ `Unicode::Emoji::REGEX_INCLUDE_TEXT` | `REGEX` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `😴︎`, `▶` | `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
89
+ `Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `😴︎`, `▶` | `🏻`, `🇵🇵`
90
+ `Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `🇵🇵`, `😴︎`, `▶` | `🏻`
91
+
92
+ #### Partial Regexes
93
+
94
+ Matches potential Emoji parts (often, this is not what you want):
95
+
96
+ Regex | Description | Example Matches | Example Non-Matches
97
+ ------------------------------|-------------|-----------------|--------------------
98
+ `Unicode::Emoji::REGEX_ANY` | Matches any Emoji-related codepoint (but no variation selectors, tags, or zero-width joiners). Please note that this will match Emoji-parts rather than complete Emoji, for example, single digits! | `😴`, `▶`, `🏻`, `🛌`, `🏽`, `🇵`, `🇹`, `2`, `🏴`, `🤾`, `♀`, `🤠`, `🤢` | -
99
+
100
+
53
101
  ### List
54
102
 
55
103
  Use `Unicode::Emoji::LIST` or the list method to get a grouped (and ordered) list of Emoji:
@@ -65,6 +113,8 @@ Unicode::Emoji.list("Food & Drink", "food-asian")
65
113
  => ["🍱", "🍘", "🍙", "🍚", "🍛", "🍜", "🍝", "🍠", "🍢", "🍣", "🍤", "🍥", "🍡", "\u{1F95F}", "\u{1F960}", "\u{1F961}"]
66
114
  ```
67
115
 
116
+ Please note that categories might change with future versions of the Emoji standard. This gem will issue warnings when attempting to retrieve old categories using the `#list` method.
117
+
68
118
  A markdown file with all recommended Emoji can be found [in this gist](https://gist.github.com/janlelis/72f9be1f0ecca07372c64cf13894b801).
69
119
 
70
120
  ### Properties
@@ -87,5 +137,5 @@ Unicode::Emoji.properties "☝" # => ["Emoji", "Emoji_Modifier_Base"]
87
137
 
88
138
  ## MIT
89
139
 
90
- - Copyright (C) 2017, 2018 Jan Lelis <http://janlelis.com>. Released under the MIT license.
140
+ - Copyright (C) 2017-2019 Jan Lelis <http://janlelis.com>. Released under the MIT license.
91
141
  - Unicode data: http://www.unicode.org/copyright.html#Exhibit1
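The `*_INCLUDE_TEXT` regexes described in the README are unions of an emoji-style regex and `REGEX_TEXT`. The following sketch shows that idea with `Regexp.union` on two deliberately tiny, hypothetical patterns that only know about the play button (U+25B6). It does not load the gem, and the real patterns cover far more codepoints.

```ruby
# Hypothetical mini-patterns, covering only U+25B6 (play button).
# Emoji style: codepoint followed by the emoji variation selector U+FE0F.
emoji_style = /\u{25B6}\u{FE0F}/
# Text style: codepoint not followed by U+FE0F, optional text selector U+FE0E.
text_style = /\u{25B6}(?!\u{FE0F})\u{FE0E}?/

# Same combination step as the *_INCLUDE_TEXT constants: a union of both regexes.
include_text = Regexp.union(emoji_style, text_style)

sample = "\u{25B6}\u{FE0F} emoji style, \u{25B6} text style"

emoji_only = sample.scan(emoji_style)  # only the emoji-style occurrence
with_text  = sample.scan(include_text) # both occurrences, in order

puts emoji_only.size # => 1
puts with_text.size  # => 2
```

Because the union tries the emoji-style alternative first, an occurrence with U+FE0F is reported once, as emoji style, and never split.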
Binary file
data/lib/unicode/emoji.rb CHANGED
@@ -18,8 +18,10 @@ module Unicode
18
18
  TEXT_VARIATION_SELECTOR = 0xFE0E
19
19
  EMOJI_TAG_BASE_FLAG = 0x1F3F4
20
20
  CANCEL_TAG = 0xE007F
21
+ TAGS = [*0xE0020..0xE007E]
21
22
  EMOJI_KEYCAP_SUFFIX = 0x20E3
22
23
  ZWJ = 0x200D
24
+ REGIONAL_INDICATORS = [*0x1F1E6..0x1F1FF]
23
25
 
24
26
  EMOJI_CHAR = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:E) }.keys.freeze
25
27
  EMOJI_PRESENTATION = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:P) }.keys.freeze
@@ -36,6 +38,10 @@ module Unicode
36
38
  RECOMMENDED_ZWJ_SEQUENCES = INDEX[:ZWJ].freeze
37
39
 
38
40
  LIST = INDEX[:LIST].freeze.each_value(&:freeze)
41
+ LIST_REMOVED_KEYS = [
42
+ "Smileys & People",
43
+ "Component",
44
+ ]
39
45
 
40
46
  pack = ->(ord){ Regexp.escape(Array(ord).pack("U*")) }
41
47
  join = -> (*strings){ "(?:" + strings.join("|") + ")" }
@@ -61,6 +67,9 @@ module Unicode
61
67
  emoji_presentation + "(?!" + pack[TEXT_VARIATION_SELECTOR] + ")" + pack[EMOJI_VARIATION_SELECTOR] + "?",
62
68
  ]
63
69
 
70
+ non_component_emoji_presentation_sequence = \
71
+ "(?!" + emoji_component + ")" + emoji_presentation_sequence
72
+
64
73
  text_presentation_sequence = \
65
74
  join[
66
75
  pack_and_join[TEXT_PRESENTATION]+ "(?!" + join[emoji_modifier, pack[EMOJI_VARIATION_SELECTOR]] + ")" + pack[TEXT_VARIATION_SELECTOR] + "?",
@@ -73,9 +82,36 @@ module Unicode
73
82
  emoji_keycap_sequence = \
74
83
  pack_and_join[EMOJI_KEYCAPS] + pack[[EMOJI_VARIATION_SELECTOR, EMOJI_KEYCAP_SUFFIX]]
75
84
 
76
- emoji_valid_region_sequence = \
85
+ emoji_valid_flag_sequence = \
77
86
  pack_and_join[VALID_REGION_FLAGS]
78
87
 
88
+ emoji_well_formed_flag_sequence = \
89
+ "(?:" +
90
+ pack_and_join[REGIONAL_INDICATORS] +
91
+ pack_and_join[REGIONAL_INDICATORS] +
92
+ ")"
93
+
94
+ emoji_valid_core_sequence = \
95
+ join[
96
+ # emoji_character,
97
+ emoji_keycap_sequence,
98
+ emoji_modifier_sequence,
99
+ non_component_emoji_presentation_sequence,
100
+ emoji_valid_flag_sequence,
101
+ ]
102
+
103
+ emoji_well_formed_core_sequence = \
104
+ join[
105
+ # emoji_character,
106
+ emoji_keycap_sequence,
107
+ emoji_modifier_sequence,
108
+ non_component_emoji_presentation_sequence,
109
+ emoji_well_formed_flag_sequence,
110
+ ]
111
+
112
+ emoji_rgi_tag_sequence = \
113
+ pack_and_join[RECOMMENDED_SUBDIVISION_FLAGS]
114
+
79
115
  emoji_valid_tag_sequence = \
80
116
  "(?:" +
81
117
  pack[EMOJI_TAG_BASE_FLAG] +
@@ -83,35 +119,60 @@ module Unicode
83
119
  pack[CANCEL_TAG] +
84
120
  ")"
85
121
 
86
- emoji_zwj_element = \
122
+ emoji_well_formed_tag_sequence = \
123
+ "(?:" +
124
+ join[
125
+ non_component_emoji_presentation_sequence,
126
+ emoji_modifier_sequence,
127
+ ] +
128
+ pack_and_join[TAGS] + "+" +
129
+ pack[CANCEL_TAG] +
130
+ ")"
131
+
132
+ emoji_rgi_zwj_sequence = \
133
+ pack_and_join[RECOMMENDED_ZWJ_SEQUENCES]
134
+
135
+ emoji_valid_zwj_element = \
87
136
  join[
88
137
  emoji_modifier_sequence,
89
138
  emoji_presentation_sequence,
90
139
  emoji_character,
91
140
  ]
92
141
 
93
- # Matches basic singleton emoji and all kind of sequences, but restrict zwj and tag sequences to known sequences
94
- REGEX = Regexp.compile(
95
- pack_and_join[RECOMMENDED_ZWJ_SEQUENCES] +
96
- ?| + pack_and_join[RECOMMENDED_SUBDIVISION_FLAGS] +
97
- ?| + emoji_modifier_sequence +
98
- ?| + "(?!" + emoji_component + ")" + emoji_presentation_sequence +
99
- ?| + emoji_keycap_sequence +
100
- ?| + emoji_valid_region_sequence +
101
- ""
102
- )
142
+ emoji_valid_zwj_sequence = \
143
+ "(?:" +
144
+ "(?:" + emoji_valid_zwj_element + pack[ZWJ] + ")+" + emoji_valid_zwj_element +
145
+ ")"
146
+
147
+ emoji_rgi_sequence = \
148
+ join[
149
+ emoji_rgi_zwj_sequence,
150
+ emoji_rgi_tag_sequence,
151
+ emoji_valid_core_sequence,
152
+ ]
153
+
154
+ emoji_valid_sequence = \
155
+ join[
156
+ emoji_valid_zwj_sequence,
157
+ emoji_valid_tag_sequence,
158
+ emoji_valid_core_sequence,
159
+ ]
160
+
161
+ emoji_well_formed_sequence = \
162
+ join[
163
+ emoji_valid_zwj_sequence,
164
+ emoji_well_formed_tag_sequence,
165
+ emoji_well_formed_core_sequence,
166
+ ]
167
+
168
+ # Matches basic singleton emoji and all kind of sequences, but restrict zwj and tag sequences to known sequences (rgi)
169
+ REGEX = Regexp.compile(emoji_rgi_sequence)
103
170
 
104
171
  # Matches basic singleton emoji and all kind of valid sequences
105
- REGEX_VALID = Regexp.compile(
106
- # EMOJI_TAGS.map{ |base, spec| "(?:" + pack[base] + "[" + pack[spec] + "]+" + pack[CANCEL_TAG] + ")" }.join("|") +
107
- emoji_valid_tag_sequence +
108
- ?| + "(?:" + "(?:" + emoji_zwj_element + pack[ZWJ] + "){1,3}" + emoji_zwj_element + ")" +
109
- ?| + emoji_modifier_sequence +
110
- ?| + "(?!" + emoji_component + ")" + emoji_presentation_sequence +
111
- ?| + emoji_keycap_sequence +
112
- ?| + emoji_valid_region_sequence +
113
- ""
114
- )
172
+ REGEX_VALID = Regexp.compile(emoji_valid_sequence)
173
+
174
+ # Matches basic singleton emoji and all kind of sequences
175
+ REGEX_WELL_FORMED = Regexp.compile(emoji_well_formed_sequence)
115
176
 
116
177
  # Matches only basic single, non-textual emoji
117
178
  # Ignores "components" like modifiers or simple digits
@@ -125,11 +186,16 @@ module Unicode
125
186
  "(?!" + emoji_component + ")" + text_presentation_sequence
126
187
  )
127
188
 
128
- # Matches any emoji-related codepoint
189
+ # Matches any emoji-related codepoint - Use with caution (returns partial matches)
129
190
  REGEX_ANY = Regexp.compile(
130
191
  emoji_character
131
192
  )
132
193
 
194
+ # Combined REGEXes which also match for TEXTUAL emoji
195
+ REGEX_INCLUDE_TEXT = Regexp.union(REGEX, REGEX_TEXT)
196
+ REGEX_VALID_INCLUDE_TEXT = Regexp.union(REGEX_VALID, REGEX_TEXT)
197
+ REGEX_WELL_FORMED_INCLUDE_TEXT = Regexp.union(REGEX_WELL_FORMED, REGEX_TEXT)
198
+
133
199
  def self.properties(char)
134
200
  ord = get_codepoint_value(char)
135
201
  props = INDEX[:PROPERTIES][ord]
@@ -143,6 +209,9 @@ module Unicode
143
209
 
144
210
  def self.list(key = nil, sub_key = nil)
145
211
  return LIST unless key || sub_key
212
+ if LIST_REMOVED_KEYS.include?(key)
213
+ $stderr.puts "Warning(unicode-emoji): The category of #{key} does not exist anymore"
214
+ end
146
215
  LIST.dig(*[key, sub_key].compact)
147
216
  end
148
217
 
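The new `emoji_well_formed_flag_sequence` accepts any pair of regional indicators, whether or not the pair is a valid region. Below is a minimal standalone sketch of that construction. `pack` and `join` are copied from the diff, while the body of `pack_and_join` is an assumption, since its definition is not part of the hunks shown here.

```ruby
# All 26 regional indicator codepoints (same range as REGIONAL_INDICATORS in the diff).
regional_indicators = [*0x1F1E6..0x1F1FF]

pack = ->(ord) { Regexp.escape(Array(ord).pack("U*")) }
join = ->(*strings) { "(?:" + strings.join("|") + ")" }
# Assumed definition: pack every codepoint, then join them as alternatives.
pack_and_join = ->(ords) { join[*ords.map { |ord| pack[ord] }] }

# Well-formed flag: any two regional indicators in a row.
well_formed_flag = Regexp.compile(
  "(?:" +
  pack_and_join[regional_indicators] +
  pack_and_join[regional_indicators] +
  ")"
)

portugal = "\u{1F1F5}\u{1F1F9} Portugal" # "PT", a valid region
pp_land  = "\u{1F1F5}\u{1F1F5} PP Land"  # "PP", well-formed but not a valid region

portugal_match = portugal[well_formed_flag]
pp_match       = pp_land[well_formed_flag]

puts portugal_match.length # => 2
puts pp_match.length       # => 2
```

Both strings match here, which is the behavior the `REGEX_WELL_FORMED` specs expect for "PP Land"; the valid variant instead enumerates `VALID_REGION_FLAGS`.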
@@ -2,12 +2,12 @@
2
2
 
3
3
  module Unicode
4
4
  module Emoji
5
- VERSION = "1.1.0".freeze
6
- EMOJI_VERSION = "11.0".freeze
5
+ VERSION = "2.0.0".freeze
6
+ EMOJI_VERSION = "12.0".freeze
7
7
  DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + '/../../../data/').freeze
8
8
  INDEX_FILENAME = (DATA_DIRECTORY + '/emoji.marshal.gz').freeze
9
9
 
10
- ENABLE_NATIVE_EMOJI_UNICODE_PROPERTIES = false
10
+ ENABLE_NATIVE_EMOJI_UNICODE_PROPERTIES = false # As of Ruby 2.6.1, Emoji version 11 is included
11
11
  end
12
12
  end
13
13
 
@@ -158,7 +158,7 @@ describe Unicode::Emoji do
158
158
 
159
159
  it "does not match invalid tag sequences" do
160
160
  "🏴󠁧󠁢󠁡󠁡󠁡󠁿 GB AAA" =~ Unicode::Emoji::REGEX_VALID
161
- assert_equal "🏴", $&
161
+ assert_equal "🏴", $& # only base flag is matched
162
162
  end
163
163
 
164
164
  it "matches recommended zwj sequences" do
@@ -172,6 +172,88 @@ describe Unicode::Emoji do
172
172
  end
173
173
  end
174
174
 
175
+ describe "REGEX_WELL_FORMED" do
176
+ it "matches most singleton emoji codepoints" do
177
+ "😴 sleeping face" =~ Unicode::Emoji::REGEX_WELL_FORMED
178
+ assert_equal "😴", $&
179
+ end
180
+
181
+ it "matches singleton emoji in combination with emoji variation selector" do
182
+ "😴\u{FE0F} sleeping face" =~ Unicode::Emoji::REGEX_WELL_FORMED
183
+ assert_equal "😴\u{FE0F}", $&
184
+ end
185
+
186
+ it "does not match singleton emoji when in combination with text variation selector" do
187
+ "😴\u{FE0E} sleeping face" =~ Unicode::Emoji::REGEX_WELL_FORMED
188
+ assert_nil $&
189
+ end
190
+
191
+ it "does not match textual singleton emoji" do
192
+ "▶ play button" =~ Unicode::Emoji::REGEX_WELL_FORMED
193
+ assert_nil $&
194
+ end
195
+
196
+ it "matches textual singleton emoji in combination with emoji variation selector" do
197
+ "▶\u{FE0F} play button" =~ Unicode::Emoji::REGEX_WELL_FORMED
198
+ assert_equal "▶\u{FE0F}", $&
199
+ end
200
+
201
+ it "does not match singleton 'component' emoji codepoints" do
202
+ "🏻 light skin tone" =~ Unicode::Emoji::REGEX_WELL_FORMED
203
+ assert_nil $&
204
+ end
205
+
206
+ it "matches modified emoji if modifier base emoji is used" do
207
+ "🛌🏽 person in bed: medium skin tone" =~ Unicode::Emoji::REGEX_WELL_FORMED
208
+ assert_equal "🛌🏽", $&
209
+ end
210
+
211
+ it "does not match modified emoji if no modifier base emoji is used" do
212
+ "🌵🏽 cactus" =~ Unicode::Emoji::REGEX_WELL_FORMED
213
+ assert_equal "🌵", $&
214
+ end
215
+
216
+ it "matches valid region flags" do
217
+ "🇵🇹 Portugal" =~ Unicode::Emoji::REGEX_WELL_FORMED
218
+ assert_equal "🇵🇹", $&
219
+ end
220
+
221
+ it "does match invalid region flags" do
222
+ "🇵🇵 PP Land" =~ Unicode::Emoji::REGEX_WELL_FORMED
223
+ assert_equal "🇵🇵", $&
224
+ end
225
+
226
+ it "matches emoji keycap sequences" do
227
+ "2️⃣ keycap: 2" =~ Unicode::Emoji::REGEX_WELL_FORMED
228
+ assert_equal "2️⃣", $&
229
+ end
230
+
231
+ it "matches recommended tag sequences" do
232
+ "🏴󠁧󠁢󠁳󠁣󠁴󠁿 Scotland" =~ Unicode::Emoji::REGEX_WELL_FORMED
233
+ assert_equal "🏴󠁧󠁢󠁳󠁣󠁴󠁿", $&
234
+ end
235
+
236
+ it "matches valid tag sequences, even though they are not recommended" do
237
+ "🏴󠁧󠁢󠁡󠁧󠁢󠁿 GB AGB" =~ Unicode::Emoji::REGEX_WELL_FORMED
238
+ assert_equal "🏴󠁧󠁢󠁡󠁧󠁢󠁿", $&
239
+ end
240
+
241
+ it "does match invalid tag sequences" do
242
+ "😴󠁧󠁢󠁡󠁡󠁡󠁿 GB AAA" =~ Unicode::Emoji::REGEX_WELL_FORMED
243
+ assert_equal "😴󠁧󠁢󠁡󠁡󠁡󠁿", $&
244
+ end
245
+
246
+ it "matches recommended zwj sequences" do
247
+ "🤾🏽‍♀️ woman playing handball: medium skin tone" =~ Unicode::Emoji::REGEX_WELL_FORMED
248
+ assert_equal "🤾🏽‍♀️", $&
249
+ end
250
+
251
+ it "matches valid zwj sequences, even though they are not recommended" do
252
+ "🤠‍🤢 vomiting cowboy" =~ Unicode::Emoji::REGEX_WELL_FORMED
253
+ assert_equal "🤠‍🤢", $&
254
+ end
255
+ end
256
+
175
257
  describe "REGEX_BASIC" do
176
258
  it "matches most singleton emoji codepoints" do
177
259
  "😴 sleeping face" =~ Unicode::Emoji::REGEX_BASIC
@@ -300,15 +382,21 @@ describe Unicode::Emoji do
300
382
 
301
383
  describe ".list" do
302
384
  it "returns a grouped list of emoji" do
303
- assert_includes Unicode::Emoji.list.keys, "Smileys & People"
385
+ assert_includes Unicode::Emoji.list.keys, "Smileys & Emotion"
304
386
  end
305
387
 
306
388
  it "sub-groups the list of emoji" do
307
- assert_includes Unicode::Emoji.list("Smileys & People").keys, "face-positive"
389
+ assert_includes Unicode::Emoji.list("Smileys & Emotion").keys, "face-glasses"
308
390
  end
309
391
 
310
392
  it "has emoji in sub-groups" do
311
- assert_includes Unicode::Emoji.list("Smileys & People", "face-positive"), "😎"
393
+ assert_includes Unicode::Emoji.list("Smileys & Emotion", "face-glasses"), "😎"
394
+ end
395
+
396
+ it "issues a warning if attempting to retrieve old category" do
397
+ assert_output nil, "Warning(unicode-emoji): The category of Smileys & People does not exist anymore\n" do
398
+ assert_nil Unicode::Emoji.list("Smileys & People", "face-positive")
399
+ end
312
400
  end
313
401
  end
314
402
  end
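The new `.list` spec relies on the warning being written to standard error while the lookup itself returns `nil`. The standalone sketch below reproduces that behavior with hypothetical miniature data (`list_data`, `removed_keys`) and a lambda in place of the gem's module method, so it runs without the gem installed.

```ruby
require "stringio"

# Hypothetical miniature data; the gem loads its real LIST from a data file.
list_data = { "Smileys & Emotion" => { "face-glasses" => ["\u{1F60E}"] } }
removed_keys = ["Smileys & People", "Component"]

# Same shape as Unicode::Emoji.list in the diff: warn on removed categories, then dig.
list = lambda do |key = nil, sub_key = nil|
  return list_data unless key || sub_key
  if removed_keys.include?(key)
    $stderr.puts "Warning(unicode-emoji): The category of #{key} does not exist anymore"
  end
  list_data.dig(*[key, sub_key].compact)
end

# Capture $stderr around a lookup of a removed category.
original_stderr = $stderr
$stderr = StringIO.new
begin
  removed_result = list.call("Smileys & People", "face-positive")
  warning = $stderr.string
ensure
  $stderr = original_stderr
end

current_result = list.call("Smileys & Emotion", "face-glasses")

puts removed_result.inspect # => nil
puts warning
```

Callers that still pass an old category name get `nil` plus a visible hint rather than an exception, which matches what the spec asserts with `assert_output` and `assert_nil`.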
metadata CHANGED
@@ -1,16 +1,16 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: unicode-emoji
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.1.0
4
+ version: 2.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jan Lelis
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-06-05 00:00:00.000000000 Z
11
+ date: 2019-02-19 00:00:00.000000000 Z
12
12
  dependencies: []
13
- description: "[Emoji 11.0] Retrieve emoji data about Unicode codepoints. Also contains
13
+ description: "[Emoji 12.0] Retrieve emoji data about Unicode codepoints. Also contains
14
14
  a regex to match emoji."
15
15
  email:
16
16
  - mail@janlelis.de
@@ -53,7 +53,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
53
53
  version: '0'
54
54
  requirements: []
55
55
  rubyforge_project:
56
- rubygems_version: 2.7.6
56
+ rubygems_version: 2.5.1
57
57
  signing_key:
58
58
  specification_version: 4
59
59
  summary: Retrieve Emoji data about Unicode codepoints.