RubyGems - unicode-emoji - Versions diffs - 1.1.0 → 2.0.0 - Mend

unicode-emoji 1.1.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

checksums.yaml +5 -5
data/.travis.yml +5 -4
data/CHANGELOG.md +9 -0
data/MIT-LICENSE.txt +1 -1
data/README.md +60 -10
data/data/emoji.marshal.gz +0 -0
data/lib/unicode/emoji.rb +92 -23
data/lib/unicode/emoji/constants.rb +3 -3
data/spec/unicode_emoji_spec.rb +92 -4
metadata +4 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
-SHA256:
-  metadata.gz: c6cdca65a25735347a97f077a43c00b1b30c84fa2aa7b688b7072b1af8624d1f
-  data.tar.gz: e083a936a9f360ca0348fe6a3f76a12aa2fbf94826cbf30fd31db1abc23d9bc7
+SHA1:
+  metadata.gz: 32bec9a0f826ab808cf77b3bf69e8248de000d99
+  data.tar.gz: 6c8a53dc8874ab6bf508aad2a914eded9a7a4889
 SHA512:
-  metadata.gz: d408fec5b09dd66db4ea61fb476cd5c74570d1cd9f43451732619ea303292afeb28e7d65d56eb012676ecbd2bda61e4e03912f3fae66bb26f51abde5ff85ba6b
-  data.tar.gz: 815d375a3de1d1cbedb64ced2530ee2c9c02a52223ced9bff0273231f8de09a135b7043f6159701c76af60046234b353ec14a5ae136aad36f998560e96988841
+  metadata.gz: c5ebaf7c4c6a66331af9c0f927f8f41079aaf89d3389c4ba84533c1f64fbd2b4456657971e05c437987568c6853f844f1084986721cede6da8979696e4efffd6
+  data.tar.gz: 3fc8af7fc6bdcaac8ac14ec148c4322a8d26a1e627895ec54a3a038a85062c9f00d37317208cdbe876cdc31603e83943eac569ee937bccfd7e44df15d239ac19
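In this release, checksums.yaml switches its first digest family from SHA256 to SHA1, and the SHA512 entries remain. Below is a minimal sketch of checking one of these entries with Ruby's standard digest library. The `digest_matches?` helper and the extraction step are illustrative assumptions, not part of RubyGems:

```ruby
require "digest/sha1"
require "digest/sha2"

# Hypothetical lookup table: algorithm names as they appear in checksums.yaml
DIGESTS = {
  "SHA1"   => Digest::SHA1,
  "SHA256" => Digest::SHA256,
  "SHA512" => Digest::SHA512,
}.freeze

# Hypothetical helper: compare raw bytes against a hex digest from checksums.yaml
def digest_matches?(bytes, algorithm, expected_hex)
  DIGESTS.fetch(algorithm).hexdigest(bytes) == expected_hex
end

# Usage (assumes metadata.gz was extracted from the .gem file, which is a
# plain tar archive, e.g. `tar -xf unicode-emoji-2.0.0.gem`):
#   bytes = File.binread("metadata.gz")
#   digest_matches?(bytes, "SHA1", "32bec9a0f826ab808cf77b3bf69e8248de000d99")
```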

data/.travis.yml CHANGED

@@ -2,12 +2,13 @@ sudo: false
 language: ruby
 rvm:
+- 2.6.1
+- 2.5.3
+- 2.4.5
+- 2.3.8
 - ruby-head
-- 2.5.1
-- 2.4.4
-- 2.3.7
 - jruby-head
-- jruby-9.1.16.0
+- jruby-9.2.6.0
 matrix:
   allow_failures:

data/CHANGELOG.md CHANGED

@@ -1,5 +1,14 @@
 ## CHANGELOG
+### 2.0.0
+- Emoji 12.0 data (including valid subdivisions)
+- Introduce new `REGEX_WELL_FORMED` to be able to match for invalid tag and region sequences
+- Introduce new `*_INCLUDE_TEXT` regexes which include matching for textual presentation emoji
+- Refactoring: Update Emoji matching to latest standard while keeping naming close to standard
+- Issue warning when using `#list` method to retrieve outdated category
+- Change matching for ZWJ sequences: Do not limit sequence to a maximum of 3 ZWJs
 ### 1.1.0
 - Emoji 11.0
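The last 2.0.0 changelog entry corresponds to replacing `{1,3}` with `+` in the ZWJ part of `emoji.rb`. Here is a simplified sketch of what that changes. `ELEMENT` is a hypothetical stand-in for the gem's `emoji_valid_zwj_element`, and it only covers U+1F300..U+1FAFF:

```ruby
# Hypothetical, simplified ZWJ element (the real one also covers modifier
# and presentation sequences)
ELEMENT = /[\u{1F300}-\u{1FAFF}]/
ZWJ = "\u200D"

# 1.1.0 style: at most three joiners
OLD_ZWJ_SEQUENCE = /(?:#{ELEMENT}#{ZWJ}){1,3}#{ELEMENT}/
# 2.0.0 style: any number of joiners
NEW_ZWJ_SEQUENCE = /(?:#{ELEMENT}#{ZWJ})+#{ELEMENT}/

# Five emoji joined by four ZWJs
sequence = (["\u{1F600}"] * 5).join(ZWJ)

p OLD_ZWJ_SEQUENCE.match(sequence)[0].length # => 7 (4 emoji + 3 ZWJs, truncated)
p NEW_ZWJ_SEQUENCE.match(sequence)[0].length # => 9 (the whole sequence)
```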

data/MIT-LICENSE.txt CHANGED

@@ -1,4 +1,4 @@
-Copyright (c) 2017, 2018 Jan Lelis, mail@janlelis.de
+Copyright (c) 2017-2019 Jan Lelis, mail@janlelis.de
 Permission is hereby granted, free of charge, to any person obtaining
 a copy of this software and associated documentation files (the

data/README.md CHANGED

@@ -1,12 +1,12 @@
-# Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](http://badge.fury.io/rb/unicode-emoji)  [![[travis]](https://travis-ci.org/janlelis/unicode-emoji.svg)](https://travis-ci.org/janlelis/unicode-emoji)
+# Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](https://badge.fury.io/rb/unicode-emoji)  [![[travis]](https://travis-ci.org/janlelis/unicode-emoji.svg)](https://travis-ci.org/janlelis/unicode-emoji)
 A small Ruby library which provides Unicode Emoji data and regexes.
 Also includes a categorized list of recommended Emoji.
-Emoji version: **11.0**
+Emoji version: **12.0** (February 2018)
-Supported Rubies: **2.5**, **2.4**, **2.3**
+Supported Rubies: **2.6**, **2.5**, **2.4**, **2.3**
 If you are stuck on an older Ruby version, checkout the latest [0.9 version](https://rubygems.org/gems/unicode-emoji/versions/0.9.3) of this gem.
@@ -20,7 +20,7 @@ gem "unicode-emoji"
 ### Regex
-Five Emoji regexes are included, which are compiled out of various Emoji Unicode data.
+The gem includes a bunch of Emoji regexes, which are compiled out of various Emoji Unicode data sources.
 ```ruby
 require "unicode/emoji"
@@ -40,16 +40,64 @@ string = "String which contains all kinds of emoji:
 string.scan(Unicode::Emoji::REGEX) # => ["😴", "▶️", "🛌🏽", "🇵🇹", "🏴󠁧󠁢󠁳󠁣󠁴󠁿", "2️⃣", "🤾🏽‍♀️"]
 ```
+#### Main Regexes
+Matches (non-textual) Emoji of all kinds:
 Regex                         | Description | Example Matches | Example Non-Matches
 ------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX`       | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of valid Emoji sequences, but restrict ZWJ and TAG sequences to recommended sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
-`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of valid Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`
-`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
-`Unicode::Emoji::REGEX_TEXT`  | Matches only textual singleton Emoji (except for singleton components, like digit 1) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
-`Unicode::Emoji::REGEX_ANY`   | Matches any Emoji-related codepoint (but no variation selectors or tags) | `😴`, `▶`, `🏻`, `🛌`, `🏽`, `🇵`, `🇹`, `2`, `🏴`, `🤾`, `♀`, `🤠`, `🤢` | -
+`Unicode::Emoji::REGEX`       | **Use this if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of *recommended* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
+`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of *valid* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`
+`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of *well-formed* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`,  `🇵🇵` | `😴︎`, `▶`, `🏻`
+##### Picking the Right Emoji Regex
+- Usually you just want `REGEX` (RGI set)
+- If you want broader matching (e.g. more sub-regions), choose `REGEX_VALID`
+- If you even want to match for invalid sequences, too, use `REGEX_WELL_FORMED`
+Please see [the standard](http://www.unicode.org/reports/tr51/#Emoji_Sets) for details.
+Property | `REGEX` (RGI / Recommended) | `REGEX_VALID` (Valid) | `REGEX_WELL_FORMED` (Well-formed)
+---------|-----------------------------|-----------------------|----------------------------------
+Region "🇵🇹"                    | Yes | Yes | Yes
+Region "🇵🇵"                   | No  | No  | Yes
+Tag Sequence "🏴󠁧󠁢󠁳󠁣󠁴󠁿"              | Yes | Yes | Yes
+Tag Sequence "🏴󠁧󠁢󠁡󠁧󠁢󠁿"              | No  | Yes | Yes
+Tag Sequence "😴󠁧󠁢󠁡󠁡󠁡󠁿"              | No  | No  | Yes
+ZWJ Sequence "🤾🏽‍♀️"           | Yes | Yes | Yes
+ZWJ Sequence "🤠‍🤢"            | No  | Yes | Yes
 More info about valid vs. recommended Emoji in this [blog article on Emojipedia](http://blog.emojipedia.org/unicode-behind-the-curtain/).
+#### Singleton Regexes
+Matches only simple one-codepoint (+ optional variation selector) Emoji:
+Regex                         | Description | Example Matches | Example Non-Matches
+------------------------------|-------------|-----------------|--------------------
+`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
+`Unicode::Emoji::REGEX_TEXT`  | Matches only textual singleton Emoji (except for singleton components, like digit 1) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`
+#### Include Textual Emoji
+By default, textual Emoji (emoji characters with text variation selector or those that have a default text presentation) will not be included in the default regexes. However, if you wish to match for them too, you can include them in your regex by appending the `_INCLUDE_TEXT` suffix:
+Regex                         | Description | Example Matches | Example Non-Matches
+------------------------------|-------------|-----------------|--------------------
+`Unicode::Emoji::REGEX_INCLUDE_TEXT`       | `REGEX` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `😴︎`, `▶` | `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`
+`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`, `😴︎`, `▶` | `🏻`, `🇵🇵`
+`Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤠‍🤢`,  `🇵🇵`, `😴︎`, `▶` | `🏻`
+#### Partial Regexes
+Matches potential Emoji parts (often, this is not what you want):
+Regex                         | Description | Example Matches | Example Non-Matches
+------------------------------|-------------|-----------------|--------------------
+`Unicode::Emoji::REGEX_ANY`   | Matches any Emoji-related codepoint (but no variation selectors, tags, or zero-width joiners). Please note that this will match Emoji-parts rather than complete Emoji, for example, single digits! | `😴`, `▶`, `🏻`, `🛌`, `🏽`, `🇵`, `🇹`, `2`, `🏴`, `🤾`, `♀`, `🤠`, `🤢` | -
 ### List
 Use `Unicode::Emoji::LIST` or the list method to get a grouped (and ordered) list of Emoji:
@@ -65,6 +113,8 @@ Unicode::Emoji.list("Food & Drink", "food-asian")
 => ["🍱", "🍘", "🍙", "🍚", "🍛", "🍜", "🍝", "🍠", "🍢", "🍣", "🍤", "🍥", "🍡", "\u{1F95F}", "\u{1F960}", "\u{1F961}"]
 ```
+Please note that categories might change with future versions of the Emoji standard. This gem will issue warnings when attempting to retrieve old categories using the `#list` method.
 A markdown file with all recommended Emoji can be found [in this gist](https://gist.github.com/janlelis/72f9be1f0ecca07372c64cf13894b801).
 ### Properties
@@ -87,5 +137,5 @@ Unicode::Emoji.properties "☝" # => ["Emoji", "Emoji_Modifier_Base"]
 ## MIT
-- Copyright (C) 2017, 2018 Jan Lelis <http://janlelis.com>. Released under the MIT license.
+- Copyright (C) 2017-2019 Jan Lelis <http://janlelis.com>. Released under the MIT license.
 - Unicode data: http://www.unicode.org/copyright.html#Exhibit1
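The region rows of the new README table come down to two checks. Any pair of regional indicators is well-formed, but only listed pairs are valid. For regions, `REGEX` uses the same valid flag list as `REGEX_VALID`, via `emoji_valid_core_sequence`. The following self-contained sketch illustrates this. `VALID_REGIONS` is a hypothetical three-entry stand-in for the gem's `VALID_REGION_FLAGS` data, and `flag` and `classify` are illustrative helpers:

```ruby
# Turn an ASCII region code like "PT" into its two regional indicator symbols
def flag(code)
  code.each_char.map { |char| 0x1F1E6 + (char.ord - "A".ord) }.pack("U*")
end

# Hypothetical stand-in for VALID_REGION_FLAGS (the real list comes from Unicode data)
VALID_REGIONS = %w[PT DE GB].map { |code| flag(code) }.freeze

# Valid: exactly one of the listed pairs
VALID_FLAG = Regexp.new("\\A(?:" + VALID_REGIONS.map { |f| Regexp.escape(f) }.join("|") + ")\\z")
# Well-formed: any two regional indicators (U+1F1E6..U+1F1FF)
WELL_FORMED_FLAG = /\A[\u{1F1E6}-\u{1F1FF}]{2}\z/

def classify(string)
  return :valid if VALID_FLAG.match?(string)
  return :well_formed if WELL_FORMED_FLAG.match?(string)
  :no_match
end

p classify(flag("PT")) # => :valid        (matched by REGEX, REGEX_VALID, REGEX_WELL_FORMED)
p classify(flag("PP")) # => :well_formed  (matched only by REGEX_WELL_FORMED)
```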

data/data/emoji.marshal.gz CHANGED

Binary file
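The diff does not show the contents of emoji.marshal.gz because it is binary. Judging by its name and by `INDEX_FILENAME`, it holds a gzip-compressed Marshal dump from which the gem reads `INDEX[:PROPERTIES]`, `INDEX[:LIST]`, and so on. Here is a round-trip sketch under that assumption. The helper names are hypothetical, and `Marshal.load` should only ever be used on trusted files:

```ruby
require "zlib"
require "tempfile"

# Hypothetical writer: gzip-compressed Marshal dump of a Ruby object
def dump_index(path, index)
  Zlib::GzipWriter.open(path) { |gz| gz.write(Marshal.dump(index)) }
end

# Hypothetical reader (Marshal.load is only safe on trusted data)
def load_index(path)
  Zlib::GzipReader.open(path) { |gz| Marshal.load(gz.read) }
end

# Usage with a toy index shaped like INDEX[:PROPERTIES] (contents made up)
Tempfile.create("emoji.marshal.gz") do |file|
  dump_index(file.path, { PROPERTIES: { 0x1F634 => [:E, :P] } })
  p load_index(file.path)[:PROPERTIES][0x1F634] # => [:E, :P]
end
```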

data/lib/unicode/emoji.rb CHANGED

@@ -18,8 +18,10 @@ module Unicode
     TEXT_VARIATION_SELECTOR       = 0xFE0E
     EMOJI_TAG_BASE_FLAG           = 0x1F3F4
     CANCEL_TAG                    = 0xE007F
+    TAGS                          = [*0xE0020..0xE007E]
     EMOJI_KEYCAP_SUFFIX           = 0x20E3
     ZWJ                           = 0x200D
+    REGIONAL_INDICATORS           = [*0x1F1E6..0x1F1FF]
     EMOJI_CHAR                    = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:E) }.keys.freeze
     EMOJI_PRESENTATION            = INDEX[:PROPERTIES].select{ |ord, props| props.include?(:P) }.keys.freeze
@@ -36,6 +38,10 @@ module Unicode
     RECOMMENDED_ZWJ_SEQUENCES     = INDEX[:ZWJ].freeze
     LIST                          = INDEX[:LIST].freeze.each_value(&:freeze)
+    LIST_REMOVED_KEYS             = [
+      "Smileys & People",
+      "Component",
+    ]
     pack = ->(ord){ Regexp.escape(Array(ord).pack("U*")) }
     join = -> (*strings){ "(?:" + strings.join("|") + ")" }
@@ -61,6 +67,9 @@ module Unicode
         emoji_presentation + "(?!" + pack[TEXT_VARIATION_SELECTOR] + ")" + pack[EMOJI_VARIATION_SELECTOR] + "?",
       ]
+    non_component_emoji_presentation_sequence = \
+      "(?!" + emoji_component + ")" + emoji_presentation_sequence
     text_presentation_sequence = \
       join[
         pack_and_join[TEXT_PRESENTATION]+ "(?!" + join[emoji_modifier, pack[EMOJI_VARIATION_SELECTOR]] + ")" + pack[TEXT_VARIATION_SELECTOR] + "?",
@@ -73,9 +82,36 @@ module Unicode
     emoji_keycap_sequence = \
       pack_and_join[EMOJI_KEYCAPS] + pack[[EMOJI_VARIATION_SELECTOR, EMOJI_KEYCAP_SUFFIX]]
-    emoji_valid_region_sequence = \
+    emoji_valid_flag_sequence = \
       pack_and_join[VALID_REGION_FLAGS]
+    emoji_well_formed_flag_sequence = \
+      "(?:" +
+        pack_and_join[REGIONAL_INDICATORS] +
+        pack_and_join[REGIONAL_INDICATORS] +
+      ")"
+    emoji_valid_core_sequence = \
+      join[
+        # emoji_character,
+        emoji_keycap_sequence,
+        emoji_modifier_sequence,
+        non_component_emoji_presentation_sequence,
+        emoji_valid_flag_sequence,
+      ]
+    emoji_well_formed_core_sequence = \
+      join[
+        # emoji_character,
+        emoji_keycap_sequence,
+        emoji_modifier_sequence,
+        non_component_emoji_presentation_sequence,
+        emoji_well_formed_flag_sequence,
+      ]
+    emoji_rgi_tag_sequence = \
+      pack_and_join[RECOMMENDED_SUBDIVISION_FLAGS]
     emoji_valid_tag_sequence = \
       "(?:" +
         pack[EMOJI_TAG_BASE_FLAG] +
@@ -83,35 +119,60 @@ module Unicode
         pack[CANCEL_TAG] +
       ")"
-    emoji_zwj_element = \
+    emoji_well_formed_tag_sequence = \
+      "(?:" +
+        join[
+          non_component_emoji_presentation_sequence,
+          emoji_modifier_sequence,
+        ] +
+        pack_and_join[TAGS] + "+" +
+        pack[CANCEL_TAG] +
+      ")"
+    emoji_rgi_zwj_sequence = \
+      pack_and_join[RECOMMENDED_ZWJ_SEQUENCES]
+    emoji_valid_zwj_element = \
       join[
         emoji_modifier_sequence,
         emoji_presentation_sequence,
         emoji_character,
       ]
-    # Matches basic singleton emoji and all kind of sequences, but restrict zwj and tag sequences to known sequences
-    REGEX = Regexp.compile(
-      pack_and_join[RECOMMENDED_ZWJ_SEQUENCES] +
-      ?| + pack_and_join[RECOMMENDED_SUBDIVISION_FLAGS] +
-      ?| + emoji_modifier_sequence +
-      ?| + "(?!" + emoji_component + ")" + emoji_presentation_sequence +
-      ?| + emoji_keycap_sequence +
-      ?| + emoji_valid_region_sequence  +
-      ""
-    )
+    emoji_valid_zwj_sequence = \
+      "(?:" +
+        "(?:" + emoji_valid_zwj_element + pack[ZWJ] + ")+" + emoji_valid_zwj_element +
+      ")"
+    emoji_rgi_sequence = \
+      join[
+        emoji_rgi_zwj_sequence,
+        emoji_rgi_tag_sequence,
+        emoji_valid_core_sequence,
+      ]
+    emoji_valid_sequence = \
+      join[
+        emoji_valid_zwj_sequence,
+        emoji_valid_tag_sequence,
+        emoji_valid_core_sequence,
+      ]
+    emoji_well_formed_sequence = \
+      join[
+        emoji_valid_zwj_sequence,
+        emoji_well_formed_tag_sequence,
+        emoji_well_formed_core_sequence,
+      ]
+    # Matches basic singleton emoji and all kind of sequences, but restrict zwj and tag sequences to known sequences (rgi)
+    REGEX = Regexp.compile(emoji_rgi_sequence)
     # Matches basic singleton emoji and all kind of valid sequences
-    REGEX_VALID = Regexp.compile(
-      # EMOJI_TAGS.map{ |base, spec| "(?:" + pack[base] + "[" + pack[spec] + "]+" + pack[CANCEL_TAG] + ")" }.join("|") +
-      emoji_valid_tag_sequence +
-      ?| + "(?:" + "(?:" + emoji_zwj_element + pack[ZWJ] + "){1,3}" + emoji_zwj_element + ")" +
-      ?| + emoji_modifier_sequence +
-      ?| + "(?!" + emoji_component + ")" + emoji_presentation_sequence +
-      ?| + emoji_keycap_sequence +
-      ?| + emoji_valid_region_sequence +
-      ""
-    )
+    REGEX_VALID = Regexp.compile(emoji_valid_sequence)
+    # Matches basic singleton emoji and all kind of sequences
+    REGEX_WELL_FORMED = Regexp.compile(emoji_well_formed_sequence)
     # Matches only basic single, non-textual emoji
     # Ignores "components" like modifiers or simple digits
@@ -125,11 +186,16 @@ module Unicode
       "(?!" + emoji_component + ")" + text_presentation_sequence
     )
-    # Matches any emoji-related codepoint
+    # Matches any emoji-related codepoint - Use with caution (returns partial matches)
     REGEX_ANY = Regexp.compile(
       emoji_character
     )
+    # Combined REGEXes which also match for TEXTUAL emoji
+    REGEX_INCLUDE_TEXT = Regexp.union(REGEX, REGEX_TEXT)
+    REGEX_VALID_INCLUDE_TEXT = Regexp.union(REGEX_VALID, REGEX_TEXT)
+    REGEX_WELL_FORMED_INCLUDE_TEXT = Regexp.union(REGEX_WELL_FORMED, REGEX_TEXT)
     def self.properties(char)
       ord = get_codepoint_value(char)
       props = INDEX[:PROPERTIES][ord]
@@ -143,6 +209,9 @@ module Unicode
     def self.list(key = nil, sub_key = nil)
       return LIST unless key || sub_key
+      if LIST_REMOVED_KEYS.include?(key)
+        $stderr.puts "Warning(unicode-emoji): The category of #{key} does not exist anymore"
+      end
       LIST.dig(*[key, sub_key].compact)
     end
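The new `emoji_well_formed_tag_sequence` is composed from a base, one or more `TAGS`, and `CANCEL_TAG`. The sketch below rebuilds a reduced version of that pattern with the `pack` and `join` lambdas copied from `emoji.rb`. Two parts are assumptions. First, `PACK_AND_JOIN` is an assumed equivalent of the gem's `pack_and_join`, which this diff does not show. Second, the base is narrowed to two codepoints:

```ruby
# pack and join as defined in emoji.rb
PACK = ->(ord){ Regexp.escape(Array(ord).pack("U*")) }
JOIN = ->(*strings){ "(?:" + strings.join("|") + ")" }
# Assumed equivalent of the gem's pack_and_join helper (not shown in this diff)
PACK_AND_JOIN = ->(ords){ JOIN[*ords.map { |ord| PACK[ord] }] }

TAGS       = [*0xE0020..0xE007E]
CANCEL_TAG = 0xE007F
# Simplification: the real base is any non-component emoji presentation
# sequence or modifier sequence; here only U+1F3F4 and U+1F634
BASES      = [0x1F3F4, 0x1F634]

WELL_FORMED_TAG_SEQUENCE = Regexp.compile(
  "(?:" + PACK_AND_JOIN[BASES] + PACK_AND_JOIN[TAGS] + "+" + PACK[CANCEL_TAG] + ")"
)

# Hypothetical helper: base + one tag character per ASCII char + CANCEL TAG
def tag_sequence(base, ascii)
  [base, *ascii.each_char.map { |char| 0xE0000 + char.ord }, CANCEL_TAG].pack("U*")
end

scotland = tag_sequence(0x1F3F4, "gbsct")
p WELL_FORMED_TAG_SEQUENCE.match(scotland)[0] == scotland # => true
```

As in the new spec, a sleeping face followed by the tags for "gbaaa" is well-formed under this pattern, even though it is neither valid nor recommended.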

data/lib/unicode/emoji/constants.rb CHANGED

@@ -2,12 +2,12 @@
 module Unicode
   module Emoji
-    VERSION = "1.1.0".freeze
-    EMOJI_VERSION = "11.0".freeze
+    VERSION = "2.0.0".freeze
+    EMOJI_VERSION = "12.0".freeze
     DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + '/../../../data/').freeze
     INDEX_FILENAME = (DATA_DIRECTORY + '/emoji.marshal.gz').freeze
-    ENABLE_NATIVE_EMOJI_UNICODE_PROPERTIES = false
+    ENABLE_NATIVE_EMOJI_UNICODE_PROPERTIES = false # As of Ruby 2.6.1, Emoji version 11 is included
   end
 end
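`ENABLE_NATIVE_EMOJI_UNICODE_PROPERTIES` stays `false`. According to the new comment, Ruby 2.6.1 ships Emoji 11 data, one version behind the gem's 12.0. The sketch below makes that comparison at runtime. It assumes the running Ruby reports its Emoji data version as `RbConfig::CONFIG["UNICODE_EMOJI_VERSION"]` and treats a missing key as "not available". The helper name is hypothetical:

```ruby
require "rbconfig"
require "rubygems"

# Hypothetical helper: true if Ruby's bundled Emoji data is at least `required`.
# Returns false when RbConfig does not report an Emoji version.
def native_emoji_data_at_least?(required)
  native = RbConfig::CONFIG["UNICODE_EMOJI_VERSION"]
  return false if native.nil? || native.empty?
  Gem::Version.new(native) >= Gem::Version.new(required)
end

p RbConfig::CONFIG["UNICODE_EMOJI_VERSION"]
p native_emoji_data_at_least?("12.0")
```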

data/spec/unicode_emoji_spec.rb CHANGED

@@ -158,7 +158,7 @@ describe Unicode::Emoji do
     it "does not match invalid tag sequences" do
       "🏴󠁧󠁢󠁡󠁡󠁡󠁿 GB AAA" =~ Unicode::Emoji::REGEX_VALID
-      assert_equal "🏴", $&
+      assert_equal "🏴", $& # only base flag is matched
     end
     it "matches recommended zwj sequences" do
@@ -172,6 +172,88 @@ describe Unicode::Emoji do
     end
   end
+  describe "REGEX_WELL_FORMED" do
+    it "matches most singleton emoji codepoints" do
+      "😴 sleeping face" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_equal "😴", $&
+    end
+    it "matches singleton emoji in combination with emoji variation selector" do
+      "😴\u{FE0F} sleeping face" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_equal "😴\u{FE0F}", $&
+    end
+    it "does not match singleton emoji when in combination with text variation selector" do
+      "😴\u{FE0E} sleeping face" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_nil $&
+    end
+    it "does not match textual singleton emoji" do
+      "▶ play button" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_nil $&
+    end
+    it "matches textual singleton emoji in combination with emoji variation selector" do
+      "▶\u{FE0F} play button" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_equal "▶\u{FE0F}", $&
+    end
+    it "does not match singleton 'component' emoji codepoints" do
+      "🏻 light skin tone" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_nil $&
+    end
+    it "matches modified emoji if modifier base emoji is used" do
+      "🛌🏽 person in bed: medium skin tone" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_equal "🛌🏽", $&
+    end
+    it "does not match modified emoji if no modifier base emoji is used" do
+      "🌵🏽 cactus" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_equal "🌵", $&
+    end
+    it "matches valid region flags" do
+      "🇵🇹 Portugal" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_equal "🇵🇹", $&
+    end
+    it "does match invalid region flags" do
+      "🇵🇵 PP Land" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_equal "🇵🇵", $&
+    end
+    it "matches emoji keycap sequences" do
+      "2️⃣ keycap: 2" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_equal "2️⃣", $&
+    end
+    it "matches recommended tag sequences" do
+      "🏴󠁧󠁢󠁳󠁣󠁴󠁿 Scotland" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_equal "🏴󠁧󠁢󠁳󠁣󠁴󠁿", $&
+    end
+    it "matches valid tag sequences, even though they are not recommended" do
+      "🏴󠁧󠁢󠁡󠁧󠁢󠁿 GB AGB" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_equal "🏴󠁧󠁢󠁡󠁧󠁢󠁿", $&
+    end
+    it "does match invalid tag sequences" do
+      "😴󠁧󠁢󠁡󠁡󠁡󠁿 GB AAA" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_equal "😴󠁧󠁢󠁡󠁡󠁡󠁿", $&
+    end
+    it "matches recommended zwj sequences" do
+      "🤾🏽‍♀️ woman playing handball: medium skin tone" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_equal "🤾🏽‍♀️", $&
+    end
+    it "matches valid zwj sequences, even though they are not recommended" do
+      "🤠‍🤢 vomiting cowboy" =~ Unicode::Emoji::REGEX_WELL_FORMED
+      assert_equal "🤠‍🤢", $&
+    end
+  end
   describe "REGEX_BASIC" do
     it "matches most singleton emoji codepoints" do
       "😴 sleeping face" =~ Unicode::Emoji::REGEX_BASIC
@@ -300,15 +382,21 @@ describe Unicode::Emoji do
   describe ".list" do
     it "returns a grouped list of emoji" do
-      assert_includes Unicode::Emoji.list.keys, "Smileys & People"
+      assert_includes Unicode::Emoji.list.keys, "Smileys & Emotion"
     end
     it "sub-groups the list of emoji" do
-      assert_includes Unicode::Emoji.list("Smileys & People").keys, "face-positive"
+      assert_includes Unicode::Emoji.list("Smileys & Emotion").keys, "face-glasses"
     end
     it "has emoji in sub-groups" do
-      assert_includes Unicode::Emoji.list("Smileys & People", "face-positive"), "😎"
+      assert_includes Unicode::Emoji.list("Smileys & Emotion", "face-glasses"), "😎"
+    end
+    it "issues a warning if attempting to retrieve old category" do
+      assert_output nil, "Warning(unicode-emoji): The category of Smileys & People does not exist anymore\n" do
+        assert_nil Unicode::Emoji.list("Smileys & People", "face-positive")
+      end
     end
   end
 end
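The new spec uses Minitest's `assert_output` to check the deprecation warning. The same check also works without Minitest by temporarily swapping `$stderr` for a `StringIO`. In this sketch, `list` copies the 2.0.0 method body. `LIST` is a toy hash with hypothetical contents, and `capture_stderr` is an illustrative helper:

```ruby
require "stringio"

# Toy data using the new Emoji 12.0 category name (contents hypothetical)
LIST = { "Smileys & Emotion" => { "face-glasses" => ["😎"] } }.freeze
LIST_REMOVED_KEYS = ["Smileys & People", "Component"].freeze

# Same body as Unicode::Emoji.list in 2.0.0
def list(key = nil, sub_key = nil)
  return LIST unless key || sub_key
  if LIST_REMOVED_KEYS.include?(key)
    $stderr.puts "Warning(unicode-emoji): The category of #{key} does not exist anymore"
  end
  LIST.dig(*[key, sub_key].compact)
end

# Illustrative helper: run the block and return whatever it wrote to $stderr
def capture_stderr
  original = $stderr
  $stderr = StringIO.new
  yield
  $stderr.string
ensure
  $stderr = original
end

p capture_stderr { list("Smileys & People", "face-positive") }
```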

metadata CHANGED

@@ -1,16 +1,16 @@
 --- !ruby/object:Gem::Specification
 name: unicode-emoji
 version: !ruby/object:Gem::Version
-  version: 1.1.0
+  version: 2.0.0
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-06-05 00:00:00.000000000 Z
+date: 2019-02-19 00:00:00.000000000 Z
 dependencies: []
-description: "[Emoji 11.0] Retrieve emoji data about Unicode codepoints. Also contains
+description: "[Emoji 12.0] Retrieve emoji data about Unicode codepoints. Also contains
   a regex to match emoji."
 email:
 - mail@janlelis.de
@@ -53,7 +53,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.7.6
+rubygems_version: 2.5.1
 signing_key:
 specification_version: 4
 summary: Retrieve Emoji data about Unicode codepoints.
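The version change from 1.1.0 to 2.0.0 is a major bump. `REGEX_WELL_FORMED` and the `*_INCLUDE_TEXT` regexes only exist from 2.0.0 on, and the list categories were renamed. The sketch below expresses such a constraint with RubyGems' own version classes, mirroring a Gemfile line such as `gem "unicode-emoji", "~> 2.0"`. The constant name is illustrative:

```ruby
require "rubygems"

# "~> 2.0" allows versions >= 2.0 and < 3.0
UNICODE_EMOJI_REQUIREMENT = Gem::Requirement.new("~> 2.0")

%w[1.1.0 2.0.0 3.0.0].each do |version|
  ok = UNICODE_EMOJI_REQUIREMENT.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{ok}"
end
# prints:
# 1.1.0: false
# 2.0.0: true
# 3.0.0: false
```

At runtime, `defined?(Unicode::Emoji::REGEX_WELL_FORMED)` is an alternative way to detect whether the loaded gem already provides the new regex.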