RubyGems - unicode-emoji - Versions diffs - 3.8.0 → 4.0.4 - Mend

unicode-emoji 3.8.0 → 4.0.4


Files changed (47)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: e3c7cc2671d256d8208b72d719384e7c13aaace4fec6b4919b92640e5336d87f
-  data.tar.gz: 9420777da4805787467c7f4eac1580f7179a6abf56e54b011c11640a50502b88
+  metadata.gz: 3b08d6adaddfcbca3e754c9a52a0c0d5c772da86ca708affc9799ad113c5a005
+  data.tar.gz: e9f3817a215ef38b7933d69b4f0563d848a03f6a6b8728ecd06a74417fb5f8a7
 SHA512:
-  metadata.gz: f31a83c8a492affe4ec34f2f6e43fa20a44f98dae6c0855322d5c12bc924edc154f7fa4793ca28f299f5436c8981651a5777c2e7944db6dee9c3b99f5ec997ce
-  data.tar.gz: 54c1beecbcb673274bdf98169f11fa51bc746282418c3b9ca30385194ae384e3532b04bbca3ef196316c11d3344258b9059da1ea18630d8d90b51029c608565d
+  metadata.gz: cedad0ceb5f1039be614bbca170cccc3f29e8f05bd7fc74714ed586ddf20edb5da25a6c8fd840acfb7dfbeb918ec6e2218cf427e5e510eb2406235253e32ad74
+  data.tar.gz: 4680b526737abd7491351ff87c5d323f3a6acf5996dffa4c2f4737e10bc8083da16928dbab34362b6014ae5a17863266eb456ab2d382f25350339eaf71a175bf
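
Note: the digests above can be checked against a locally unpacked gem. The following is a minimal sketch, assuming `data.tar.gz` has already been extracted from the `.gem` archive; `sha256_matches?` is a hypothetical helper name, not part of the gem or of RubyGems.

```ruby
require "digest"

# Hypothetical helper (not part of unicode-emoji or RubyGems):
# compares a file's SHA256 digest with an expected hex string,
# such as one of the values listed in checksums.yaml.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end
```

For example, `sha256_matches?("data.tar.gz", "e9f3817a215ef38b7933d69b4f0563d848a03f6a6b8728ecd06a74417fb5f8a7")` would be expected to return `true` for the 4.0.4 archive, provided the local file is the one published to the registry.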

data/.rake_tasks CHANGED

@@ -1,3 +1,4 @@
+dependencies...
 gem
 generate_constants
 irb

data/CHANGELOG.md CHANGED

@@ -1,6 +1,32 @@
 # CHANGELOG
-### 3.8.0
+## 4.0.4
+- Add `REGEX_TEXT_PRESENTATION` to be able to match for raw default-text Emoji codepoints
+## 4.0.3
+- Remove emoji-test.txt from Rubygems package
+## 4.0.2
+- Directly use `RbConfig::CONFIG["UNICODE_EMOJI_VERSION"]` to detect Ruby's Emoji version,
+  drop unicode-version dependency
+## 4.0.0
+- **Breaking change:** Regexes now include single skin tone modifiers (`🏻`) and hair components (`🦰`).
+  They were previously considered to be invalid partial Emoji, however since they are supposed to be
+  displayed as Emoji in isolation, they are now part of the regexes (see *ED-20* in UTS51).
+- **Breaking change:** Drop `REGEX_ANY` in favor of `REGEX_PROP_EMOJI`
+- Expose regexes for Emoji props (`REGEX_PROP_*`). The advantage over using the native regex properties
+  directly is that you will be able to use the Emoji support level of this gem instead of Ruby's.
+  For example, as of releasing this, the current Emoji version is 16.0, while Ruby is at 15.0.
+  Also see README for a table listing the regexes that match Emoji properties.
+- Add `REGEX_EMOJI_KEYCAP` for matching specifically Emoji keycaps
+- Use character class instead of lookbehind for native text emoji and non-emoji pictographic regexes
+## 3.8.0
 - Add new RGI-based regexes `REGEX_INCLUDE_MQE` and `REGEX_INCLUDE_MQE_UQE` which allows to match
   for minimally-qualified and unqualified RGI sequences (Emoji that lack some VS16)
@@ -10,7 +36,7 @@
 - Update CLDR to v46 (valid subdivisions)
 - Further improvements (see commit log)
-### 3.7.0
+## 3.7.0
 - Bump required Ruby slightly to 2.5
 - Introduce new `REGEX_POSSIBLE` which contains the regex described in
@@ -23,46 +49,46 @@
 - Separately autoload emoji list, so it can be loaded when other indexes
   are not needed
-### 3.6.0
+## 3.6.0
 - `Unicode::Emoji::REGEX_TEXT` now matches non-emoji keycaps like "3⃣"  (U+0033 U+20E3)
 - Minor refactorings
-### 3.5.0
+## 3.5.0
 - Emoji 16.0
-### 3.4.0
+## 3.4.0
 - Emoji 15.1
-### 3.3.2
+## 3.3.2
 - Update valid subdivisions to CLDR 43 (no changes)
   -> there won't be any new RGI subdivision flags in Emoji
-### 3.3.1
+## 3.3.1
 - Update valid subdivisions to CLDR 42 (no changes)
-### 3.3.0
+## 3.3.0
 - Emoji 15.0
-### 3.2.0
+## 3.2.0
 - Update valid subdivisions to CLDR 41
-### 3.1.1
+## 3.1.1
 - Fix `REGEX` to be able to match complete family emoji, instead of
   sub-matching partial families, thanks @matt17r
-### 3.1.0
+## 3.1.0
 - Update valid subdivisions to CLDR 40
-### 3.0.0
+## 3.0.0
 - Vastly improve memory usage, patch by @radarek
   - Emoji regexes are now pre-generated and bundled with the release
@@ -70,54 +96,54 @@
   - Most constants (e.g. regexes) now get autoloaded
   - See https://github.com/janlelis/unicode-emoji/pull/9 for more details
-### 2.9.0
+## 2.9.0
 - Emoji 14.0
-### 2.8.0
+## 2.8.0
 - Update valid subdivisions to CLDR 39
-### 2.7.1
+## 2.7.1
 - Update valid subdivisions to CLDR 38.1
-### 2.7.0
+## 2.7.0
 - Update valid subdivisions to CLDR 38
 - Loosen Ruby dependency to allow Ruby 3.0
-### 2.6.0
+## 2.6.0
 - Emoji 13.1
-### 2.5.0
+## 2.5.0
 - Use native Emoji regex properties when current Ruby's Emoji support is the same as our current Emoji version
 - Update valid subdivisions to CLDR 37
-### 2.4.0
+## 2.4.0
 - Emoji 13.0
-### 2.3.1
+## 2.3.1
 - Fix index to actually include Emoji 12.1
-### 2.3.0
+## 2.3.0
 - Emoji 12.1
-### 2.2.0
+## 2.2.0
 - Update subdivisions to CLDR 36
-### 2.1.0
+## 2.1.0
 - Add `REGEX_PICTO` which matches codepoints with the **Extended_Pictographic** property
 - Add `REGEX_PICTO_NO_EMOJI` which matches codepoints with the **Extended_Pictographic** property, but no **Emoji** property
-### 2.0.0
+## 2.0.0
 - Emoji 12.0 data (including valid subdivisions)
 - Introduce new `REGEX_WELL_FORMED` to be able to match for invalid tag and region sequences
@@ -126,40 +152,40 @@
 - Issue warning when using `#list` method to retrieve outdated category
 - Change matching for ZWJ sequences: Do not limit sequence to a maximum of 3 ZWJs
-### 1.1.0
+## 1.1.0
 - Emoji 11.0
 - Do not depend on rubygems (only use zlib stdlib for unzipping)
-### 1.0.3
+## 1.0.3
 - Explicitly load rubygems/util, fixes regression in 1.2.1
-### 1.0.2
+## 1.0.2
 - Use `Gem::Util` for `gunzip`, removes deprecation warning
-### 1.0.1
+## 1.0.1
 - Actually set required Ruby version to 2.3 in gemspec
-### 1.0.0
+## 1.0.0
 - Drop support for Ruby below 2.3, use 0.9 if you need to
 - Internal refactorings, no API change
-### 0.9.3
+## 0.9.3
 - Implement native Emoji regex matchers, but do not activate or document, yet
-### 0.9.2
+## 0.9.2
 - REGEX_TEXT: Do not match if the text emoji is followed by a emoji modifier
-### 0.9.1
+## 0.9.1
 - Include a categorized list of recommended Emoji
-### 0.9.0
+## 0.9.0
 - Initial release (Emoji version 5.0)
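
Note on the 4.0.2 and 2.5.0 entries: the changelog says the gem reads `RbConfig::CONFIG["UNICODE_EMOJI_VERSION"]` and uses native regex properties only when Ruby's Emoji version equals the gem's. A minimal sketch of that comparison follows; `native_emoji_regexes_usable?` is a hypothetical name, and the gem's actual logic may differ.

```ruby
require "rbconfig"

# Hypothetical helper (the gem's real implementation is not shown in this diff).
# Returns true only when the Emoji version Ruby was built with equals the
# Emoji version the gem's data targets. A missing config key yields false.
def native_emoji_regexes_usable?(gem_emoji_version,
                                 ruby_emoji_version = RbConfig::CONFIG["UNICODE_EMOJI_VERSION"])
  !ruby_emoji_version.nil? && ruby_emoji_version == gem_emoji_version
end

# "16.0" is the EMOJI_VERSION shown in constants.rb in this diff.
puts native_emoji_regexes_usable?("16.0")
```

With the versions quoted in the 4.0.0 entry (gem at 16.0, Ruby at 15.0) this check would come out `false`, which is the situation the `REGEX_PROP_*` constants are meant for.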

data/Gemfile.lock CHANGED

@@ -1,8 +1,7 @@
 PATH
   remote: .
   specs:
-    unicode-emoji (3.5.0)
-      unicode-version (~> 1.0)
+    unicode-emoji (4.0.4)
 GEM
   remote: https://rubygems.org/
@@ -20,7 +19,6 @@ GEM
     reline (0.3.8)
       io-console (~> 0.5)
     stringio (3.0.8)
-    unicode-version (1.3.0)
 PLATFORMS
   ruby
@@ -32,4 +30,4 @@ DEPENDENCIES
   unicode-emoji!
 BUNDLED WITH
-   2.2.22
+   2.5.21

data/README.md CHANGED

@@ -1,6 +1,7 @@
 # Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](https://badge.fury.io/rb/unicode-emoji)  [![[ci]](https://github.com/janlelis/unicode-emoji/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-emoji/actions?query=workflow%3ATest)
-Provides regular expressions to find Emoji in strings, incorporating the latest Unicode / Emoji standards.
+Provides various sophisticated regular expressions to work with Emoji in strings,
+incorporating the latest Unicode / Emoji standards.
 Additional features:
@@ -26,16 +27,17 @@ require "unicode/emoji"
 string = "String which contains all types of Emoji sequences:
-- Singleton Emoji: 😴
-- Textual singleton Emoji with Emoji variation: ▶️
+- Basic Emoji: 😴
+- Textual Emoji with Emoji variation (VS16): ▶️
 - Emoji with skin tone modifier: 🛌🏽
 - Region flag: 🇵🇹
 - Sub-Region flag: 🏴󠁧󠁢󠁳󠁣󠁴󠁿
 - Keycap sequence: 2️⃣
+- Skin tone modifier: 🏻
 - Sequence using ZWJ (zero width joiner): 🤾🏽‍♀️
 "
-string.scan(Unicode::Emoji::REGEX) # => ["😴", "▶️", "🛌🏽", "🇵🇹", "🏴󠁧󠁢󠁳󠁣󠁴󠁿", "2️⃣", "🤾🏽‍♀️"]
+string.scan(Unicode::Emoji::REGEX) # => ["😴", "▶️", "🛌🏽", "🇵🇹", "🏴󠁧󠁢󠁳󠁣󠁴󠁿", "2️⃣", "🏻", "🤾🏽‍♀️"]
 ```
 Depending on your exact usecase, you can choose between multiple levels of Emoji detection:
@@ -44,10 +46,10 @@ Depending on your exact usecase, you can choose between multiple levels of Emoji
 Regex                         | Description | Example Matches | Example Non-Matches
 ------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX`       | **Use this one if unsure!** Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *recommended* Emoji sequences (RGI/FQE) | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️` |  `🤾🏽‍♀`, `🏌‍♂️`, `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
-`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *valid* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀` ,`🏌‍♂️`, `🤠‍🤢` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `1`, `1⃣`
-`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kinds of *well-formed* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`,`🏌‍♂️` , `🤠‍🤢`,  `🇵🇵` | `😴︎`, `▶`, `🏻`, `1`, `1⃣`
-`Unicode::Emoji::REGEX_POSSIBLE` | Matches all singleton Emoji, singleton components, all kinds of Emoji sequences, and even single digits (except for: unqualified keycap sequences) | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`,  `🇵🇵`, `😴︎`, `▶`, `🏻`, `1` | `1⃣`
+`Unicode::Emoji::REGEX`       | **Use this one if unsure!** Matches (non-textual) Basic Emoji and all kinds of *recommended* Emoji sequences (RGI/FQE) | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🏻` |  `🤾🏽‍♀`, `🏌‍♂️`, `😴︎`, `▶`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_VALID` | Matches (non-textual) Basic Emoji and all kinds of *valid* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀` ,`🏌‍♂️`, `🤠‍🤢`, `🏻` | `😴︎`, `▶`, `🇵🇵`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_WELL_FORMED` | Matches (non-textual) Basic Emoji and all kinds of *well-formed* Emoji sequences | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`,`🏌‍♂️` , `🤠‍🤢`,  `🇵🇵`, `🏻` | `😴︎`, `▶`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_POSSIBLE` | Matches all singleton Emoji, all kinds of Emoji sequences, and even non-Emoji singleton components like digits. Only exception: Unqualified keycap sequences are not matched | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`,  `🇵🇵`, `😴︎`, `▶`, `🏻`, `1` | `1⃣`
 #### Include Text Emoji
@@ -55,16 +57,16 @@ By default, textual Emoji (emoji characters with text variation selector or thos
 Regex                         | Description | Example Matches | Example Non-Matches
 ------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX_INCLUDE_TEXT`       | `REGEX` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `😴︎`, `▶`, `1⃣` | `🤾🏽‍♀`, `🏌‍♂️`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`
-`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `😴︎`, `▶`, `1⃣` | `🏻`, `🇵🇵`, `1`
-`Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`,  `🇵🇵`, `😴︎`, `▶`, `1⃣` | `🏻`, `1`
+`Unicode::Emoji::REGEX_INCLUDE_TEXT`       | `REGEX` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `😴︎`, `▶`, `1⃣` , `🏻`| `🤾🏽‍♀`, `🏌‍♂️`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`
+`Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT` | `REGEX_VALID` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `😴︎`, `▶`, `1⃣` , `🏻` | `🇵🇵`, `1`
+`Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT` | `REGEX_WELL_FORMED` + `REGEX_TEXT` | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`,  `🇵🇵`, `😴︎`, `▶`, `1⃣` , `🏻` | `1`
 #### Minimally-qualified and Unqualified Sequences
 Regex                         | Description | Example Matches | Example Non-Matches
 ------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX_INCLUDE_MQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors, where the first partial Emoji has all required Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀` | `🏌‍♂️`, `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
-`Unicode::Emoji::REGEX_INCLUDE_MQE_UQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️` | `😴︎`, `▶`, `🏻`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_INCLUDE_MQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors, where the first partial Emoji has all required Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏻` | `🏌‍♂️`, `😴︎`, `▶`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
+`Unicode::Emoji::REGEX_INCLUDE_MQE_UQE` | Like `REGEX`, but additionally includes Emoji with missing Emoji Presentation Variation Selectors | `😴`, `▶️`, `🛌🏽`, `🇵🇹`, `2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🏻` | `😴︎`, `▶`, `🇵🇵`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤠‍🤢`, `1`, `1⃣`
 [List of MQE and UQE Emoji sequences](https://character.construction/unqualified-emoji)
@@ -74,10 +76,10 @@ Matches only simple one-codepoint (+ optional variation selector) Emoji:
 Regex                         | Description | Example Matches | Example Non-Matches
 ------------------------------|-------------|-----------------|--------------------
-`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all | `😴`, `▶️` | `😴︎`, `▶`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `1`
-`Unicode::Emoji::REGEX_TEXT`  | Matches only textual singleton Emoji (except for singleton components, like digits) | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `1`
+`Unicode::Emoji::REGEX_BASIC` | Matches (non-textual) Basic Emoji, but no sequences at all | `😴`, `▶️`, `🏻` | `😴︎`, `▶`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `1`
+`Unicode::Emoji::REGEX_TEXT`  | Matches only textual singleton Emoji | `😴︎`, `▶` | `😴`, `▶️`, `🏻`, `🛌🏽`, `🇵🇹`, `🇵🇵`,`2️⃣`, `🏴󠁧󠁢󠁳󠁣󠁴󠁿`, `🏴󠁧󠁢󠁡󠁧󠁢󠁿`, `🤾🏽‍♀️`, `🤾🏽‍♀`, `🏌‍♂️`, `🤠‍🤢`, `1`
-Here is a list of all Emoji that can be matched using the two regexes: [character.construction/emoji-vs-text](https://character.construction/emoji-vs-text)
+Here is a list of all Emoji that can be matched using the two regexes: [character.construction/emoji-vs-text](https://character.construction/emoji-vs-text). The `REGEX_BASIC` regex also matches [visual Emoji components](https://character.construction/emoji-components) (skin tone modifiers and hair components).
 While `REGEX_BASIC` is part of the above regexes, `REGEX_TEXT` is only included in the `*_INCLUDE_TEXT` or `*_UQE` variants.
@@ -140,7 +142,20 @@ Please see [the standard](https://www.unicode.org/reports/tr51/#Emoji_Sets) for
 More info about valid vs. recommended Emoji can also be found in this [blog article on Emojipedia](https://blog.emojipedia.org/unicode-behind-the-curtain/).
-### Extended Pictographic Regex
+### Emoji Property Regexes
+Ruby includes native regex Emoji properties, as listed in the following table. You can also opt-in to use the `*_PROP_*` regexes to get the Emoji support level of this gem (instead of Ruby's).
+Gem Regex (`Unicode::Emoji`'s Emoji support level) | Native Regex (Ruby's Emoji support level)
+---------------------------------------------------|------------------------------------------
+`Unicode::Emoji::REGEX_PROP_EMOJI`         | `/\p{Emoji}/`
+`Unicode::Emoji::REGEX_PROP_MODIFIER`      | `/\p{EMod}/`
+`Unicode::Emoji::REGEX_PROP_MODIFIER_BASE` | `/\p{EBase}/`
+`Unicode::Emoji::REGEX_PROP_COMPONENT`     | `/\p{EComp}/`
+`Unicode::Emoji::REGEX_PROP_PRESENTATION`  | `/\p{EPres}/`
+`Unicode::Emoji::REGEX_TEXT_PRESENTATION`  | `/[\p{Emoji}&&\P{EPres}]/`
+#### Extended Pictographic Regex
 `Unicode::Emoji::REGEX_PICTO` matches single codepoints with the **Extended_Pictographic** property. For example, it will match `✀` BLACK SAFETY SCISSORS.
@@ -148,10 +163,6 @@ More info about valid vs. recommended Emoji can also be found in this [blog arti
 See [character.construction/picto](https://character.construction/picto) for a list of all non-Emoji pictographic characters.
-### Partial Regexes
-`Unicode::Emoji::REGEX_ANY`, same as `\p{Emoji}`. Deprecated: Will be removed or renamed in the future.
 ## Usage – List
 Use `Unicode::Emoji::LIST` or the **list** method to get a ordered and categorized list of Emoji:

data/data/generate_constants.rb CHANGED

@@ -69,6 +69,8 @@ def pack_and_join(ords)
 end
 def compile(emoji_character:, emoji_modifier:, emoji_modifier_base:, emoji_component:, emoji_presentation:, text_presentation:, picto:, picto_no_emoji:)
+  visual_component = pack_and_join(VISUAL_COMPONENT)
   emoji_presentation_sequence = \
     join(
       text_presentation + pack(EMOJI_VARIATION_SELECTOR),
@@ -78,6 +80,12 @@ def compile(emoji_character:, emoji_modifier:, emoji_modifier_base:, emoji_compo
   non_component_emoji_presentation_sequence = \
     "(?!" + emoji_component + ")" + emoji_presentation_sequence
+  basic_emoji = \
+    join(
+      non_component_emoji_presentation_sequence,
+      visual_component,
+    )
   text_keycap_sequence = \
     pack_and_join(EMOJI_KEYCAPS) + pack(EMOJI_KEYCAP_SUFFIX)
@@ -169,6 +177,7 @@ def compile(emoji_character:, emoji_modifier:, emoji_modifier_base:, emoji_compo
       emoji_rgi_tag_sequence,
       emoji_valid_flag_sequence,
       emoji_core_sequence,
+      visual_component,
     )
   emoji_rgi_sequence_include_text = \
@@ -177,6 +186,7 @@ def compile(emoji_character:, emoji_modifier:, emoji_modifier_base:, emoji_compo
       emoji_rgi_tag_sequence,
       emoji_valid_flag_sequence,
       emoji_core_sequence,
+      visual_component,
       text_emoji,
     )
@@ -186,6 +196,7 @@ def compile(emoji_character:, emoji_modifier:, emoji_modifier_base:, emoji_compo
       emoji_rgi_tag_sequence,
       emoji_valid_flag_sequence,
       emoji_core_sequence,
+      visual_component,
     )
   emoji_rgi_include_mqe_uqe_sequence = \
@@ -195,6 +206,7 @@ def compile(emoji_character:, emoji_modifier:, emoji_modifier_base:, emoji_compo
       emoji_rgi_tag_sequence,
       emoji_valid_flag_sequence,
       emoji_core_sequence,
+      visual_component,
     )
   emoji_valid_sequence = \
@@ -203,6 +215,7 @@ def compile(emoji_character:, emoji_modifier:, emoji_modifier_base:, emoji_compo
       emoji_valid_tag_sequence,
       emoji_valid_flag_sequence,
       emoji_core_sequence,
+      visual_component,
     )
   emoji_valid_sequence_include_text = \
@@ -211,6 +224,7 @@ def compile(emoji_character:, emoji_modifier:, emoji_modifier_base:, emoji_compo
       emoji_valid_tag_sequence,
       emoji_valid_flag_sequence,
       emoji_core_sequence,
+      visual_component,
       text_emoji,
     )
@@ -220,6 +234,7 @@ def compile(emoji_character:, emoji_modifier:, emoji_modifier_base:, emoji_compo
       emoji_well_formed_tag_sequence,
       emoji_well_formed_flag_sequence,
       emoji_core_sequence,
+      visual_component,
     )
   emoji_well_formed_sequence_include_text = \
@@ -228,6 +243,7 @@ def compile(emoji_character:, emoji_modifier:, emoji_modifier_base:, emoji_compo
       emoji_well_formed_tag_sequence,
       emoji_well_formed_flag_sequence,
       emoji_core_sequence,
+      visual_component,
       text_emoji,
     )
@@ -279,19 +295,27 @@ def compile(emoji_character:, emoji_modifier:, emoji_modifier_base:, emoji_compo
   # See https://www.unicode.org/reports/tr51/#EBNF_and_Regex
   regexes[:REGEX_POSSIBLE] = Regexp.compile(emoji_possible)
-  # Matches only basic single, non-textual emoji, ignores "components" like modifiers or simple digits
-  regexes[:REGEX_BASIC] = Regexp.compile(non_component_emoji_presentation_sequence)
+  # Matches only basic single, non-textual emoji, ignores some components like simple digits
+  regexes[:REGEX_BASIC] = Regexp.compile(basic_emoji)
-  # Matches only basic single, textual emoji, ignores "components" like modifiers or simple digits
+  # Matches only basic single, textual emoji, ignores components like modifiers or simple digits
   regexes[:REGEX_TEXT] = Regexp.compile(text_emoji)
+  regexes[:REGEX_TEXT_PRESENTATION] = Regexp.compile(text_presentation)
-  # Same as \p{Emoji} - to be removed or renamed
-  regexes[:REGEX_ANY] = Regexp.compile(emoji_character)
+  # Export regexes for Emoji properties so they can be used with newer Unicode than Ruby's
+  regexes[:REGEX_PROP_EMOJI] = Regexp.compile(emoji_character)
+  regexes[:REGEX_PROP_MODIFIER] = Regexp.compile(emoji_modifier)
+  regexes[:REGEX_PROP_MODIFIER_BASE] = Regexp.compile(emoji_modifier_base)
+  regexes[:REGEX_PROP_COMPONENT] = Regexp.compile(emoji_component)
+  regexes[:REGEX_PROP_PRESENTATION] = Regexp.compile(emoji_presentation)
+  # Same goes for ExtendedPictographic
   regexes[:REGEX_PICTO] = Regexp.compile(picto)
   regexes[:REGEX_PICTO_NO_EMOJI] = Regexp.compile(picto_no_emoji)
+  # Emoji keycaps
+  regexes[:REGEX_EMOJI_KEYCAP] = Regexp.compile(emoji_keycap_sequence)
   regexes
 end
@@ -313,8 +337,8 @@ native_regexes = compile(
   emoji_modifier_base:  "\\p{EBase}",
   emoji_component:      "\\p{EComp}",
   emoji_presentation:   "\\p{EPres}",
-  text_presentation:    "\\p{Emoji}(?<!\\p{EPres})",
+  text_presentation:    "[\\p{Emoji}&&\\P{EPres}]",
   picto:                "\\p{ExtPict}",
-  picto_no_emoji:       "\\p{ExtPict}(?<!\\p{Emoji})"
+  picto_no_emoji:       "[\\p{ExtPict}&&\\P{Emoji}]"
 )
 write_regexes(native_regexes, File.expand_path("../lib/unicode/emoji/generated_native", __dir__))
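
Note on the last hunk: the native `text_presentation` pattern moves from a lookbehind to a character class with set intersection (`&&`). The sketch below compares the two forms on a small sample using only Ruby's built-in properties. The constant names and the sample string are illustrative, and results depend on the Emoji data of the running Ruby.

```ruby
# Old form: consume one Emoji codepoint, then look behind to reject it
# if that same codepoint has the Emoji_Presentation property.
TEXT_PRESENTATION_LOOKBEHIND = /\p{Emoji}(?<!\p{EPres})/

# New form: one character class, Emoji intersected with "not Emoji_Presentation".
TEXT_PRESENTATION_CLASS = /[\p{Emoji}&&\P{EPres}]/

# Sample: "a", U+25B6 (play button, default text), U+1F634 (sleeping face,
# default emoji), U+00A9 (copyright sign, default text).
sample = "a\u25B6\u{1F634}\u00A9"

lookbehind_matches = sample.scan(TEXT_PRESENTATION_LOOKBEHIND)
class_matches      = sample.scan(TEXT_PRESENTATION_CLASS)

puts lookbehind_matches == class_matches
```

Because the class form is a plain single-character set, it can also be embedded inside larger alternations without carrying a lookaround along.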

data/lib/unicode/emoji/constants.rb CHANGED

@@ -2,9 +2,9 @@
 module Unicode
   module Emoji
-    VERSION = "3.8.0"
+    VERSION = "4.0.4"
     EMOJI_VERSION = "16.0"
-    CLDR_VERSION = "45"
+    CLDR_VERSION = "46"
     DATA_DIRECTORY = File.expand_path('../../../data', __dir__).freeze
     INDEX_FILENAME = (DATA_DIRECTORY + "/emoji.marshal.gz").freeze
@@ -41,5 +41,9 @@ module Unicode
     # Two regional indicators make up a region
     REGIONAL_INDICATORS           = [*0x1F1E6..0x1F1FF].freeze
+    # The current list of Emoji components that should have a visual representation
+    # Currently skin tone modifiers + hair components
+    VISUAL_COMPONENT              = [*0x1F3FB..0x1F3FF, *0x1F9B0..0x1F9B3].freeze
   end
 end
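
Note on `VISUAL_COMPONENT`: the new constant lists the five skin tone modifiers and four hair components that the 4.0.0 regexes now match in isolation. The sketch below turns that list into a regex; `codepoints_to_alternation` is a hypothetical stand-in for the generator's `pack_and_join`, whose body does not appear in this diff.

```ruby
# Same codepoint list as in constants.rb above.
VISUAL_COMPONENT = [*0x1F3FB..0x1F3FF, *0x1F9B0..0x1F9B3].freeze

# Assumption: a stand-in for pack_and_join, not the gem's actual code.
# Packs each codepoint into a UTF-8 string and joins them as alternatives.
def codepoints_to_alternation(ords)
  Regexp.union(ords.map { |ord| [ord].pack("U") })
end

VISUAL_COMPONENT_REGEX = codepoints_to_alternation(VISUAL_COMPONENT)

puts VISUAL_COMPONENT_REGEX.match?("\u{1F3FB}") # light skin tone modifier
puts VISUAL_COMPONENT_REGEX.match?("\u{1F9B0}") # red hair component
```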