unicode-display_width 2.6.0 → 3.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

checksums.yaml +4 -4
data/CHANGELOG.md +93 -2
data/README.md +75 -52
data/data/display_width.marshal.gz +0 -0
data/lib/unicode/display_width/constants.rb +1 -1
data/lib/unicode/display_width/emoji_support.rb +52 -0
data/lib/unicode/display_width/reline_ext.rb +14 -0
data/lib/unicode/display_width/string_ext.rb +3 -3
data/lib/unicode/display_width.rb +199 -75
metadata +28 -5

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: eb663fb7dd6d3409dd3b21bd4a793d954ad8fd9b974593868292b5ec59ba7c01
-  data.tar.gz: e960ab9c24135cb1d7872e84c4e3d7b24f83a5ae85b12a14af27312a90241597
+  metadata.gz: 9dadf5b8337ac74b8e5af2a6cd481708c39050506440ccf5d30c6cdc9eb5ade3
+  data.tar.gz: ae3bb12a0fabe7a53a1b533909f42a86f1fc7acecc33c9b57218ed3607f71d3c
 SHA512:
-  metadata.gz: bd4fb14101159588eec1c2bf6871a94e297e6314317ee10ce320b09fc30e01f35d8610cdd3dfe32edb979f0ece6053914a23298d6b46732f22d16f642576aacf
-  data.tar.gz: b02c66363a1740303715e30b8023e0cc2de99baf78f183ec2ad48b18ff26ac95da4e496f604aa4bda1dcf691ad4fdff644b9e1d7c41922ba078b6c35653debc6
+  metadata.gz: 414227480c3ae2ca0afcee225bb68b6506ac2f7dac630422db87c0a6d28d9d921bcc4645146ffa0a75ace5eb338dc71df2f051aeb0732824dbf1c7bb3117ba22
+  data.tar.gz: 1002b3752d47df6f3d416d378148af38ff7c92698a118aded36f55e3b6bbdc99a29552fdee792636d5edea1ce2104266b1dcf9198631a7669334a48fa425af29

data/CHANGELOG.md CHANGED

@@ -1,5 +1,78 @@
 # CHANGELOG
+## 3.1.4
+- Fix that skin tone modifiers were ignored when used in a non-ZWJ sequence
+  context (= single emoji char + modifier) #29
+- Add more docs and specs about modifier handling
+## 3.1.3
+Better handling of non-UTF-8 strings, patch by @Earlopain:
+- Data with *BINARY* encoding is interpreted as UTF-8, if possible
+- Use `invalid: :replace` and `undef: :replace` options when converting to UTF-8
+## 3.1.2
+- Performance improvements
+## 3.1.1
+- Performance improvements
+## 3.1.0
+**Improve Emoji support:**
+- Emoji modes: Differentiate between well-formed Emoji (`:possible`) and any
+  ZWJ/modifier sequence (`:all`). The latter is more common and more efficient
+  to implement.
+- Unify `:rgi_{fqe,mqe,uqe}` options to just `:rgi` to keep things simpler (corresponds to
+  the former `:rgi_uqe` option). Most terminals that want to support the RGI set
+  will probably want to catch Emoji sequences with missing VS16s.
+- Add new `:all_no_vs16` and `:rgi_at` modes to be able to support some terminals
+  that need these quirks
+- Add alias `emoji: :auto` for `emoji: true` and `emoji: :none` for `emoji: false`
+- `:auto` mode: Only consider terminal cells when recommending Emoji support level
+  (Emoji themselves might display differently)
+- `:auto` mode: Set default Emoji mode for unknown/unsupported terminals to `:none`
+- Rename `:basic` mode to `:vs16`
+## 3.0.1
+- Add WezTerm and foot as good Emoji terminals
+## 3.0.0
+**Rework Emoji support:**
+- Emoji widths are now enabled by default
+- Only reduce Emoji width to 2 when RGI Emoji detected (configurable)
+- VS16 turns Emoji characters of width 1 into full-width
+- Please note that Emoji parsing has a notable impact on performance.
+  You can use the `emoji: false` option to disable Emoji adjustments
+- Tries to detect terminal's Emoji support level automatically (from ENV vars)
+**Index fixes and updates:**
+- Private-use characters are considered ambiguous (were given width 1 before)
+- Fix that a few zero-width ignorable codepoints from recent Unicode were missing
+- Consider the following separators to be zero-width:
+  - U+2028 - LINE SEPARATOR - Zl
+  - U+2029 - PARAGRAPH SEPARATOR - Zp
+**Other:**
+- Add keyword arguments to `Unicode::DisplayWidth.of`. If you are using a hash
+  with overwrite values as third parameter, be sure to put it in curly braces.
+- Using third parameter or explicit hash as fourth parameter is deprecated,
+  please migrate to the keyword arguments API
+- Gem raises `ArgumentError` for ambiguous values other than 1 or 2
+- Performance optimizations
+- Require Ruby 2.5
 ## 2.6.0
 - Unicode 16
@@ -40,8 +113,26 @@ More performance improvements:
 ## 2.0.0
-- Release 2.0.0
-- Supports Ruby 3.0
+Add Support for Ruby 3.0
+### Breaking Changes
+Some features of this library were marked deprecated for a long time and have been removed with Version 2.0:
+- Aliases of display\_width (…\_size, …\_length) have been removed
+- Auto-loading of string core extension has been removed:
+If you are relying on the `String#display_width` string extension to be automatically loaded (old behavior), please load it explicitly now:
+```ruby
+require "unicode/display_width/string_ext"
+```
+You could also change your `Gemfile` line to achieve this:
+```ruby
+gem "unicode-display_width", require: "unicode/display_width/string_ext"
+```
 ## 2.0.0.pre2
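
The 3.1.3 entries above describe how non-UTF-8 input is prepared before measuring. A minimal sketch of that preparation step follows. `to_utf8_for_measuring` is a hypothetical helper name, not part of the gem, and unlike the diffed implementation it works on a copy so the caller's string (which may be frozen) is left untouched.

```ruby
# Hypothetical helper (not part of the gem): returns a UTF-8 string
# suitable for width measuring, without modifying the argument.
def to_utf8_for_measuring(string)
  copy = string.dup

  if copy.encoding == Encoding::BINARY
    # Optimistically reinterpret the raw bytes as UTF-8 ...
    copy.force_encoding(Encoding::UTF_8)
    # ... and fall back to BINARY if the bytes are not valid UTF-8.
    copy.force_encoding(Encoding::BINARY) unless copy.valid_encoding?
  end

  return copy if copy.encoding == Encoding::UTF_8

  # Any other encoding is transcoded; unconvertible data is replaced
  # instead of raising an exception.
  copy.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

binary = "\xE4\xB8\x80".b                 # the bytes of "一", tagged as BINARY
puts to_utf8_for_measuring(binary).encoding
puts binary.encoding                      # the caller's string is unchanged
```

Working on a copy costs one allocation per call, so a performance-sensitive caller might prefer to mutate in place when it owns the string.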

data/README.md CHANGED

@@ -1,39 +1,22 @@
-## Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)
+# Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)
-Determines the monospace display width of a string in Ruby. Useful for all kinds of terminal-based applications. Implementation based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt) and other data, 100% in Ruby. It does not rely on the OS vendor (like [wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width.
+Determines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specification](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. It does not rely on the OS vendor ([wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals.
 Unicode version: **16.0.0** (September 2024)
-Supported Rubies: **3.3**, **3.2**, **3.1**, **3.0**
+## Gem Version 3 — Improved Emoji Support
-Old Rubies which might still work: **2.7**, **2.6**, **2.5**, **2.4**, **2.3**
+**Emoji support is now enabled by default.** See below for description and configuration possibilities.
-For even older Rubies, use version 2.3.0 of this gem: **2.3**, **2.2**, **2.1**, **2.0**, **1.9**
+**Unicode::DisplayWidth.of now takes keyword arguments:** { ambiguous:, emoji:, overwrite: }
-## Version 2.4.2 — Performance Updates
+See [CHANGELOG](/CHANGELOG.md) for details.
-**If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!**
-This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the charachter width lookup code has been optimized, so even when full-width characters are involved, the gem is much faster now.
-## Version 2.0 — Breaking Changes
-Some features of this library were marked deprecated for a long time and have been removed with Version 2.0:
+## Gem Version 2.4.2 — Performance Updates
-- Aliases of display_width (…\_size, …\_length) have been removed
-- Auto-loading of string core extension has been removed:
-If you are relying on the `String#display_width` string extension to be automatically loaded (old behavior), please load it explicitly now:
-```ruby
-require "unicode/display_width/string_ext"
-```
-You could also change your `Gemfile` line to achieve this:
+**If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!**
-```ruby
-gem "unicode-display_width", require: "unicode/display_width/string_ext"
-```
+This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the character width lookup code has been optimized, so even when the string involves full-width or ambiguous characters, the gem is much faster now.
 ## Introduction to Character Widths
@@ -45,15 +28,16 @@ Further at the top means higher precedence. Please expect changes to this algori
 Width  | Characters                   | Comment
 -------|------------------------------|--------------------------------------------------
-X      | (user defined)               | Overwrites any other values
+?      | (user defined)               | Overwrites any other values
+?      | Emoji                        | See "How this Library Handles Emoji Width" below
 -1     | `"\b"`                       | Backspace (total width never below 0)
 0      | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) which do not change horizontal width
 1      | `"\u{00AD}"`                 | SOFT HYPHEN
 2      | `"\u{2E3A}"`                 | TWO-EM DASH
 3      | `"\u{2E3B}"`                 | THREE-EM DASH
-0      | General Categories: Mn, Me, Cf (non-arabic) | Excludes ARABIC format characters
-0      | `"\u{1160}".."\u{11FF}"`, `"\u{D7B0}".."\u{D7FF}"`     | HANGUL JUNGSEONG
-0      | `"\u{2060}".."\u{206F}"`, `"\u{FFF0}".."\u{FFF8}"`, `"\u{E0000}".."\u{E0FFF}"` | Ignorable ranges
+0      | General Categories: Mn, Me, Zl, Zp, Cf (non-arabic)| Excludes ARABIC format characters
+0      | Derived Property: Default_Ignorable_Code_Point     | Ignorable ranges
+0      | `"\u{1160}".."\u{11FF}"`, `"\u{D7B0}".."\u{D7FF}"` | HANGUL JUNGSEONG
 2      | East Asian Width: F, W       | Full-width characters
 2      | `"\u{3400}".."\u{4DBF}"`, `"\u{4E00}".."\u{9FFF}"`, `"\u{F900}".."\u{FAFF}"`, `"\u{20000}".."\u{2FFFD}"`, `"\u{30000}".."\u{3FFFD}"` | Full-width ranges
 1 or 2 | East Asian Width: A          | Ambiguous characters, user defined, default: 1
@@ -71,8 +55,6 @@ Or add to your Gemfile:
 ## Usage
-### Classic API
 ```ruby
 require 'unicode/display_width'
@@ -80,7 +62,7 @@ Unicode::DisplayWidth.of("⚀") # => 1
 Unicode::DisplayWidth.of("一") # => 2
 ```
-#### Ambiguous Characters
+### Ambiguous Characters
 The second parameter defines the value returned by characters defined as ambiguous:
@@ -89,34 +71,75 @@ Unicode::DisplayWidth.of("·", 1) # => 1
 Unicode::DisplayWidth.of("·", 2) # => 2
 ```
-#### Custom Overwrites
+### Encoding Notes
+- Data with *BINARY* encoding is interpreted as UTF-8, if possible
+- Non-UTF-8 strings are converted to UTF-8 before measuring, using the [`{invalid: :replace, undef: :replace}` options](https://ruby-doc.org/3.3.5/encodings_rdoc.html#label-Encoding+Options)
+### Custom Overwrites
-You can overwrite how to handle specific code points by passing a hash (or even a proc) as third parameter:
+You can overwrite how to handle specific code points by passing a hash (or even a proc) as `overwrite:` parameter:
 ```ruby
-Unicode::DisplayWidth.of("a\tb", 1, "\t".ord => 10)) # => tab counted as 10, so result is 12
+Unicode::DisplayWidth.of("a\tb", 1, overwrite: { "\t".ord => 10 }) # => TAB counted as 10, result is 12
 ```
 Please note that using overwrites disables some performance optimizations of this gem.
+### Emoji
-#### Emoji Support
-Emoji width support is included, but in must be activated manually. It will adjust the string's size for modifier and zero-width joiner sequences. You also need to add the [unicode-emoji](https://github.com/janlelis/unicode-emoji) gem to your Gemfile:
+If your terminal supports it, the gem detects Emoji and Emoji sequences and adjusts the width of the measured string. This can be disabled by passing `emoji: false` as an argument:
 ```ruby
-gem 'unicode-display_width'
-gem 'unicode-emoji'
+Unicode::DisplayWidth.of "🤾🏽‍♀️", emoji: :all # => 2
+Unicode::DisplayWidth.of "🤾🏽‍♀️", emoji: false # => 5
 ```
-Enable the emoji string width adjustments by passing `emoji: true` as fourth parameter:
+#### How this Library Handles Emoji Width
-```ruby
-Unicode::DisplayWidth.of "🤾🏽‍♀️" # => 5
-Unicode::DisplayWidth.of "🤾🏽‍♀️", 1, {}, emoji: true # => 2
-```
+There are many Emoji which get constructed by combining other Emoji in a sequence. This makes measuring the width complicated, since terminals might either display the combined Emoji or the separate parts of the Emoji individually.
+Another aspect where terminals disagree is whether Emoji characters which have a text presentation by default (width 1) should be turned into full-width (width 2) when combined with Variation Selector 16 (*U+FE0F*).
+Finally, it varies if Skin Tone Modifiers can be applied to all characters or just to those with the "Emoji Base" property.
+Emoji Type  | Width / Comment
+------------|----------------
+Basic/Single Emoji character without Variation Selector   | No special handling
+Basic/Single Emoji character with VS15 (Text)             | No special handling
+Basic/Single Emoji character with VS16 (Emoji)            | 2 or East Asian Width (see table below)
+Single Emoji character with Skin Tone Modifier            | 2
+Skin Tone Modifier used in isolation or with invalid base | 2 if Emoji mode is configured to `:rgi` / `:rgi_at`
+Emoji Sequence                                            | 2 if Emoji belongs to configured Emoji set (see table below)
+#### Emoji Modes
+The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2 and if VS16-Emoji should be widened. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji settings can be used:
+`emoji:` Option | VS16-Emoji Width | Emoji Sequences Width / Comment | Example Terminals
+----------------|------------------|---------------------------------|------------------
+`true` or `:auto`  | - | Automatically use recommended Emoji setting for your terminal | -
+`:all`     | 2                | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | iTerm, foot
+`:all_no_vs16` | EAW (1 or 2) | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | WezTerm
+`:possible`| 2                | 2 for all possible/well-formed Emoji sequences | ?
+`:rgi`     | 2                | 2 for all [RGI Emoji](https://www.unicode.org/reports/tr51/#def_rgi_set) sequences | ?
+`:rgi_at`  | EAW (1 or 2)     | 1 or 2: Like `:rgi`, but Emoji sequences starting with a default-text Emoji have EAW | Apple Terminal
+`:vs16`    | 2                | 2 * number of partial Emoji (sequences never considered to represent a combined Emoji) | kitty?
+`false` or  `:none` | EAW (1 or 2) | No Emoji adjustments | gnome-terminal, many older terminals
+- *EAW:* East Asian Width
+- *RGI Emoji:* Emoji Recommended for General Interchange
+- *ZWJ:* Zero-width Joiner: Codepoint `U+200D`, used in many Emoji sequences
+#### Emoji Support in Terminals
+Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget).
+Please note that Emoji display and the number of terminal columns used might differ a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminals' Unicode/Emoji capabilities. You can visually check how your terminal renders different kinds of Emoji with the [terminal-emoji-width.rb script](https://github.com/janlelis/unicode-display_width/blob/main/misc/terminal-emoji-width.rb).
+**To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwhile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought…
-#### Usage with String Extension
+### Usage with String Extension
 ```ruby
 require 'unicode/display_width/string_ext'
@@ -125,9 +148,9 @@ require 'unicode/display_width/string_ext'
 '一'.display_width # => 2
 ```
-### Modern API: Keyword-arguments Based Config Object
+### Usage with Config Object
-Version 2.0 introduces a keyword-argument based API, which allows you to save your configuration for later-reuse. This requires an extra line of code, but has the advantage that you'll need to define your string-width options only once:
+You can use a config object that allows you to save your configuration for later-reuse. This requires an extra line of code, but has the advantage that you'll need to define your string-width options only once:
 ```ruby
 require 'unicode/display_width'
@@ -135,15 +158,15 @@ require 'unicode/display_width'
 display_width = Unicode::DisplayWidth.new(
   # ambiguous: 1,
   overwrite: { "A".ord => 100 },
-  emoji: true,
+  emoji: :all,
 )
 display_width.of "⚀" # => 1
-display_width.of "🤾🏽‍♀️" # => 2
+display_width.of "🤠‍🤢" # => 2
 display_width.of "A" # => 100
 ```
-### Usage From the CLI
+### Usage from the Command-Line
 Use this one-liner to print out display widths for strings from the command-line:

data/data/display_width.marshal.gz CHANGED

Binary file

data/lib/unicode/display_width/constants.rb CHANGED

@@ -2,7 +2,7 @@
 module Unicode
   class DisplayWidth
-    VERSION = "2.6.0"
+    VERSION = "3.1.4"
     UNICODE_VERSION = "16.0.0"
     DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/")
     INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz"

data/lib/unicode/display_width/emoji_support.rb ADDED

@@ -0,0 +1,52 @@
+# require "rbconfig"
+# RbConfig::CONFIG["host_os"] =~ /mswin|mingw/ # windows
+module Unicode
+  class DisplayWidth
+    module EmojiSupport
+      # Tries to find out which terminal emulator is used to
+      # set emoji: config to best suiting value
+      #
+      # Please also see section in README.md and
+      # misc/terminal-emoji-width.rb
+      #
+      # Please note: Many terminals do not set any ENV vars,
+      # maybe CSI queries can help?
+      def self.recommended
+        if ENV["CI"]
+          return :rgi
+        end
+        case ENV["TERM_PROGRAM"]
+        when "iTerm.app"
+          return :all
+        when "Apple_Terminal"
+          return :rgi_at
+        when "WezTerm"
+          return :all_no_vs16
+        end
+        case ENV["TERM"]
+        when "contour","foot"
+          # konsole: all, how to detect?
+          return :all
+        when /kitty/
+          return :vs16
+        end
+        if ENV["WT_SESSION"] # Windows Terminal
+          return :vs16
+        end
+        # As of last time checked: gnome-terminal, vscode, alacritty
+        :none
+      end
+      # Maybe: Implement something like https://github.com/jquast/ucs-detect
+      #        which uses the terminal cursor to check for best support level
+      #        at runtime
+      # def self.detect!
+      # end
+    end
+  end
+end
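
The detection order in this file can be tried without touching the real environment. The sketch below mirrors the same checks as a pure function over a hash. `recommended_emoji_mode` is a hypothetical name, not part of the gem, and the CI branch returns `:rgi`, the mode name documented in the README table, which is an assumption about the intent.

```ruby
# Hypothetical stand-alone mirror of the detection order (not the gem's API).
# +env+ is any Hash-like object, e.g. ENV or a plain Hash in tests.
def recommended_emoji_mode(env)
  # CI systems are checked first, before any terminal-specific variable.
  return :rgi if env["CI"]

  case env["TERM_PROGRAM"]
  when "iTerm.app"      then return :all
  when "Apple_Terminal" then return :rgi_at
  when "WezTerm"        then return :all_no_vs16
  end

  case env["TERM"]
  when "contour", "foot" then return :all
  when /kitty/           then return :vs16
  end

  return :vs16 if env["WT_SESSION"] # Windows Terminal

  # Unknown or unsupported terminals get no Emoji adjustments.
  :none
end

puts recommended_emoji_mode("TERM" => "xterm-kitty")
puts recommended_emoji_mode("TERM_PROGRAM" => "Apple_Terminal")
puts recommended_emoji_mode({})
```

Taking the environment as an argument keeps the decision table testable. Callers that need a fixed result regardless of environment can pass an explicit `emoji:` mode instead of relying on detection.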

data/lib/unicode/display_width/reline_ext.rb ADDED

@@ -0,0 +1,14 @@
+# Experimental
+# Patches Reline's get_mbchar_width to use Unicode::DisplayWidth
+require "reline"
+require "reline/unicode"
+require_relative "../display_width"
+class Reline::Unicode
+  def self.get_mbchar_width(mbchar)
+    Unicode::DisplayWidth.of(mbchar, Reline.ambiguous_width)
+  end
+end

data/lib/unicode/display_width/string_ext.rb CHANGED

@@ -1,9 +1,9 @@
 # frozen_string_literal: true
-require_relative "../display_width" unless defined? Unicode::DisplayWidth
+require_relative "../display_width"
 class String
-  def display_width(ambiguous = 1, overwrite = {}, options = {})
-    Unicode::DisplayWidth.of(self, ambiguous, overwrite, options)
+  def display_width(ambiguous = nil, overwrite = nil, old_options = {}, **options)
+    Unicode::DisplayWidth.of(self, ambiguous, overwrite, old_options, **options)
   end
 end
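
A note on forwarding a deprecated positional options hash through a wrapper like this one. Inside a call's argument list, `name = {}` is an assignment expression, so the callee receives a fresh empty hash instead of the caller's value. The sketch below shows the difference with stand-in methods. `target`, `forward_ok` and `forward_reset` are hypothetical names, and Ruby 3 keyword-argument semantics are assumed.

```ruby
# Stand-in for a method with the same parameter shape as the wrapper's target.
def target(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
  [old_options, options]
end

# Forwards the positional hash unchanged.
def forward_ok(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
  target(string, ambiguous, overwrite, old_options, **options)
end

# `old_options = {}` in the call reassigns the local variable and passes
# the new empty hash, so the caller's hash is silently dropped.
def forward_reset(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
  target(string, ambiguous, overwrite, old_options = {}, **options)
end

p forward_ok("x", 1, nil, { emoji: false })
p forward_reset("x", 1, nil, { emoji: false })
```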

data/lib/unicode/display_width.rb CHANGED

@@ -1,123 +1,247 @@
 # frozen_string_literal: true
+require "unicode/emoji"
 require_relative "display_width/constants"
 require_relative "display_width/index"
+require_relative "display_width/emoji_support"
 module Unicode
   class DisplayWidth
+    DEFAULT_AMBIGUOUS = 1
     INITIAL_DEPTH = 0x10000
-    ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/
-    FIRST_4096 = decompress_index(INDEX[0][0], 1)
-    def self.of(string, ambiguous = 1, overwrite = {}, options = {})
-      if overwrite.empty?
-        # Optimization for ASCII-only strings without certain control symbols
-        if string.ascii_only?
-          if string.match?(ASCII_NON_ZERO_REGEX)
-            res = string.gsub(ASCII_NON_ZERO_REGEX, "").size - string.count("\b")
-            res < 0 ? 0 : res
-          else
-            string.size
-          end
-        else
-          width_no_overwrite(string, ambiguous, options)
+    ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n-\x0F]/
+    ASCII_NON_ZERO_STRING = "\0\x05\a\b\n-\x0F"
+    ASCII_BACKSPACE = "\b"
+    AMBIGUOUS_MAP = {
+      1 => :WIDTH_ONE,
+      2 => :WIDTH_TWO,
+    }
+    FIRST_AMBIGUOUS = {
+      WIDTH_ONE: 768,
+      WIDTH_TWO: 161,
+    }
+    NOT_COMMON_NARROW_REGEX = {
+     WIDTH_ONE: /[^\u{10}-\u{2FF}]/m,
+     WIDTH_TWO: /[^\u{10}-\u{A1}]/m,
+    }
+    FIRST_4096 = {
+      WIDTH_ONE: decompress_index(INDEX[:WIDTH_ONE][0][0], 1),
+      WIDTH_TWO: decompress_index(INDEX[:WIDTH_TWO][0][0], 1),
+    }
+    EMOJI_SEQUENCES_REGEX_MAPPING = {
+      rgi: :REGEX_INCLUDE_MQE_UQE,
+      rgi_at: :REGEX_INCLUDE_MQE_UQE,
+      possible: :REGEX_WELL_FORMED,
+    }
+    REGEX_EMOJI_VS16 = Regexp.union(
+      Regexp.compile(
+        Unicode::Emoji::REGEX_TEXT_PRESENTATION.source +
+        "(?<![#*0-9])" +
+        "\u{FE0F}"
+      ),
+      Unicode::Emoji::REGEX_EMOJI_KEYCAP
+    )
+    # ebase = Unicode::Emoji::REGEX_PROP_MODIFIER_BASE.source
+    REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+|.[\u{1F3FB}-\u{1F3FF}]/, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
+    REGEX_EMOJI_ALL_SEQUENCES_AND_VS16 = Regexp.union(REGEX_EMOJI_ALL_SEQUENCES, REGEX_EMOJI_VS16)
+    # Returns monospace display width of string
+    def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
+      # Binary strings don't make much sense when calculating display width.
+      # Assume it's valid UTF-8
+      if string.encoding == Encoding::BINARY && !string.force_encoding(Encoding::UTF_8).valid_encoding?
+        # Didn't work out, go back to binary
+        string.force_encoding(Encoding::BINARY)
+      end
+      string = string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace) unless string.encoding == Encoding::UTF_8
+      options = normalize_options(string, ambiguous, overwrite, old_options, **options)
+      width = 0
+      unless options[:overwrite].empty?
+        width, string = width_custom(string, options[:overwrite])
+      end
+      if string.ascii_only?
+        return width + width_ascii(string)
+      end
+      ambiguous_index_name = AMBIGUOUS_MAP[options[:ambiguous]]
+      unless string.match?(NOT_COMMON_NARROW_REGEX[ambiguous_index_name])
+        return width + string.size
+      end
+      # Retrieve Emoji width
+      if options[:emoji] != :none
+        e_width, string = emoji_width(
+          string,
+          options[:emoji],
+          options[:ambiguous],
+        )
+        width += e_width
+        unless string.match?(NOT_COMMON_NARROW_REGEX[ambiguous_index_name])
+          return width + string.size
         end
-      else
-        width_all_features(string, ambiguous, overwrite, options)
       end
-    end
-    def self.width_no_overwrite(string, ambiguous, options = {})
-      # Sum of all chars widths
-      res = string.codepoints.sum{ |codepoint|
-        if codepoint > 15 && codepoint < 161 # very common
-          next 1
+      index_full = INDEX[ambiguous_index_name]
+      index_low = FIRST_4096[ambiguous_index_name]
+      first_ambiguous = FIRST_AMBIGUOUS[ambiguous_index_name]
+      string.each_codepoint{ |codepoint|
+        if codepoint > 15 && codepoint < first_ambiguous
+          width += 1
         elsif codepoint < 0x1001
-          width = FIRST_4096[codepoint]
+          width += index_low[codepoint] || 1
         else
-          width = INDEX
-          depth = INITIAL_DEPTH
-          while (width = width[codepoint / depth]).instance_of? Array
-            codepoint %= depth
-            depth /= 16
+          d = INITIAL_DEPTH
+          w = index_full[codepoint / d]
+          while w.instance_of? Array
+            w = w[(codepoint %= d) / (d /= 16)]
           end
-        end
-        width == :A ? ambiguous : (width || 1)
+          width += w || 1
+        end
       }
-      # Substract emoji error
-      res -= emoji_extra_width_of(string, ambiguous) if options[:emoji]
       # Return result + prevent negative lengths
-      res < 0 ? 0 : res
+      width < 0 ? 0 : width
     end
-    # Same as .width_no_overwrite - but with applying overwrites for each char
-    def self.width_all_features(string, ambiguous, overwrite, options)
-      # Sum of all chars widths
-      res = string.codepoints.sum{ |codepoint|
-        next overwrite[codepoint] if overwrite[codepoint]
+    # Returns width of custom overwrites and remaining string
+    def self.width_custom(string, overwrite)
+      width = 0
-        if codepoint > 15 && codepoint < 161 # very common
-          next 1
-        elsif codepoint < 0x1001
-          width = FIRST_4096[codepoint]
+      string = string.each_codepoint.select{ |codepoint|
+        if overwrite[codepoint]
+          width += overwrite[codepoint]
+          nil
         else
-          width = INDEX
-          depth = INITIAL_DEPTH
-          while (width = width[codepoint / depth]).instance_of? Array
-            codepoint %= depth
-            depth /= 16
-          end
+          codepoint
         end
+      }.pack("U*")
-        width == :A ? ambiguous : (width || 1)
-      }
+      [width, string]
+    end
-      # Substract emoji error
-      res -= emoji_extra_width_of(string, ambiguous, overwrite) if options[:emoji]
+    # Returns width for ASCII-only strings. Will consider zero-width control symbols.
+    def self.width_ascii(string)
+      if string.match?(ASCII_NON_ZERO_REGEX)
+        res = string.delete(ASCII_NON_ZERO_STRING).bytesize - string.count(ASCII_BACKSPACE)
+        return res < 0 ? 0 : res
+      end
-      # Return result + prevent negative lengths
-      res < 0 ? 0 : res
+      string.bytesize
     end
+    # Returns width of all considered Emoji and remaining string
+    def self.emoji_width(string, mode = :all, ambiguous = DEFAULT_AMBIGUOUS)
+      res = 0
-    def self.emoji_extra_width_of(string, ambiguous = 1, overwrite = {}, _ = {})
-      require "unicode/emoji"
+      if emoji_set_regex = EMOJI_SEQUENCES_REGEX_MAPPING[mode]
+        emoji_width_via_possible(
+          string,
+          Unicode::Emoji.const_get(emoji_set_regex),
+          mode == :rgi_at,
+          ambiguous,
+        )
-      extra_width = 0
-      modifier_regex = /[#{ Unicode::Emoji::EMOJI_MODIFIERS.pack("U*") }]/
-      zwj_regex = /(?<=#{ [Unicode::Emoji::ZWJ].pack("U") })./
+      elsif mode == :all_no_vs16
+        no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES){ res += 2; "" }
+        [res, no_emoji_string]
-      string.scan(Unicode::Emoji::REGEX){ |emoji|
-        extra_width += 2 * emoji.scan(modifier_regex).size
+      elsif mode == :vs16
+        no_emoji_string = string.gsub(REGEX_EMOJI_VS16){ res += 2; "" }
+        [res, no_emoji_string]
-        emoji.scan(zwj_regex){ |zwj_succ|
-          extra_width += self.of(zwj_succ, ambiguous, overwrite)
-        }
+      elsif mode == :all
+        no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES_AND_VS16){ res += 2; "" }
+        [res, no_emoji_string]
+      else
+        [0, string]
+      end
+    end
+    # Match possible Emoji first, then refine
+    def self.emoji_width_via_possible(string, emoji_set_regex, strict_eaw = false, ambiguous = DEFAULT_AMBIGUOUS)
+      res = 0
+      # For each string possibly an emoji
+      no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES_AND_VS16){ |emoji_candidate|
+        # Check if we have a combined Emoji with width 2 (or EAW on Apple Terminal)
+        if emoji_candidate == emoji_candidate[emoji_set_regex]
+          if strict_eaw
+            res += self.of(emoji_candidate[0], ambiguous, emoji: false)
+          else
+            res += 2
+          end
+          ""
+        # We are dealing with a default text presentation emoji or a well-formed sequence not matching the above Emoji set
+        else
+          if !strict_eaw
+            # Ensure all explicit VS16 sequences have width 2
+            emoji_candidate.gsub!(REGEX_EMOJI_VS16){ res += 2; "" }
+          end
+          emoji_candidate
+        end
       }
-      extra_width
+      [res, no_emoji_string]
     end
-    def initialize(ambiguous: 1, overwrite: {}, emoji: false)
+    def self.normalize_options(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
+      unless old_options.empty?
+        warn "Unicode::DisplayWidth: Please migrate to keyword arguments - #{old_options.inspect}"
+        options.merge! old_options
+      end
+      options[:ambiguous] = ambiguous if ambiguous
+      options[:ambiguous] ||= DEFAULT_AMBIGUOUS
+      if options[:ambiguous] != 1 && options[:ambiguous] != 2
+        raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2"
+      end
+      if overwrite && !overwrite.empty?
+        warn "Unicode::DisplayWidth: Please migrate to keyword arguments - overwrite: #{overwrite.inspect}"
+        options[:overwrite] = overwrite
+      end
+      options[:overwrite] ||= {}
+      if [nil, true, :auto].include?(options[:emoji])
+        options[:emoji] = EmojiSupport.recommended
+      elsif options[:emoji] == false
+        options[:emoji] = :none
+      end
+      options
+    end
+    def initialize(ambiguous: DEFAULT_AMBIGUOUS, overwrite: {}, emoji: true)
       @ambiguous = ambiguous
       @overwrite = overwrite
       @emoji     = emoji
     end
     def get_config(**kwargs)
-      [
-        kwargs[:ambiguous] || @ambiguous,
-        kwargs[:overwrite] || @overwrite,
-        { emoji: kwargs[:emoji] || @emoji },
-      ]
+      {
+        ambiguous: kwargs[:ambiguous] || @ambiguous,
+        overwrite: kwargs[:overwrite] || @overwrite,
+        emoji:     kwargs[:emoji]     || @emoji,
+      }
     end
     def of(string, **kwargs)
-      self.class.of(string, *get_config(**kwargs))
+      self.class.of(string, **get_config(**kwargs))
     end
   end
 end
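
The first two stages of the new `of` pipeline (strip overwritten codepoints, then take the ASCII fast path) can be illustrated in isolation. This is a simplified sketch with hypothetical method names. It handles ASCII-only remainders and does not reproduce the index lookup or Emoji handling.

```ruby
# Zero-width C0 controls as a String#delete/count character set;
# "\n-\x0F" is a range covering 0x0A..0x0F.
ASCII_ZERO_WIDTH = "\0\x05\a\b\n-\x0F"

# Returns [summed width of overwritten codepoints, string without them].
def split_overwrites(string, overwrite)
  width = 0
  rest = string.each_codepoint.reject { |codepoint|
    custom = overwrite[codepoint]
    width += custom if custom
    custom # truthy => codepoint is removed from the remaining string
  }.pack("U*")
  [width, rest]
end

# Width of an ASCII-only string: every byte counts as 1, except zero-width
# controls (0) and backspace (-1); the total is never negative.
def ascii_width(string)
  raise ArgumentError, "ASCII-only input expected" unless string.ascii_only?
  res = string.delete(ASCII_ZERO_WIDTH).bytesize - string.count("\b")
  res < 0 ? 0 : res
end

def sketch_width(string, overwrite = {})
  custom, rest = split_overwrites(string, overwrite)
  custom + ascii_width(rest)
end

puts sketch_width("a\tb", { "\t".ord => 10 }) # TAB counted as 10 => 12
puts sketch_width("ab\b")                     # backspace subtracts one => 1
```

Removing overwritten codepoints up front is what lets the remaining string still qualify for the cheap ASCII path, at the price of building a second string.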

metadata CHANGED

@@ -1,15 +1,35 @@
 --- !ruby/object:Gem::Specification
 name: unicode-display_width
 version: !ruby/object:Gem::Version
-  version: 2.6.0
+  version: 3.1.4
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-09-13 00:00:00.000000000 Z
+date: 2025-01-13 00:00:00.000000000 Z
 dependencies:
+- !ruby/object:Gem::Dependency
+  name: unicode-emoji
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.0'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 4.0.4
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.0'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 4.0.4
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
@@ -39,7 +59,8 @@ dependencies:
       - !ruby/object:Gem::Version
         version: '13.0'
 description: "[Unicode 16.0.0] Determines the monospace display width of a string
-  using EastAsianWidth.txt, Unicode general category, and other data."
+  using EastAsianWidth.txt, Unicode general category, Emoji specification, and other
+  data."
 email:
 - hi@ruby.consulting
 executables: []
@@ -55,8 +76,10 @@ files:
 - data/display_width.marshal.gz
 - lib/unicode/display_width.rb
 - lib/unicode/display_width/constants.rb
+- lib/unicode/display_width/emoji_support.rb
 - lib/unicode/display_width/index.rb
 - lib/unicode/display_width/no_string_ext.rb
+- lib/unicode/display_width/reline_ext.rb
 - lib/unicode/display_width/string_ext.rb
 homepage: https://github.com/janlelis/unicode-display_width
 licenses:
@@ -74,14 +97,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: 2.4.0
+      version: 2.5.0
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.5.9
+rubygems_version: 3.5.21
 signing_key:
 specification_version: 4
 summary: Determines the monospace display width of a string in Ruby.
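
The new runtime dependency combines two constraints on `unicode-emoji` (`~> 4.0` and `>= 4.0.4`). To see which versions that pair admits, it can be evaluated with RubyGems' own `Gem::Requirement` class. The version numbers tested below are illustrative inputs, not a list of actual releases.

```ruby
require "rubygems"

# The combined requirement from the gemspec metadata above.
requirement = Gem::Requirement.new("~> 4.0", ">= 4.0.4")

# Illustrative version strings only; they need not correspond to real releases.
%w[4.0.3 4.0.4 4.1.0 5.0.0].each do |version|
  ok = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{ok ? 'allowed' : 'rejected'}"
end

# The raised Ruby floor can be checked the same way.
ruby_ok = Gem::Requirement.new(">= 2.5.0").satisfied_by?(Gem::Version.new(RUBY_VERSION))
puts "running Ruby #{RUBY_VERSION} satisfies >= 2.5.0: #{ruby_ok}"
```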