RubyGems - unicode-display_width - Versions diffs - 2.6.0 → 3.0.0 - Mend

unicode-display_width 2.6.0 → 3.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

checksums.yaml +4 -4
data/CHANGELOG.md +49 -2
data/README.md +68 -50
data/data/display_width.marshal.gz +0 -0
data/lib/unicode/display_width/constants.rb +1 -1
data/lib/unicode/display_width/emoji_support.rb +41 -0
data/lib/unicode/display_width/reline_ext.rb +14 -0
data/lib/unicode/display_width/string_ext.rb +3 -3
data/lib/unicode/display_width.rb +191 -73
metadata +22 -5

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: eb663fb7dd6d3409dd3b21bd4a793d954ad8fd9b974593868292b5ec59ba7c01
-  data.tar.gz: e960ab9c24135cb1d7872e84c4e3d7b24f83a5ae85b12a14af27312a90241597
+  metadata.gz: 8ee4f0ac31dae0855f4de659fac788fb36a7298cc5a69cd2b4e104a709bab351
+  data.tar.gz: 5928bdbfd92df1baba4249fca92302240dfb4ad90579248085c1f5e103c3fc0d
 SHA512:
-  metadata.gz: bd4fb14101159588eec1c2bf6871a94e297e6314317ee10ce320b09fc30e01f35d8610cdd3dfe32edb979f0ece6053914a23298d6b46732f22d16f642576aacf
-  data.tar.gz: b02c66363a1740303715e30b8023e0cc2de99baf78f183ec2ad48b18ff26ac95da4e496f604aa4bda1dcf691ad4fdff644b9e1d7c41922ba078b6c35653debc6
+  metadata.gz: e5af487be1d49d54f383cd8fc5cc0ea384714297537f5e23ef37f457bea5ff80a469e954ecdbf345eb6e966b29aad2f9557af0986a11642d66e21e9bf8309603
+  data.tar.gz: 7e6441597613b829540389e36b6d51415858ba05222ba14c33dec37c883d125a96d6566f3e4f65b666b7de89e9833e568fcb33e8a1a5f2f9ba5cb69616e63b0f
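The `checksums.yaml` entries are SHA256 and SHA512 digests of the `metadata.gz` and `data.tar.gz` members inside the packaged `.gem` file. A `.gem` is a tar archive, so something like `tar -xf unicode-display_width-3.0.0.gem` should extract them. Below is a minimal sketch for recomputing a digest with Ruby's standard `digest` library. The helper `file_digests` is hypothetical and not part of the gem.

```ruby
require "digest"

# Hypothetical helper: SHA256 and SHA512 hex digests of a file, the two
# algorithms listed in checksums.yaml.
def file_digests(path)
  data = File.binread(path)
  {
    "SHA256" => Digest::SHA256.hexdigest(data),
    "SHA512" => Digest::SHA512.hexdigest(data),
  }
end

# Published SHA256 of data.tar.gz for 3.0.0 (copied from the diff above)
expected = "5928bdbfd92df1baba4249fca92302240dfb4ad90579248085c1f5e103c3fc0d"

# Assumes data.tar.gz was extracted from the .gem into the current directory
if File.exist?("data.tar.gz")
  actual = file_digests("data.tar.gz")["SHA256"]
  puts(actual == expected ? "data.tar.gz matches checksums.yaml" : "checksum mismatch")
end
```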

data/CHANGELOG.md CHANGED

@@ -1,5 +1,34 @@
 # CHANGELOG
+## 3.0.0
+**Rework Emoji support:**
+- Emoji widths are now enabled by default
+- Only reduce Emoji width to 2 when RGI Emoji detected (configurable)
+- VS16 turns Emoji characters of width 1 into full-width
+- Please note that Emoji parsing has a notable impact on performance.
+  You can use the `emoji: false` option to disable Emoji adjustments
+- Tries to detect terminal's Emoji support level automatically (from ENV vars)
+**Index fixes and updates:**
+- Private-use characters are considered ambiguous (were given width 1 before)
+- Fix that a few zero-width ignorable codepoints from recent Unicode were missing
+- Consider the following separators to be zero-width:
+  - U+2028 - LINE SEPARATOR - Zl
+  - U+2029 - PARAGRAPH SEPARATOR - Zp
+**Other:**
+- Add keyword arguments to `Unicode::DisplayWidth.of`. If you are using a hash
+  with overwrite values as third parameter, be sure to put it in curly braces.
+- Using third parameter or explicit hash as fourth parameter is deprecated,
+  please migrate to the keyword arguments API
+- Gem raises `ArgumentError` for ambiguous values other than 1 or 2
+- Performance optimizations
+- Require Ruby 2.5
 ## 2.6.0
 - Unicode 16
@@ -40,8 +69,26 @@ More performance improvements:
 ## 2.0.0
-- Release 2.0.0
-- Supports Ruby 3.0
+Add Support for Ruby 3.0
+### Breaking Changes
+Some features of this library were marked deprecated for a long time and have been removed with Version 2.0:
+- Aliases of display\_width (…\_size, …\_length) have been removed
+- Auto-loading of string core extension has been removed:
+If you are relying on the `String#display_width` string extension to be automatically loaded (old behavior), please load it explicitly now:
+```ruby
+require "unicode/display_width/string_ext"
+```
+You could also change your `Gemfile` line to achieve this:
+```ruby
+gem "unicode-display_width", require: "unicode/display_width/string_ext"
+```
 ## 2.0.0.pre2

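The 3.0.0 entry tells callers who pass overwrites positionally to wrap the hash in curly braces. The reason is Ruby's keyword-argument handling. Once `of` accepts `**options`, a brace-less trailing hash is collected as keywords instead of filling the `overwrite` parameter. The sketch below uses a hypothetical stand-in, `where_args_land`, that has the same parameter list as the new `Unicode::DisplayWidth.of` in the `display_width.rb` diff. It shows where each call style ends up under Ruby 3.

```ruby
# Hypothetical stand-in with the same parameter list as Unicode::DisplayWidth.of
# in 3.0.0. It does not measure anything; it reports where each argument landed.
def where_args_land(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
  { ambiguous: ambiguous, overwrite: overwrite, old_options: old_options, options: options }
end

tab = "\t".ord # 9

# 2.x style without braces: under Ruby 3 the trailing hash becomes keywords,
# so overwrite stays nil and the 9 => 10 pair ends up in options
brace_less = where_args_land("a\tb", 1, tab => 10)

# Braced hash: stays positional and fills overwrite (accepted, but warns in 3.0.0)
braced = where_args_land("a\tb", 1, { tab => 10 })

# Keyword style recommended by the 3.0.0 changelog
keywords = where_args_land("a\tb", ambiguous: 1, overwrite: { tab => 10 })

p brace_less[:overwrite].nil? # true
p braced[:overwrite]          # the overwrite hash
p keywords[:options].keys     # ambiguous and overwrite, both as keywords
```

The `of` implementation in the diff only reads the `:ambiguous`, `:overwrite` and `:emoji` keys of `options`. So the brace-less form appears to drop the overwrite silently rather than raising an error.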
data/README.md CHANGED

@@ -1,39 +1,22 @@
-## Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)
+# Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)
-Determines the monospace display width of a string in Ruby. Useful for all kinds of terminal-based applications. Implementation based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt) and other data, 100% in Ruby. It does not rely on the OS vendor (like [wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width.
+Determines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specfication](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. It does not rely on the OS vendor ([wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals.
 Unicode version: **16.0.0** (September 2024)
-Supported Rubies: **3.3**, **3.2**, **3.1**, **3.0**
+## Gem Version 3.0 — Improved Emoji Support
-Old Rubies which might still work: **2.7**, **2.6**, **2.5**, **2.4**, **2.3**
+**Emoji support is now enabled by default.** See below for description and configuration possibilities.
-For even older Rubies, use version 2.3.0 of this gem: **2.3**, **2.2**, **2.1**, **2.0**, **1.9**
+**Unicode::DisplayWidth.of now takes keyword arguments:** { ambiguous:, emoji:, overwrite: }
-## Version 2.4.2 — Performance Updates
+See [CHANGELOG](/CHANGELOG.md) for details.
-**If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!**
-This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the charachter width lookup code has been optimized, so even when full-width characters are involved, the gem is much faster now.
-## Version 2.0 — Breaking Changes
-Some features of this library were marked deprecated for a long time and have been removed with Version 2.0:
+## Gem Version 2.4.2 — Performance Updates
-- Aliases of display_width (…\_size, …\_length) have been removed
-- Auto-loading of string core extension has been removed:
-If you are relying on the `String#display_width` string extension to be automatically loaded (old behavior), please load it explicitly now:
-```ruby
-require "unicode/display_width/string_ext"
-```
-You could also change your `Gemfile` line to achieve this:
+**If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!**
-```ruby
-gem "unicode-display_width", require: "unicode/display_width/string_ext"
-```
+This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the character width lookup code has been optimized, so even when the string involves full-width or ambiguous characters, the gem is much faster now.
 ## Introduction to Character Widths
@@ -45,15 +28,16 @@ Further at the top means higher precedence. Please expect changes to this algori
 Width  | Characters                   | Comment
 -------|------------------------------|--------------------------------------------------
-X      | (user defined)               | Overwrites any other values
+?      | (user defined)               | Overwrites any other values
+?      | Emoji                        | See "How this Library Handles Emoji Width" below
 -1     | `"\b"`                       | Backspace (total width never below 0)
 0      | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) which do not change horizontal width
 1      | `"\u{00AD}"`                 | SOFT HYPHEN
 2      | `"\u{2E3A}"`                 | TWO-EM DASH
 3      | `"\u{2E3B}"`                 | THREE-EM DASH
-0      | General Categories: Mn, Me, Cf (non-arabic) | Excludes ARABIC format characters
-0      | `"\u{1160}".."\u{11FF}"`, `"\u{D7B0}".."\u{D7FF}"`     | HANGUL JUNGSEONG
-0      | `"\u{2060}".."\u{206F}"`, `"\u{FFF0}".."\u{FFF8}"`, `"\u{E0000}".."\u{E0FFF}"` | Ignorable ranges
+0      | General Categories: Mn, Me, Zl, Zp, Cf (non-arabic)| Excludes ARABIC format characters
+0      | Derived Property: Default_Ignorable_Code_Point     | Ignorable ranges
+0      | `"\u{1160}".."\u{11FF}"`, `"\u{D7B0}".."\u{D7FF}"` | HANGUL JUNGSEONG
 2      | East Asian Width: F, W       | Full-width characters
 2      | `"\u{3400}".."\u{4DBF}"`, `"\u{4E00}".."\u{9FFF}"`, `"\u{F900}".."\u{FAFF}"`, `"\u{20000}".."\u{2FFFD}"`, `"\u{30000}".."\u{3FFFD}"` | Full-width ranges
 1 or 2 | East Asian Width: A          | Ambiguous characters, user defined, default: 1
@@ -71,8 +55,6 @@ Or add to your Gemfile:
 ## Usage
-### Classic API
 ```ruby
 require 'unicode/display_width'
@@ -80,7 +62,7 @@ Unicode::DisplayWidth.of("⚀") # => 1
 Unicode::DisplayWidth.of("一") # => 2
 ```
-#### Ambiguous Characters
+### Ambiguous Characters
 The second parameter defines the value returned by characters defined as ambiguous:
@@ -89,34 +71,70 @@ Unicode::DisplayWidth.of("·", 1) # => 1
 Unicode::DisplayWidth.of("·", 2) # => 2
 ```
-#### Custom Overwrites
+### Custom Overwrites
-You can overwrite how to handle specific code points by passing a hash (or even a proc) as third parameter:
+You can overwrite how to handle specific code points by passing a hash (or even a proc) as `overwrite:` parameter:
 ```ruby
-Unicode::DisplayWidth.of("a\tb", 1, "\t".ord => 10)) # => tab counted as 10, so result is 12
+Unicode::DisplayWidth.of("a\tb", 1, overwrite: { "\t".ord => 10 })) # => TAB counted as 10, result is 12
 ```
 Please note that using overwrites disables some perfomance optimizations of this gem.
+### Emoji Option
-#### Emoji Support
-Emoji width support is included, but in must be activated manually. It will adjust the string's size for modifier and zero-width joiner sequences. You also need to add the [unicode-emoji](https://github.com/janlelis/unicode-emoji) gem to your Gemfile:
+The gem detects Emoji and Emoji sequences and adjusts the width of the measured string. This can be disabled by passing `emoji: false` as an argument:
 ```ruby
-gem 'unicode-display_width'
-gem 'unicode-emoji'
+Unicode::DisplayWidth.of "🤾🏽‍♀️" # => 2
+Unicode::DisplayWidth.of "🤾🏽‍♀️", emoji: false # => 5
 ```
-Enable the emoji string width adjustments by passing `emoji: true` as fourth parameter:
+Disabling Emoji support yields wrong results, as illustrated in the example above, but increases performance of display width calculation. You can configure [the Emoji set to match for](https://www.unicode.org/reports/tr51/#def_rgi_set) by passing a symbol as value:
 ```ruby
-Unicode::DisplayWidth.of "🤾🏽‍♀️" # => 5
-Unicode::DisplayWidth.of "🤾🏽‍♀️", 1, {}, emoji: true # => 2
+Unicode::DisplayWidth.of "🐻‍❄", emoji: :rgi_mqe # => 3
+Unicode::DisplayWidth.of "🐻‍❄", emoji: :rgi_uqe # => 2
 ```
-#### Usage with String Extension
+#### How this Library Handles Emoji Width
+There are many Emoji which get constructed by combining other Emoji in a sequence. This makes measuring the width complicated, since terminals might either display the combined Emoji or the separate parts of the Emoji individually.
+Emoji Type  | Width / Comment
+------------|----------------
+Basic/Single Emoji character without Variation Selector | No special handling, uses mechanism from table above
+Basic/Single Emoji character with VS15 (Text)           | No special handling, uses mechanism from table above
+Basic/Single Emoji character with VS16 (Emoji)          | 2
+Emoji Sequence                                          | 2 (only if sequence belongs to configured Emoji set)
+The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji sets can be used:
+Option | Descriptions
+-------|-------------
+`emoji: true`     | Use recommended Emoji set on your platform, see section below
+`emoji: :basic`   | No width adjustments for Emoji sequences: all partial Emoji treated separately
+`emoji: :rgi_fqe` | All fully-qualified RGI Emoji sequences are considered to have a width of 2
+`emoji: :rgi_mqe` | All fully- and minimally-qualified RGI Emoji sequences are considered to have a width of 2
+`emoji: :rgi_uqe` | All RGI Emoji sequences, regardless of qualification status are considered to have a width of 2
+`emoji: :all`     | All possible/well-formed Emoji sequences are considered to have a width of 2
+`emoji: false`    | No Emoji adjustments, Emoji characters with VS16 not handled
+*RGI Emoji:* Emoji Recommended for General Interchange
+*Qualification:* Whether an Emoji sequence has all required VS16 codepoints
+See [emoji-test.txt](https://www.unicode.org/Public/emoji/16.0/emoji-test.txt), the [unicode-emoji gem](https://github.com/janlelis/unicode-emoji) and [UTS-51](https://www.unicode.org/reports/tr51/#def_qualified_emoji_character) for more details about qualified and unqualified Emoji sequences.
+#### Emoji Support in Terminals
+Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` is used, the gem will attempt to set the best fitting Emoji set for you (e.g. `:rgi_uqe` on "Apple_Terminal" or `:basic` on Gnome's terminal widget).
+Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value.
+You are encouraged to give your users the option to configure the level of Emoji support in your library or application and for the best developer experience in their terminals. (same is true for ambigouos width).
+### Usage with String Extension
 ```ruby
 require 'unicode/display_width/string_ext'
@@ -125,9 +143,9 @@ require 'unicode/display_width/string_ext'
 '一'.display_width # => 2
 ```
-### Modern API: Keyword-arguments Based Config Object
+### Usage with Config Object
-Version 2.0 introduces a keyword-argument based API, which allows you to save your configuration for later-reuse. This requires an extra line of code, but has the advantage that you'll need to define your string-width options only once:
+You can use a config object that allows you to save your configuration for later-reuse. This requires an extra line of code, but has the advantage that you'll need to define your string-width options only once:
 ```ruby
 require 'unicode/display_width'
@@ -135,15 +153,15 @@ require 'unicode/display_width'
 display_width = Unicode::DisplayWidth.new(
   # ambiguous: 1,
   overwrite: { "A".ord => 100 },
-  emoji: true,
+  emoji: :all,
 )
 display_width.of "⚀" # => 1
-display_width.of "🤾🏽‍♀️" # => 2
+display_width.of "🤠‍🤢" # => 2
 display_width.of "A" # => 100
 ```
-### Usage From the CLI
+### Usage from the Command-Line
 Use this one-liner to print out display widths for strings from the command-line:

data/data/display_width.marshal.gz CHANGED

Binary file

data/lib/unicode/display_width/constants.rb CHANGED

@@ -2,7 +2,7 @@
 module Unicode
   class DisplayWidth
-    VERSION = "2.6.0"
+    VERSION = "3.0.0"
     UNICODE_VERSION = "16.0.0"
     DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/")
     INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz"
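Version 3.0.0 moves `Unicode::DisplayWidth.of` to keyword arguments. Code that must work with both 2.x and 3.x could therefore branch on the `VERSION` constant changed above. Below is a minimal sketch that compares versions with RubyGems' `Gem::Version` rather than plain string comparison, which would misorder versions such as `"10.0.0"` and `"3.0.0"`. The helper `keyword_api?` is hypothetical. It takes the version as a string so it runs without the gem installed.

```ruby
require "rubygems"

# Hypothetical helper: does this unicode-display_width version take keyword
# arguments (ambiguous:, emoji:, overwrite:) in Unicode::DisplayWidth.of?
def keyword_api?(version)
  Gem::Version.new(version) >= Gem::Version.new("3.0.0")
end

# With the gem loaded, pass Unicode::DisplayWidth::VERSION instead of a literal
puts keyword_api?("2.6.0") # false
puts keyword_api?("3.0.0") # true
```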

data/lib/unicode/display_width/emoji_support.rb ADDED

@@ -0,0 +1,41 @@
+# require "rbconfig"
+# RbConfig::CONFIG["host_os"] =~ /mswin|mingw/ # windows
+module Unicode
+  class DisplayWidth
+    module EmojiSupport
+      # Tries to find out which terminal emulator is used to
+      # set emoji: config to best suiting value
+      #
+      # Please note: Many terminals do not set any ENV vars
+      def self.recommended
+        if ENV["CI"]
+          return :rqi_uqe
+        end
+        case ENV["TERM_PROGRAM"]
+        when "iTerm.app"
+          return :all
+        when "Apple_Terminal"
+          return :rgi_uqe
+        end
+        case ENV["TERM"]
+        when "contour"
+          return :rgi_uqe
+        when /kitty/
+          return :rgi_fqe
+        end
+        # As of last time checked: gnome-terminal, vscode, alacritty, konsole
+        :basic
+      end
+      # Maybe: Implement something like https://github.com/jquast/ucs-detect
+      #        which uses the terminal cursor to check for best support level
+      #        at runtime
+      # def self.detect!
+      # end
+    end
+  end
+end
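`EmojiSupport.recommended` checks `CI` first, then `TERM_PROGRAM`, then `TERM`, and falls back to `:basic`. Because `of` calls it whenever `emoji` is `nil` or `true`, the environment is consulted on each such call. The sketch below restates the same order as a function that takes the environment as a hash, so it can be exercised without modifying `ENV`. The function name `recommended_emoji_set` is hypothetical.

There is one deliberate difference. The released code returns `:rqi_uqe` for CI, and that symbol is not a key of `EMOJI_SEQUENCES_REGEX_MAPPING`, so `emoji_width` would take its `:basic` branch. The sketch returns `:rgi_uqe` instead, assuming that was the intended value.

```ruby
# Sketch of the detection order in EmojiSupport.recommended (3.0.0), taking the
# environment as an argument. NOTE: returns :rgi_uqe for CI, where the released
# code returns :rqi_uqe.
def recommended_emoji_set(env = ENV)
  return :rgi_uqe if env["CI"]

  case env["TERM_PROGRAM"]
  when "iTerm.app"      then return :all
  when "Apple_Terminal" then return :rgi_uqe
  end

  case env["TERM"]
  when "contour" then return :rgi_uqe
  when /kitty/   then return :rgi_fqe
  end

  # Fallback for terminals that set none of the variables checked above
  :basic
end

p recommended_emoji_set("TERM_PROGRAM" => "Apple_Terminal") # :rgi_uqe
p recommended_emoji_set("TERM" => "xterm-kitty")            # :rgi_fqe
p recommended_emoji_set({})                                 # :basic
```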

data/lib/unicode/display_width/reline_ext.rb ADDED

@@ -0,0 +1,14 @@
+# Experimental
+# Patches Reline's get_mbchar_width to use Unicode::DisplayWidth
+require "reline"
+require "reline/unicode"
+require_relative "../display_width"
+class Reline::Unicode
+  def self.get_mbchar_width(mbchar)
+    Unicode::DisplayWidth.of(mbchar, Reline.ambiguous_width)
+  end
+end

data/lib/unicode/display_width/string_ext.rb CHANGED

@@ -1,9 +1,9 @@
 # frozen_string_literal: true
-require_relative "../display_width" unless defined? Unicode::DisplayWidth
+require_relative "../display_width"
 class String
-  def display_width(ambiguous = 1, overwrite = {}, options = {})
-    Unicode::DisplayWidth.of(self, ambiguous, overwrite, options)
+  def display_width(ambiguous = nil, overwrite = nil, old_options = {}, **options)
+    Unicode::DisplayWidth.of(self, ambiguous, overwrite, old_options = {}, **options)
   end
 end
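In the 3.0.0 `String#display_width`, the forwarded call contains `old_options = {}` inside the argument list. Ruby treats that as an assignment expression. It resets the local variable and passes a fresh empty hash. A hash passed positionally as the third argument (the 2.x `options` slot) therefore appears never to reach `Unicode::DisplayWidth.of`, and the deprecation warning for it is not printed either.

The sketch below demonstrates this with hypothetical methods that share the same parameter list. Forwarding the variable itself is shown as one possible alternative, not as the gem's code.

```ruby
# Hypothetical receiver with the same parameters as Unicode::DisplayWidth.of
# in 3.0.0; it returns whatever arrived in old_options.
def receive_old_options(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
  old_options
end

# Forwarding as written in the 3.0.0 string_ext.rb: `old_options = {}` in the
# argument list reassigns the variable and passes a new empty hash.
def forward_as_released(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
  receive_old_options(string, ambiguous, overwrite, old_options = {}, **options)
end

# Forwarding the variable itself keeps the caller's hash.
def forward_variable(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
  receive_old_options(string, ambiguous, overwrite, old_options, **options)
end

legacy = { emoji: false }
p forward_as_released("x", 1, nil, legacy).empty? # true: the hash was dropped
p forward_variable("x", 1, nil, legacy) == legacy # true: the hash was kept
```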

data/lib/unicode/display_width.rb CHANGED

@@ -1,122 +1,240 @@
 # frozen_string_literal: true
+require "unicode/emoji"
 require_relative "display_width/constants"
 require_relative "display_width/index"
+require_relative "display_width/emoji_support"
 module Unicode
   class DisplayWidth
     INITIAL_DEPTH = 0x10000
     ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/
-    FIRST_4096 = decompress_index(INDEX[0][0], 1)
-    def self.of(string, ambiguous = 1, overwrite = {}, options = {})
-      if overwrite.empty?
-        # Optimization for ASCII-only strings without certain control symbols
-        if string.ascii_only?
-          if string.match?(ASCII_NON_ZERO_REGEX)
-            res = string.gsub(ASCII_NON_ZERO_REGEX, "").size - string.count("\b")
-            res < 0 ? 0 : res
-          else
-            string.size
-          end
-        else
-          width_no_overwrite(string, ambiguous, options)
+    ASCII_NON_ZERO_STRING = "\0\x05\a\b\n\v\f\r\x0E\x0F"
+    ASCII_BACKSPACE = "\b"
+    AMBIGUOUS_MAP = {
+      1 => :WIDTH_ONE,
+      2 => :WIDTH_TWO,
+    }
+    FIRST_AMBIGUOUS = {
+      WIDTH_ONE: 768,
+      WIDTH_TWO: 161,
+    }
+    FIRST_4096 = {
+      WIDTH_ONE: decompress_index(INDEX[:WIDTH_ONE][0][0], 1),
+      WIDTH_TWO: decompress_index(INDEX[:WIDTH_TWO][0][0], 1),
+    }
+    EMOJI_SEQUENCES_REGEX_MAPPING = {
+      rgi_fqe: :REGEX,
+      rgi_mqe: :REGEX_INCLUDE_MQE,
+      rgi_uqe: :REGEX_INCLUDE_MQE_UQE,
+      all:     :REGEX_WELL_FORMED,
+    }
+    EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/
+    REGEX_EMOJI_BASIC_OR_KEYCAP = Regexp.union(Unicode::Emoji::REGEX_BASIC, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
+    # Returns monospace display width of string
+    def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
+      unless old_options.empty?
+        warn "Unicode::DisplayWidth: Please migrate to keyword arguments - #{old_options.inspect}"
+        options.merge! old_options
+      end
+      options[:ambiguous] = ambiguous if ambiguous
+      options[:ambiguous] ||= 1
+      if options[:ambiguous] != 1 && options[:ambiguous] != 2
+        raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2"
+      end
+      if overwrite && !overwrite.empty?
+        warn "Unicode::DisplayWidth: Please migrate to keyword arguments - overwrite: #{overwrite.inspect}"
+        options[:overwrite] = overwrite
+      end
+      options[:overwrite] ||= {}
+      if options[:emoji] == nil || options[:emoji] == true
+        options[:emoji] = EmojiSupport.recommended
+      end
+      # # #
+      if !options[:overwrite].empty?
+        return width_frame(string, options) do |string, index_full, index_low, first_ambiguous|
+          width_all_features(string, index_full, index_low, first_ambiguous, options[:overwrite])
         end
-      else
-        width_all_features(string, ambiguous, overwrite, options)
       end
-    end
-    def self.width_no_overwrite(string, ambiguous, options = {})
-      # Sum of all chars widths
-      res = string.codepoints.sum{ |codepoint|
-        if codepoint > 15 && codepoint < 161 # very common
-          next 1
-        elsif codepoint < 0x1001
-          width = FIRST_4096[codepoint]
-        else
-          width = INDEX
-          depth = INITIAL_DEPTH
-          while (width = width[codepoint / depth]).instance_of? Array
-            codepoint %= depth
-            depth /= 16
-          end
+      if !string.ascii_only?
+        return width_frame(string, options) do |string, index_full, index_low, first_ambiguous|
+          width_no_overwrite(string, index_full, index_low, first_ambiguous)
         end
+      end
-        width == :A ? ambiguous : (width || 1)
-      }
+      width_ascii(string)
+    end
+    def self.width_ascii(string)
+      # Optimization for ASCII-only strings without certain control symbols
+      if string.match?(ASCII_NON_ZERO_REGEX)
+        res = string.delete(ASCII_NON_ZERO_STRING).size - string.count(ASCII_BACKSPACE)
+        return res < 0 ? 0 : res
+      end
+      # Pure ASCII
+      string.size
+    end
+    def self.width_frame(string, options)
+      # Retrieve Emoji width
+      if !options[:emoji]
+        res = 0
+      else options[:emoji]
+        res, string = emoji_width(
+          string,
+          options[:emoji],
+        )
+      end
-      # Substract emoji error
-      res -= emoji_extra_width_of(string, ambiguous) if options[:emoji]
+      # Prepare indexes
+      ambiguous_index_name = AMBIGUOUS_MAP[options[:ambiguous]]
+      # Get general width
+      res += yield(string, INDEX[ambiguous_index_name], FIRST_4096[ambiguous_index_name], FIRST_AMBIGUOUS[ambiguous_index_name])
       # Return result + prevent negative lengths
       res < 0 ? 0 : res
     end
-    # Same as .width_no_overwrite - but with applying overwrites for each char
-    def self.width_all_features(string, ambiguous, overwrite, options)
-      # Sum of all chars widths
-      res = string.codepoints.sum{ |codepoint|
-        next overwrite[codepoint] if overwrite[codepoint]
+    def self.width_no_overwrite(string, index_full, index_low, first_ambiguous, _ = {})
+      res = 0
+      # Make sure we have UTF-8
+      string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8"
+      string.scan(/.{,80}/m){ |batch|
+        if batch.ascii_only?
+          res += batch.size
+        else
+          batch.each_codepoint{ |codepoint|
+            if codepoint > 15 && codepoint < first_ambiguous
+              res += 1
+            elsif codepoint < 0x1001
+              res += index_low[codepoint] || 1
+            else
+              d = INITIAL_DEPTH
+              w = index_full[codepoint / d]
+              while w.instance_of? Array
+                w = w[(codepoint %= d) / (d /= 16)]
+              end
+              res += w || 1
+            end
+          }
+        end
+      }
-        if codepoint > 15 && codepoint < 161 # very common
-          next 1
+      res
+    end
+    # Same as .width_no_overwrite - but with applying overwrites for each char
+    def self.width_all_features(string, index_full, index_low, first_ambiguous, overwrite)
+      res = 0
+      string.each_codepoint{ |codepoint|
+        if overwrite[codepoint]
+          res += overwrite[codepoint]
+        elsif codepoint > 15 && codepoint < first_ambiguous
+          res += 1
         elsif codepoint < 0x1001
-          width = FIRST_4096[codepoint]
+          res += index_low[codepoint] || 1
         else
-          width = INDEX
-          depth = INITIAL_DEPTH
-          while (width = width[codepoint / depth]).instance_of? Array
-            codepoint %= depth
-            depth /= 16
+          d = INITIAL_DEPTH
+          w = index_full[codepoint / d]
+          while w.instance_of? Array
+            w = w[(codepoint %= d) / (d /= 16)]
           end
-        end
-        width == :A ? ambiguous : (width || 1)
+          res += w || 1
+        end
       }
-      # Substract emoji error
-      res -= emoji_extra_width_of(string, ambiguous, overwrite) if options[:emoji]
-      # Return result + prevent negative lengths
-      res < 0 ? 0 : res
+      res
     end
-    def self.emoji_extra_width_of(string, ambiguous = 1, overwrite = {}, _ = {})
-      require "unicode/emoji"
+    def self.emoji_width(string, sequences = :rgi_fqe)
+      res = 0
+      if regex = EMOJI_SEQUENCES_REGEX_MAPPING[sequences]
+        emoji_sequence_regex = Unicode::Emoji.const_get(regex)
+      else # sequences == :basic
+        emoji_sequence_regex = nil
+      end
+      # Make sure we have UTF-8
+      string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8"
-      extra_width = 0
-      modifier_regex = /[#{ Unicode::Emoji::EMOJI_MODIFIERS.pack("U*") }]/
-      zwj_regex = /(?<=#{ [Unicode::Emoji::ZWJ].pack("U") })./
+      if emoji_sequence_regex
+        # For each string possibly an emoji
+        no_emoji_string = string.gsub(Unicode::Emoji::REGEX_POSSIBLE){ |emoji_candidate|
+          # Skip notorious false positives
+          if EMOJI_NOT_POSSIBLE.match?(emoji_candidate)
+            emoji_candidate
-      string.scan(Unicode::Emoji::REGEX){ |emoji|
-        extra_width += 2 * emoji.scan(modifier_regex).size
+          # Check if we have a combined Emoji with width 2
+          elsif emoji_candidate == emoji_candidate[emoji_sequence_regex]
+            res += 2
+            ""
-        emoji.scan(zwj_regex){ |zwj_succ|
-          extra_width += self.of(zwj_succ, ambiguous, overwrite)
+          # We are dealing with a default text presentation emoji or a well-formed sequence not matching the above Emoji set
+          else
+            # Ensure all explicit VS16 sequences have width 2
+            emoji_candidate.gsub!(Unicode::Emoji::REGEX_BASIC){ |basic_emoji|
+              if basic_emoji.size == 2 # VS16 present
+                res += 2
+                ""
+              else
+                basic_emoji
+              end
+            }
+            emoji_candidate
+          end
         }
-      }
+      else
+        # Only consider basic emoji
-      extra_width
+        # Ensure all explicit VS16 sequences have width 2
+        no_emoji_string = string.gsub(REGEX_EMOJI_BASIC_OR_KEYCAP){ |basic_emoji|
+          if basic_emoji.size >= 2 # VS16 present
+            res += 2
+            ""
+          else
+            basic_emoji
+          end
+        }
+      end
+      [res, no_emoji_string]
     end
-    def initialize(ambiguous: 1, overwrite: {}, emoji: false)
+    def initialize(ambiguous: 1, overwrite: {}, emoji: true)
       @ambiguous = ambiguous
       @overwrite = overwrite
       @emoji     = emoji
     end
     def get_config(**kwargs)
-      [
-        kwargs[:ambiguous] || @ambiguous,
-        kwargs[:overwrite] || @overwrite,
-        { emoji: kwargs[:emoji] || @emoji },
-      ]
+      {
+        ambiguous: kwargs[:ambiguous] || @ambiguous,
+        overwrite: kwargs[:overwrite] || @overwrite,
+        emoji:     kwargs[:emoji]     || @emoji,
+      }
     end
     def of(string, **kwargs)
-      self.class.of(string, *get_config(**kwargs))
+      self.class.of(string, **get_config(**kwargs))
     end
   end
 end
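For code points above 0x1000, `width_no_overwrite` and `width_all_features` look up widths in `index_full`, a nested array. The first level is chosen by `codepoint / 0x10000`, giving one slot per plane. Each further level splits the remaining range into 16 buckets, until the walk reaches a non-Array leaf: either a width, or `nil`, which means the default of 1. Code points up to 0x1000 use the flat `index_low` table instead.

The sketch below reproduces that walk against a tiny hand-built index. The toy data is invented for illustration and is not the content of `display_width.marshal.gz`.

```ruby
# Walks a nested-array width index the same way the 3.0.0 lookup loop does.
# Ruby evaluates `(codepoint %= d) / (d /= 16)` left to right, so each step
# takes the offset within the current bucket, then shrinks the bucket size by 16.
def lookup_width(index_full, codepoint, initial_depth = 0x10000)
  d = initial_depth
  w = index_full[codepoint / d]
  while w.instance_of?(Array)
    w = w[(codepoint %= d) / (d /= 16)]
  end
  w || 1 # nil means "no entry", i.e. width 1; a leaf of 0 is kept (0 is truthy)
end

# Toy index (hypothetical data): only plane 0 is populated. Inside plane 0,
# bucket 0x3 (U+3000..U+3FFF) is split again:
#   U+3000..U+30FF -> width 2, U+3100..U+31FF -> width 0, rest -> default.
block_3xxx = Array.new(16)
block_3xxx[0x0] = 2
block_3xxx[0x1] = 0
plane_0 = Array.new(16)
plane_0[0x3] = block_3xxx
toy_index = [plane_0]

p lookup_width(toy_index, 0x3042)  # 2 (inside U+3000..U+30FF)
p lookup_width(toy_index, 0x3105)  # 0
p lookup_width(toy_index, 0x5000)  # 1 (empty bucket)
p lookup_width(toy_index, 0x1F600) # 1 (plane 1 is not in the toy index)
```

Storing a single value for a whole bucket keeps long runs of equal widths cheap. Only ranges with mixed widths need deeper nesting.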

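`get_config` merges per-call keyword arguments with the values stored by `initialize` using `||`. Going by the code in the diff, a per-call `false` therefore cannot override a stored value. For example, `display_width.of(str, emoji: false)` on an instance created with `emoji: :all` still measures with `:all`. The sketch below contrasts that merge with one based on `Hash#fetch`. `TinyConfig` and both of its methods are hypothetical, and they only model the merging step, not width measurement.

```ruby
# Hypothetical model of the instance-config merge in Unicode::DisplayWidth (3.0.0).
class TinyConfig
  def initialize(ambiguous: 1, overwrite: {}, emoji: true)
    @ambiguous = ambiguous
    @overwrite = overwrite
    @emoji     = emoji
  end

  # Same merge as the released get_config: falsy per-call values fall back
  def merged_with_or(**kwargs)
    {
      ambiguous: kwargs[:ambiguous] || @ambiguous,
      overwrite: kwargs[:overwrite] || @overwrite,
      emoji:     kwargs[:emoji]     || @emoji,
    }
  end

  # Alternative: fall back only when the key was not passed at all
  def merged_with_fetch(**kwargs)
    {
      ambiguous: kwargs.fetch(:ambiguous, @ambiguous),
      overwrite: kwargs.fetch(:overwrite, @overwrite),
      emoji:     kwargs.fetch(:emoji, @emoji),
    }
  end
end

config = TinyConfig.new(emoji: :all)
p config.merged_with_or(emoji: false)[:emoji]    # :all  (override lost)
p config.merged_with_fetch(emoji: false)[:emoji] # false (override kept)
```

With `fetch`, an explicit `emoji: nil` is passed through as well, and `of` then treats it like `true` (recommended Emoji set).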
metadata CHANGED

@@ -1,15 +1,29 @@
 --- !ruby/object:Gem::Specification
 name: unicode-display_width
 version: !ruby/object:Gem::Version
-  version: 2.6.0
+  version: 3.0.0
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-09-13 00:00:00.000000000 Z
+date: 2024-11-13 00:00:00.000000000 Z
 dependencies:
+- !ruby/object:Gem::Dependency
+  name: unicode-emoji
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.0'
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
@@ -39,7 +53,8 @@ dependencies:
       - !ruby/object:Gem::Version
         version: '13.0'
 description: "[Unicode 16.0.0] Determines the monospace display width of a string
-  using EastAsianWidth.txt, Unicode general category, and other data."
+  using EastAsianWidth.txt, Unicode general category, Emoji specification, and other
+  data."
 email:
 - hi@ruby.consulting
 executables: []
@@ -55,8 +70,10 @@ files:
 - data/display_width.marshal.gz
 - lib/unicode/display_width.rb
 - lib/unicode/display_width/constants.rb
+- lib/unicode/display_width/emoji_support.rb
 - lib/unicode/display_width/index.rb
 - lib/unicode/display_width/no_string_ext.rb
+- lib/unicode/display_width/reline_ext.rb
 - lib/unicode/display_width/string_ext.rb
 homepage: https://github.com/janlelis/unicode-display_width
 licenses:
@@ -74,14 +91,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: 2.4.0
+      version: 2.5.0
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.5.9
+rubygems_version: 3.5.21
 signing_key:
 specification_version: 4
 summary: Determines the monospace display width of a string in Ruby.
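The metadata makes two constraint changes. It adds a runtime dependency on `unicode-emoji` with the pessimistic constraint `~> 4.0`, and it raises the required Ruby version to `>= 2.5.0`. Below is a small sketch using RubyGems' `Gem::Requirement` and `Gem::Version` to check which versions those constraints admit. The sample version numbers are arbitrary.

```ruby
require "rubygems"

emoji_dep = Gem::Requirement.new("~> 4.0")   # means >= 4.0 and < 5.0
ruby_req  = Gem::Requirement.new(">= 2.5.0")

%w[3.9.9 4.0 4.0.1 4.9 5.0].each do |v|
  puts "unicode-emoji #{v}: #{emoji_dep.satisfied_by?(Gem::Version.new(v))}"
end

puts "Ruby #{RUBY_VERSION} supported: #{ruby_req.satisfied_by?(Gem::Version.new(RUBY_VERSION))}"
```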