unicode-display_width 2.6.0 → 3.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: eb663fb7dd6d3409dd3b21bd4a793d954ad8fd9b974593868292b5ec59ba7c01
4
- data.tar.gz: e960ab9c24135cb1d7872e84c4e3d7b24f83a5ae85b12a14af27312a90241597
3
+ metadata.gz: 8ee4f0ac31dae0855f4de659fac788fb36a7298cc5a69cd2b4e104a709bab351
4
+ data.tar.gz: 5928bdbfd92df1baba4249fca92302240dfb4ad90579248085c1f5e103c3fc0d
5
5
  SHA512:
6
- metadata.gz: bd4fb14101159588eec1c2bf6871a94e297e6314317ee10ce320b09fc30e01f35d8610cdd3dfe32edb979f0ece6053914a23298d6b46732f22d16f642576aacf
7
- data.tar.gz: b02c66363a1740303715e30b8023e0cc2de99baf78f183ec2ad48b18ff26ac95da4e496f604aa4bda1dcf691ad4fdff644b9e1d7c41922ba078b6c35653debc6
6
+ metadata.gz: e5af487be1d49d54f383cd8fc5cc0ea384714297537f5e23ef37f457bea5ff80a469e954ecdbf345eb6e966b29aad2f9557af0986a11642d66e21e9bf8309603
7
+ data.tar.gz: 7e6441597613b829540389e36b6d51415858ba05222ba14c33dec37c883d125a96d6566f3e4f65b666b7de89e9833e568fcb33e8a1a5f2f9ba5cb69616e63b0f
data/CHANGELOG.md CHANGED
@@ -1,5 +1,34 @@
1
1
  # CHANGELOG
2
2
 
3
+ ## 3.0.0
4
+
5
+ **Rework Emoji support:**
6
+
7
+ - Emoji widths are now enabled by default
8
+ - Only reduce Emoji width to 2 when RGI Emoji detected (configurable)
9
+ - VS16 turns Emoji characters of width 1 into full-width
10
+ - Please note that Emoji parsing has a notable impact on performance.
11
+ You can use the `emoji: false` option to disable Emoji adjustments
12
+ - Tries to detect terminal's Emoji support level automatically (from ENV vars)
13
+
14
+ **Index fixes and updates:**
15
+
16
+ - Private-use characters are considered ambiguous (were given width 1 before)
17
+ - Fix that a few zero-width ignorable codepoints from recent Unicode were missing
18
+ - Consider the following separators to be zero-width:
19
+ - U+2028 - LINE SEPARATOR - Zl
20
+ - U+2029 - PARAGRAPH SEPARATOR - Zp
21
+
22
+ **Other:**
23
+
24
+ - Add keyword arguments to `Unicode::DisplayWidth.of`. If you are using a hash
25
+ with overwrite values as third parameter, be sure to put it in curly braces.
26
+ - Using third parameter or explicit hash as fourth parameter is deprecated,
27
+ please migrate to the keyword arguments API
28
+ - Gem raises `ArgumentError` for ambiguous values other than 1 or 2
29
+ - Performance optimizations
30
+ - Require Ruby 2.5
31
+
3
32
  ## 2.6.0
4
33
 
5
34
  - Unicode 16
@@ -40,8 +69,26 @@ More performance improvements:
40
69
 
41
70
  ## 2.0.0
42
71
 
43
- - Release 2.0.0
44
- - Supports Ruby 3.0
72
+ Add Support for Ruby 3.0
73
+
74
+ ### Breaking Changes
75
+
76
+ Some features of this library were marked deprecated for a long time and have been removed with Version 2.0:
77
+
78
+ - Aliases of display\_width (…\_size, …\_length) have been removed
79
+ - Auto-loading of string core extension has been removed:
80
+
81
+ If you are relying on the `String#display_width` string extension to be automatically loaded (old behavior), please load it explicitly now:
82
+
83
+ ```ruby
84
+ require "unicode/display_width/string_ext"
85
+ ```
86
+
87
+ You could also change your `Gemfile` line to achieve this:
88
+
89
+ ```ruby
90
+ gem "unicode-display_width", require: "unicode/display_width/string_ext"
91
+ ```
45
92
 
46
93
  ## 2.0.0.pre2
47
94
 
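Note on the 3.0.0 entry above: the advice to put an overwrite hash "in curly braces" follows from how Ruby separates positional and keyword arguments. The following is a minimal sketch, assuming Ruby 3.x argument handling. `where_args_land` is a hypothetical stand-in that copies only the parameter list of `Unicode::DisplayWidth.of` from this release and reports where each argument ends up; it does not measure anything.

```ruby
# Hypothetical stand-in: same parameter shape as the 3.0.0 signature of
# Unicode::DisplayWidth.of, but it only reports where the arguments landed.
def where_args_land(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
  { overwrite: overwrite, options: options }
end

tab = "\t".ord # 9

# With braces: the hash is a positional argument and lands in `overwrite`
with_braces = where_args_land("a\tb", 1, { tab => 10 })

# Without braces: Ruby 3 treats the trailing pairs as keyword arguments,
# so they land in `options` and `overwrite` stays nil
without_braces = where_args_land("a\tb", 1, tab => 10)

# Keyword form, as recommended by the CHANGELOG
keyword_form = where_args_land("a\tb", overwrite: { tab => 10 })

p with_braces
p without_braces
p keyword_form
```

In the braceless call the overwrite never reaches the third parameter, which is presumably why the CHANGELOG asks for explicit braces until callers have migrated to `overwrite:`.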
data/README.md CHANGED
@@ -1,39 +1,22 @@
1
- ## Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)
1
+ # Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)
2
2
 
3
- Determines the monospace display width of a string in Ruby. Useful for all kinds of terminal-based applications. Implementation based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt) and other data, 100% in Ruby. It does not rely on the OS vendor (like [wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width.
3
+ Determines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specification](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. It does not rely on the OS vendor ([wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals.
4
4
 
5
5
  Unicode version: **16.0.0** (September 2024)
6
6
 
7
- Supported Rubies: **3.3**, **3.2**, **3.1**, **3.0**
7
+ ## Gem Version 3.0 Improved Emoji Support
8
8
 
9
- Old Rubies which might still work: **2.7**, **2.6**, **2.5**, **2.4**, **2.3**
9
+ **Emoji support is now enabled by default.** See below for description and configuration possibilities.
10
10
 
11
- For even older Rubies, use version 2.3.0 of this gem: **2.3**, **2.2**, **2.1**, **2.0**, **1.9**
11
+ **Unicode::DisplayWidth.of now takes keyword arguments:** { ambiguous:, emoji:, overwrite: }
12
12
 
13
- ## Version 2.4.2 Performance Updates
13
+ See [CHANGELOG](/CHANGELOG.md) for details.
14
14
 
15
- **If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!**
16
-
17
- This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the charachter width lookup code has been optimized, so even when full-width characters are involved, the gem is much faster now.
18
-
19
- ## Version 2.0 — Breaking Changes
20
-
21
- Some features of this library were marked deprecated for a long time and have been removed with Version 2.0:
15
+ ## Gem Version 2.4.2 Performance Updates
22
16
 
23
- - Aliases of display_width (…\_size, …\_length) have been removed
24
- - Auto-loading of string core extension has been removed:
25
-
26
- If you are relying on the `String#display_width` string extension to be automatically loaded (old behavior), please load it explicitly now:
27
-
28
- ```ruby
29
- require "unicode/display_width/string_ext"
30
- ```
31
-
32
- You could also change your `Gemfile` line to achieve this:
17
+ **If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!**
33
18
 
34
- ```ruby
35
- gem "unicode-display_width", require: "unicode/display_width/string_ext"
36
- ```
19
+ This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the character width lookup code has been optimized, so even when the string involves full-width or ambiguous characters, the gem is much faster now.
37
20
 
38
21
  ## Introduction to Character Widths
39
22
 
@@ -45,15 +28,16 @@ Further at the top means higher precedence. Please expect changes to this algori
45
28
 
46
29
  Width | Characters | Comment
47
30
  -------|------------------------------|--------------------------------------------------
48
- X | (user defined) | Overwrites any other values
31
+ ? | (user defined) | Overwrites any other values
32
+ ? | Emoji | See "How this Library Handles Emoji Width" below
49
33
  -1 | `"\b"` | Backspace (total width never below 0)
50
34
  0 | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) which do not change horizontal width
51
35
  1 | `"\u{00AD}"` | SOFT HYPHEN
52
36
  2 | `"\u{2E3A}"` | TWO-EM DASH
53
37
  3 | `"\u{2E3B}"` | THREE-EM DASH
54
- 0 | General Categories: Mn, Me, Cf (non-arabic) | Excludes ARABIC format characters
55
- 0 | `"\u{1160}".."\u{11FF}"`, `"\u{D7B0}".."\u{D7FF}"` | HANGUL JUNGSEONG
56
- 0 | `"\u{2060}".."\u{206F}"`, `"\u{FFF0}".."\u{FFF8}"`, `"\u{E0000}".."\u{E0FFF}"` | Ignorable ranges
38
+ 0 | General Categories: Mn, Me, Zl, Zp, Cf (non-arabic)| Excludes ARABIC format characters
39
+ 0 | Derived Property: Default_Ignorable_Code_Point | Ignorable ranges
40
+ 0 | `"\u{1160}".."\u{11FF}"`, `"\u{D7B0}".."\u{D7FF}"` | HANGUL JUNGSEONG
57
41
  2 | East Asian Width: F, W | Full-width characters
58
42
  2 | `"\u{3400}".."\u{4DBF}"`, `"\u{4E00}".."\u{9FFF}"`, `"\u{F900}".."\u{FAFF}"`, `"\u{20000}".."\u{2FFFD}"`, `"\u{30000}".."\u{3FFFD}"` | Full-width ranges
59
43
  1 or 2 | East Asian Width: A | Ambiguous characters, user defined, default: 1
@@ -71,8 +55,6 @@ Or add to your Gemfile:
71
55
 
72
56
  ## Usage
73
57
 
74
- ### Classic API
75
-
76
58
  ```ruby
77
59
  require 'unicode/display_width'
78
60
 
@@ -80,7 +62,7 @@ Unicode::DisplayWidth.of("⚀") # => 1
80
62
  Unicode::DisplayWidth.of("一") # => 2
81
63
  ```
82
64
 
83
- #### Ambiguous Characters
65
+ ### Ambiguous Characters
84
66
 
85
67
  The second parameter defines the value returned by characters defined as ambiguous:
86
68
 
@@ -89,34 +71,70 @@ Unicode::DisplayWidth.of("·", 1) # => 1
89
71
  Unicode::DisplayWidth.of("·", 2) # => 2
90
72
  ```
91
73
 
92
- #### Custom Overwrites
74
+ ### Custom Overwrites
93
75
 
94
- You can overwrite how to handle specific code points by passing a hash (or even a proc) as third parameter:
76
+ You can overwrite how to handle specific code points by passing a hash (or even a proc) as `overwrite:` parameter:
95
77
 
96
78
  ```ruby
97
- Unicode::DisplayWidth.of("a\tb", 1, "\t".ord => 10)) # => tab counted as 10, so result is 12
79
+ Unicode::DisplayWidth.of("a\tb", 1, overwrite: { "\t".ord => 10 }) # => TAB counted as 10, result is 12
98
80
  ```
99
81
 
100
82
  Please note that using overwrites disables some performance optimizations of this gem.
101
83
 
84
+ ### Emoji Option
102
85
 
103
- #### Emoji Support
104
-
105
- Emoji width support is included, but in must be activated manually. It will adjust the string's size for modifier and zero-width joiner sequences. You also need to add the [unicode-emoji](https://github.com/janlelis/unicode-emoji) gem to your Gemfile:
86
+ The gem detects Emoji and Emoji sequences and adjusts the width of the measured string. This can be disabled by passing `emoji: false` as an argument:
106
87
 
107
88
  ```ruby
108
- gem 'unicode-display_width'
109
- gem 'unicode-emoji'
89
+ Unicode::DisplayWidth.of "🤾🏽‍♀️" # => 2
90
+ Unicode::DisplayWidth.of "🤾🏽‍♀️", emoji: false # => 5
110
91
  ```
111
92
 
112
- Enable the emoji string width adjustments by passing `emoji: true` as fourth parameter:
93
+ Disabling Emoji support yields wrong results, as illustrated in the example above, but increases performance of display width calculation. You can configure [the Emoji set to match for](https://www.unicode.org/reports/tr51/#def_rgi_set) by passing a symbol as value:
113
94
 
114
95
  ```ruby
115
- Unicode::DisplayWidth.of "🤾🏽‍♀️" # => 5
116
- Unicode::DisplayWidth.of "🤾🏽‍♀️", 1, {}, emoji: true # => 2
96
+ Unicode::DisplayWidth.of "🐻‍❄", emoji: :rgi_mqe # => 3
97
+ Unicode::DisplayWidth.of "🐻‍❄", emoji: :rgi_uqe # => 2
117
98
  ```
118
99
 
119
- #### Usage with String Extension
100
+ #### How this Library Handles Emoji Width
101
+
102
+ There are many Emoji which get constructed by combining other Emoji in a sequence. This makes measuring the width complicated, since terminals might either display the combined Emoji or the separate parts of the Emoji individually.
103
+
104
+ Emoji Type | Width / Comment
105
+ ------------|----------------
106
+ Basic/Single Emoji character without Variation Selector | No special handling, uses mechanism from table above
107
+ Basic/Single Emoji character with VS15 (Text) | No special handling, uses mechanism from table above
108
+ Basic/Single Emoji character with VS16 (Emoji) | 2
109
+ Emoji Sequence | 2 (only if sequence belongs to configured Emoji set)
110
+
111
+ The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji sets can be used:
112
+
113
+ Option | Descriptions
114
+ -------|-------------
115
+ `emoji: true` | Use recommended Emoji set on your platform, see section below
116
+ `emoji: :basic` | No width adjustments for Emoji sequences: all partial Emoji treated separately
117
+ `emoji: :rgi_fqe` | All fully-qualified RGI Emoji sequences are considered to have a width of 2
118
+ `emoji: :rgi_mqe` | All fully- and minimally-qualified RGI Emoji sequences are considered to have a width of 2
119
+ `emoji: :rgi_uqe` | All RGI Emoji sequences, regardless of qualification status are considered to have a width of 2
120
+ `emoji: :all` | All possible/well-formed Emoji sequences are considered to have a width of 2
121
+ `emoji: false` | No Emoji adjustments, Emoji characters with VS16 not handled
122
+
123
+ *RGI Emoji:* Emoji Recommended for General Interchange
124
+
125
+ *Qualification:* Whether an Emoji sequence has all required VS16 codepoints
126
+
127
+ See [emoji-test.txt](https://www.unicode.org/Public/emoji/16.0/emoji-test.txt), the [unicode-emoji gem](https://github.com/janlelis/unicode-emoji) and [UTS-51](https://www.unicode.org/reports/tr51/#def_qualified_emoji_character) for more details about qualified and unqualified Emoji sequences.
128
+
129
+ #### Emoji Support in Terminals
130
+
131
+ Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` is used, the gem will attempt to set the best fitting Emoji set for you (e.g. `:rgi_uqe` on "Apple_Terminal" or `:basic` on Gnome's terminal widget).
132
+
133
+ Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value.
134
+
135
+ You are encouraged to give your users the option to configure the level of Emoji support in your library or application, so that they get the best experience in their terminals (the same is true for ambiguous width).
136
+
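One way to follow the README's suggestion is to translate a user setting into a value for the `emoji:` option. This is a minimal sketch, not part of the gem: the helper name `emoji_option_from`, the accepted spellings ("auto", "off", ...) and the environment variable `MYAPP_EMOJI_WIDTH` are all assumptions made for illustration. Only the resulting values (`true`, `false` and the five symbols) come from the option table above.

```ruby
# Emoji sets listed in the README's option table
EMOJI_LEVELS = %i[basic rgi_fqe rgi_mqe rgi_uqe all].freeze

# Hypothetical helper: maps a user-supplied string onto an `emoji:` value
def emoji_option_from(setting)
  value = setting.to_s.strip.downcase
  return true  if value.empty? || value == "auto" # let the gem choose
  return false if %w[off false 0].include?(value) # no Emoji adjustments

  level = value.to_sym
  return level if EMOJI_LEVELS.include?(level)

  raise ArgumentError, "unknown Emoji level: #{setting.inspect}"
end

p emoji_option_from(nil)
p emoji_option_from("rgi_uqe")
p emoji_option_from("off")

# Possible use (MYAPP_EMOJI_WIDTH is an invented variable name):
# Unicode::DisplayWidth.of(text, emoji: emoji_option_from(ENV["MYAPP_EMOJI_WIDTH"]))
```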
137
+ ### Usage with String Extension
120
138
 
121
139
  ```ruby
122
140
  require 'unicode/display_width/string_ext'
@@ -125,9 +143,9 @@ require 'unicode/display_width/string_ext'
125
143
  '一'.display_width # => 2
126
144
  ```
127
145
 
128
- ### Modern API: Keyword-arguments Based Config Object
146
+ ### Usage with Config Object
129
147
 
130
- Version 2.0 introduces a keyword-argument based API, which allows you to save your configuration for later-reuse. This requires an extra line of code, but has the advantage that you'll need to define your string-width options only once:
148
+ You can use a config object that allows you to save your configuration for later reuse. This requires an extra line of code, but has the advantage that you'll need to define your string-width options only once:
131
149
 
132
150
  ```ruby
133
151
  require 'unicode/display_width'
@@ -135,15 +153,15 @@ require 'unicode/display_width'
135
153
  display_width = Unicode::DisplayWidth.new(
136
154
  # ambiguous: 1,
137
155
  overwrite: { "A".ord => 100 },
138
- emoji: true,
156
+ emoji: :all,
139
157
  )
140
158
 
141
159
  display_width.of "⚀" # => 1
142
- display_width.of "🤾🏽‍♀️" # => 2
160
+ display_width.of "🤠‍🤢" # => 2
143
161
  display_width.of "A" # => 100
144
162
  ```
145
163
 
146
- ### Usage From the CLI
164
+ ### Usage from the Command-Line
147
165
 
148
166
  Use this one-liner to print out display widths for strings from the command-line:
149
167
 
Binary file
@@ -2,7 +2,7 @@
2
2
 
3
3
  module Unicode
4
4
  class DisplayWidth
5
- VERSION = "2.6.0"
5
+ VERSION = "3.0.0"
6
6
  UNICODE_VERSION = "16.0.0"
7
7
  DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/")
8
8
  INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz"
@@ -0,0 +1,41 @@
1
+ # require "rbconfig"
2
+ # RbConfig::CONFIG["host_os"] =~ /mswin|mingw/ # windows
3
+
4
+ module Unicode
5
+ class DisplayWidth
6
+ module EmojiSupport
7
+ # Tries to find out which terminal emulator is used to
8
+ # set emoji: config to best suiting value
9
+ #
10
+ # Please note: Many terminals do not set any ENV vars
11
+ def self.recommended
12
+ if ENV["CI"]
13
+ return :rqi_uqe
14
+ end
15
+
16
+ case ENV["TERM_PROGRAM"]
17
+ when "iTerm.app"
18
+ return :all
19
+ when "Apple_Terminal"
20
+ return :rgi_uqe
21
+ end
22
+
23
+ case ENV["TERM"]
24
+ when "contour"
25
+ return :rgi_uqe
26
+ when /kitty/
27
+ return :rgi_fqe
28
+ end
29
+
30
+ # As of last time checked: gnome-terminal, vscode, alacritty, konsole
31
+ :basic
32
+ end
33
+
34
+ # Maybe: Implement something like https://github.com/jquast/ucs-detect
35
+ # which uses the terminal cursor to check for best support level
36
+ # at runtime
37
+ # def self.detect!
38
+ # end
39
+ end
40
+ end
41
+ end
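Note on `EmojiSupport.recommended` above: the decision order can be tried without a real terminal by passing the environment in as a Hash. This is a sketch under two stated assumptions. First, the method name `recommended_emoji_set` and its `env` parameter are hypothetical. Second, the CI branch in the diff returns `:rqi_uqe`, which has no entry in `EMOJI_SEQUENCES_REGEX_MAPPING` and so would take the basic branch of `emoji_width` as far as this diff shows; the sketch assumes `:rgi_uqe` was intended.

```ruby
# Same decision order as the method in the diff, but reading from a
# passed-in Hash instead of ENV (hypothetical signature)
def recommended_emoji_set(env)
  # Assumption: :rgi_uqe is what the CI branch was meant to return
  return :rgi_uqe if env["CI"]

  case env["TERM_PROGRAM"]
  when "iTerm.app"      then return :all
  when "Apple_Terminal" then return :rgi_uqe
  end

  case env["TERM"]
  when "contour" then return :rgi_uqe
  when /kitty/   then return :rgi_fqe
  end

  # Fallback for terminals that set none of the variables above
  :basic
end

p recommended_emoji_set("TERM_PROGRAM" => "iTerm.app")
p recommended_emoji_set("TERM" => "xterm-kitty")
p recommended_emoji_set({})
```

Because `CI` is checked first, a CI job that runs inside a recognised terminal still gets the CI value.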
@@ -0,0 +1,14 @@
1
+ # Experimental
2
+ # Patches Reline's get_mbchar_width to use Unicode::DisplayWidth
3
+
4
+ require "reline"
5
+ require "reline/unicode"
6
+
7
+ require_relative "../display_width"
8
+
9
+ class Reline::Unicode
10
+ def self.get_mbchar_width(mbchar)
11
+ Unicode::DisplayWidth.of(mbchar, Reline.ambiguous_width)
12
+ end
13
+ end
14
+
@@ -1,9 +1,9 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require_relative "../display_width" unless defined? Unicode::DisplayWidth
3
+ require_relative "../display_width"
4
4
 
5
5
  class String
6
- def display_width(ambiguous = 1, overwrite = {}, options = {})
7
- Unicode::DisplayWidth.of(self, ambiguous, overwrite, options)
6
+ def display_width(ambiguous = nil, overwrite = nil, old_options = {}, **options)
7
+ Unicode::DisplayWidth.of(self, ambiguous, overwrite, old_options = {}, **options)
8
8
  end
9
9
  end
@@ -1,122 +1,240 @@
1
1
  # frozen_string_literal: true
2
2
 
3
+ require "unicode/emoji"
4
+
3
5
  require_relative "display_width/constants"
4
6
  require_relative "display_width/index"
7
+ require_relative "display_width/emoji_support"
5
8
 
6
9
  module Unicode
7
10
  class DisplayWidth
8
11
  INITIAL_DEPTH = 0x10000
9
12
  ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/
10
- FIRST_4096 = decompress_index(INDEX[0][0], 1)
11
-
12
- def self.of(string, ambiguous = 1, overwrite = {}, options = {})
13
- if overwrite.empty?
14
- # Optimization for ASCII-only strings without certain control symbols
15
- if string.ascii_only?
16
- if string.match?(ASCII_NON_ZERO_REGEX)
17
- res = string.gsub(ASCII_NON_ZERO_REGEX, "").size - string.count("\b")
18
- res < 0 ? 0 : res
19
- else
20
- string.size
21
- end
22
- else
23
- width_no_overwrite(string, ambiguous, options)
13
+ ASCII_NON_ZERO_STRING = "\0\x05\a\b\n\v\f\r\x0E\x0F"
14
+ ASCII_BACKSPACE = "\b"
15
+ AMBIGUOUS_MAP = {
16
+ 1 => :WIDTH_ONE,
17
+ 2 => :WIDTH_TWO,
18
+ }
19
+ FIRST_AMBIGUOUS = {
20
+ WIDTH_ONE: 768,
21
+ WIDTH_TWO: 161,
22
+ }
23
+ FIRST_4096 = {
24
+ WIDTH_ONE: decompress_index(INDEX[:WIDTH_ONE][0][0], 1),
25
+ WIDTH_TWO: decompress_index(INDEX[:WIDTH_TWO][0][0], 1),
26
+ }
27
+ EMOJI_SEQUENCES_REGEX_MAPPING = {
28
+ rgi_fqe: :REGEX,
29
+ rgi_mqe: :REGEX_INCLUDE_MQE,
30
+ rgi_uqe: :REGEX_INCLUDE_MQE_UQE,
31
+ all: :REGEX_WELL_FORMED,
32
+ }
33
+ EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/
34
+ REGEX_EMOJI_BASIC_OR_KEYCAP = Regexp.union(Unicode::Emoji::REGEX_BASIC, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
35
+
36
+ # Returns monospace display width of string
37
+ def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
38
+ unless old_options.empty?
39
+ warn "Unicode::DisplayWidth: Please migrate to keyword arguments - #{old_options.inspect}"
40
+ options.merge! old_options
41
+ end
42
+
43
+ options[:ambiguous] = ambiguous if ambiguous
44
+ options[:ambiguous] ||= 1
45
+
46
+ if options[:ambiguous] != 1 && options[:ambiguous] != 2
47
+ raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2"
48
+ end
49
+
50
+ if overwrite && !overwrite.empty?
51
+ warn "Unicode::DisplayWidth: Please migrate to keyword arguments - overwrite: #{overwrite.inspect}"
52
+ options[:overwrite] = overwrite
53
+ end
54
+ options[:overwrite] ||= {}
55
+
56
+ if options[:emoji] == nil || options[:emoji] == true
57
+ options[:emoji] = EmojiSupport.recommended
58
+ end
59
+
60
+ # # #
61
+
62
+ if !options[:overwrite].empty?
63
+ return width_frame(string, options) do |string, index_full, index_low, first_ambiguous|
64
+ width_all_features(string, index_full, index_low, first_ambiguous, options[:overwrite])
24
65
  end
25
- else
26
- width_all_features(string, ambiguous, overwrite, options)
27
66
  end
28
- end
29
67
 
30
- def self.width_no_overwrite(string, ambiguous, options = {})
31
- # Sum of all chars widths
32
- res = string.codepoints.sum{ |codepoint|
33
- if codepoint > 15 && codepoint < 161 # very common
34
- next 1
35
- elsif codepoint < 0x1001
36
- width = FIRST_4096[codepoint]
37
- else
38
- width = INDEX
39
- depth = INITIAL_DEPTH
40
- while (width = width[codepoint / depth]).instance_of? Array
41
- codepoint %= depth
42
- depth /= 16
43
- end
68
+ if !string.ascii_only?
69
+ return width_frame(string, options) do |string, index_full, index_low, first_ambiguous|
70
+ width_no_overwrite(string, index_full, index_low, first_ambiguous)
44
71
  end
72
+ end
45
73
 
46
- width == :A ? ambiguous : (width || 1)
47
- }
74
+ width_ascii(string)
75
+ end
76
+
77
+ def self.width_ascii(string)
78
+ # Optimization for ASCII-only strings without certain control symbols
79
+ if string.match?(ASCII_NON_ZERO_REGEX)
80
+ res = string.delete(ASCII_NON_ZERO_STRING).size - string.count(ASCII_BACKSPACE)
81
+ return res < 0 ? 0 : res
82
+ end
83
+
84
+ # Pure ASCII
85
+ string.size
86
+ end
87
+
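Note on `width_ascii` above: the fast path can be exercised on its own. The sketch below copies the two constants and the method body from this diff into a standalone script (top-level method instead of `self.width_ascii`); nothing else from the gem is loaded. It assumes the caller has already checked `string.ascii_only?`, as `Unicode::DisplayWidth.of` does.

```ruby
# Constants copied from the diff: zero-width C0 controls plus backspace
ASCII_NON_ZERO_REGEX  = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/
ASCII_NON_ZERO_STRING = "\0\x05\a\b\n\v\f\r\x0E\x0F"
ASCII_BACKSPACE       = "\b"

def width_ascii(string)
  # Plain ASCII without special controls: width equals length
  return string.size unless string.match?(ASCII_NON_ZERO_REGEX)

  # Drop the zero-width controls (and backspaces), then let every
  # backspace take away one more column; never go below zero
  res = string.delete(ASCII_NON_ZERO_STRING).size - string.count(ASCII_BACKSPACE)
  res < 0 ? 0 : res
end

p width_ascii("abc")
p width_ascii("a\nb")
p width_ascii("ab\b")
p width_ascii("\b\b")
```

The backspace is both removed by `delete` and subtracted via `count`, which gives it the width of -1 listed in the README table.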
88
+ def self.width_frame(string, options)
89
+ # Retrieve Emoji width
90
+ if !options[:emoji]
91
+ res = 0
92
+ else options[:emoji]
93
+ res, string = emoji_width(
94
+ string,
95
+ options[:emoji],
96
+ )
97
+ end
48
98
 
49
- # Substract emoji error
50
- res -= emoji_extra_width_of(string, ambiguous) if options[:emoji]
99
+ # Prepare indexes
100
+ ambiguous_index_name = AMBIGUOUS_MAP[options[:ambiguous]]
101
+
102
+ # Get general width
103
+ res += yield(string, INDEX[ambiguous_index_name], FIRST_4096[ambiguous_index_name], FIRST_AMBIGUOUS[ambiguous_index_name])
51
104
 
52
105
  # Return result + prevent negative lengths
53
106
  res < 0 ? 0 : res
54
107
  end
55
108
 
56
- # Same as .width_no_overwrite - but with applying overwrites for each char
57
- def self.width_all_features(string, ambiguous, overwrite, options)
58
- # Sum of all chars widths
59
- res = string.codepoints.sum{ |codepoint|
60
- next overwrite[codepoint] if overwrite[codepoint]
109
+ def self.width_no_overwrite(string, index_full, index_low, first_ambiguous, _ = {})
110
+ res = 0
111
+
112
+ # Make sure we have UTF-8
113
+ string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8"
114
+
115
+ string.scan(/.{,80}/m){ |batch|
116
+ if batch.ascii_only?
117
+ res += batch.size
118
+ else
119
+ batch.each_codepoint{ |codepoint|
120
+ if codepoint > 15 && codepoint < first_ambiguous
121
+ res += 1
122
+ elsif codepoint < 0x1001
123
+ res += index_low[codepoint] || 1
124
+ else
125
+ d = INITIAL_DEPTH
126
+ w = index_full[codepoint / d]
127
+ while w.instance_of? Array
128
+ w = w[(codepoint %= d) / (d /= 16)]
129
+ end
130
+
131
+ res += w || 1
132
+ end
133
+ }
134
+ end
135
+ }
61
136
 
62
- if codepoint > 15 && codepoint < 161 # very common
63
- next 1
137
+ res
138
+ end
139
+
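Note on the index lookup in `width_no_overwrite` above: the `while w.instance_of? Array` loop walks a nested array, consuming the codepoint in base-16 steps starting at `INITIAL_DEPTH`. The sketch below reproduces that loop against a tiny hand-built index. The `store` helper and the sample entries are hypothetical, and storing a bare value for a whole subtree is an assumption about how the real (binary) index is compressed.

```ruby
INITIAL_DEPTH = 0x10000

# Lookup loop as in the diff: descend while the current node is an Array
def lookup(index, codepoint)
  d = INITIAL_DEPTH
  w = index[codepoint / d]
  while w.instance_of? Array
    # Left operand runs first: reduce the codepoint, then shrink the depth
    w = w[(codepoint %= d) / (d /= 16)]
  end
  w || 1 # missing entries default to width 1
end

# Hypothetical builder: stores a width at full depth for one codepoint
def store(index, codepoint, width)
  d = INITIAL_DEPTH
  node = index
  slot = codepoint / d
  while d > 1
    node[slot] ||= []
    node = node[slot]
    codepoint %= d
    d /= 16
    slot = codepoint / d
  end
  node[slot] = width
end

index = []
store(index, 0x4E00, 2) # CJK ideograph, full-width
store(index, 0x0300, 0) # combining grave accent, zero-width
index[2] = 2            # assumed compression: all of plane 2 is width 2

p lookup(index, 0x4E00)
p lookup(index, 0x0300)
p lookup(index, 0x0041)
p lookup(index, 0x20001)
```

A non-Array value at any level ends the descent early, which is what makes a compressed subtree possible in this layout.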
140
+ # Same as .width_no_overwrite - but with applying overwrites for each char
141
+ def self.width_all_features(string, index_full, index_low, first_ambiguous, overwrite)
142
+ res = 0
143
+
144
+ string.each_codepoint{ |codepoint|
145
+ if overwrite[codepoint]
146
+ res += overwrite[codepoint]
147
+ elsif codepoint > 15 && codepoint < first_ambiguous
148
+ res += 1
64
149
  elsif codepoint < 0x1001
65
- width = FIRST_4096[codepoint]
150
+ res += index_low[codepoint] || 1
66
151
  else
67
- width = INDEX
68
- depth = INITIAL_DEPTH
69
- while (width = width[codepoint / depth]).instance_of? Array
70
- codepoint %= depth
71
- depth /= 16
152
+ d = INITIAL_DEPTH
153
+ w = index_full[codepoint / d]
154
+ while w.instance_of? Array
155
+ w = w[(codepoint %= d) / (d /= 16)]
72
156
  end
73
- end
74
157
 
75
- width == :A ? ambiguous : (width || 1)
158
+ res += w || 1
159
+ end
76
160
  }
77
161
 
78
- # Substract emoji error
79
- res -= emoji_extra_width_of(string, ambiguous, overwrite) if options[:emoji]
80
-
81
- # Return result + prevent negative lengths
82
- res < 0 ? 0 : res
162
+ res
83
163
  end
84
164
 
85
165
 
86
- def self.emoji_extra_width_of(string, ambiguous = 1, overwrite = {}, _ = {})
87
- require "unicode/emoji"
166
+ def self.emoji_width(string, sequences = :rgi_fqe)
167
+ res = 0
168
+
169
+ if regex = EMOJI_SEQUENCES_REGEX_MAPPING[sequences]
170
+ emoji_sequence_regex = Unicode::Emoji.const_get(regex)
171
+ else # sequences == :basic
172
+ emoji_sequence_regex = nil
173
+ end
174
+
175
+ # Make sure we have UTF-8
176
+ string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8"
88
177
 
89
- extra_width = 0
90
- modifier_regex = /[#{ Unicode::Emoji::EMOJI_MODIFIERS.pack("U*") }]/
91
- zwj_regex = /(?<=#{ [Unicode::Emoji::ZWJ].pack("U") })./
178
+ if emoji_sequence_regex
179
+ # For each string possibly an emoji
180
+ no_emoji_string = string.gsub(Unicode::Emoji::REGEX_POSSIBLE){ |emoji_candidate|
181
+ # Skip notorious false positives
182
+ if EMOJI_NOT_POSSIBLE.match?(emoji_candidate)
183
+ emoji_candidate
92
184
 
93
- string.scan(Unicode::Emoji::REGEX){ |emoji|
94
- extra_width += 2 * emoji.scan(modifier_regex).size
185
+ # Check if we have a combined Emoji with width 2
186
+ elsif emoji_candidate == emoji_candidate[emoji_sequence_regex]
187
+ res += 2
188
+ ""
95
189
 
96
- emoji.scan(zwj_regex){ |zwj_succ|
97
- extra_width += self.of(zwj_succ, ambiguous, overwrite)
190
+ # We are dealing with a default text presentation emoji or a well-formed sequence not matching the above Emoji set
191
+ else
192
+ # Ensure all explicit VS16 sequences have width 2
193
+ emoji_candidate.gsub!(Unicode::Emoji::REGEX_BASIC){ |basic_emoji|
194
+ if basic_emoji.size == 2 # VS16 present
195
+ res += 2
196
+ ""
197
+ else
198
+ basic_emoji
199
+ end
200
+ }
201
+
202
+ emoji_candidate
203
+ end
98
204
  }
99
- }
205
+ else
206
+ # Only consider basic emoji
100
207
 
101
- extra_width
208
+ # Ensure all explicit VS16 sequences have width 2
209
+ no_emoji_string = string.gsub(REGEX_EMOJI_BASIC_OR_KEYCAP){ |basic_emoji|
210
+ if basic_emoji.size >= 2 # VS16 present
211
+ res += 2
212
+ ""
213
+ else
214
+ basic_emoji
215
+ end
216
+ }
217
+ end
218
+
219
+ [res, no_emoji_string]
102
220
  end
103
221
 
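Note on `emoji_width` above: the method returns a pair, the width already accounted for and the string with the handled Emoji removed, so that `width_frame` can add the general width of the remainder. The sketch below shows that "count and strip" pattern for explicit VS16 sequences only. It is a simplification: `TOY_VS16_REGEX` (any single character followed by U+FE0F) and `split_off_vs16` are hypothetical, whereas the gem matches with `Unicode::Emoji::REGEX_BASIC` and related patterns from the unicode-emoji gem.

```ruby
# Toy pattern (assumption): any one character followed by VS16 (U+FE0F)
TOY_VS16_REGEX = /.\u{FE0F}/m

# Returns [width of handled sequences, string without those sequences]
def split_off_vs16(string)
  width = 0
  rest = string.gsub(TOY_VS16_REGEX) do
    width += 2 # a VS16 sequence is counted as full-width
    ""         # and removed so it is not measured a second time
  end
  [width, rest]
end

p split_off_vs16("a\u{2764}\u{FE0F}b")
p split_off_vs16("abc")
```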
104
- def initialize(ambiguous: 1, overwrite: {}, emoji: false)
222
+ def initialize(ambiguous: 1, overwrite: {}, emoji: true)
105
223
  @ambiguous = ambiguous
106
224
  @overwrite = overwrite
107
225
  @emoji = emoji
108
226
  end
109
227
 
110
228
  def get_config(**kwargs)
111
- [
112
- kwargs[:ambiguous] || @ambiguous,
113
- kwargs[:overwrite] || @overwrite,
114
- { emoji: kwargs[:emoji] || @emoji },
115
- ]
229
+ {
230
+ ambiguous: kwargs[:ambiguous] || @ambiguous,
231
+ overwrite: kwargs[:overwrite] || @overwrite,
232
+ emoji: kwargs[:emoji] || @emoji,
233
+ }
116
234
  end
117
235
 
118
236
  def of(string, **kwargs)
119
- self.class.of(string, *get_config(**kwargs))
237
+ self.class.of(string, **get_config(**kwargs))
120
238
  end
121
239
  end
122
240
  end
metadata CHANGED
@@ -1,15 +1,29 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: unicode-display_width
3
3
  version: !ruby/object:Gem::Version
4
- version: 2.6.0
4
+ version: 3.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jan Lelis
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2024-09-13 00:00:00.000000000 Z
11
+ date: 2024-11-13 00:00:00.000000000 Z
12
12
  dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: unicode-emoji
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '4.0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '4.0'
13
27
  - !ruby/object:Gem::Dependency
14
28
  name: rspec
15
29
  requirement: !ruby/object:Gem::Requirement
@@ -39,7 +53,8 @@ dependencies:
39
53
  - !ruby/object:Gem::Version
40
54
  version: '13.0'
41
55
  description: "[Unicode 16.0.0] Determines the monospace display width of a string
42
- using EastAsianWidth.txt, Unicode general category, and other data."
56
+ using EastAsianWidth.txt, Unicode general category, Emoji specification, and other
57
+ data."
43
58
  email:
44
59
  - hi@ruby.consulting
45
60
  executables: []
@@ -55,8 +70,10 @@ files:
55
70
  - data/display_width.marshal.gz
56
71
  - lib/unicode/display_width.rb
57
72
  - lib/unicode/display_width/constants.rb
73
+ - lib/unicode/display_width/emoji_support.rb
58
74
  - lib/unicode/display_width/index.rb
59
75
  - lib/unicode/display_width/no_string_ext.rb
76
+ - lib/unicode/display_width/reline_ext.rb
60
77
  - lib/unicode/display_width/string_ext.rb
61
78
  homepage: https://github.com/janlelis/unicode-display_width
62
79
  licenses:
@@ -74,14 +91,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
74
91
  requirements:
75
92
  - - ">="
76
93
  - !ruby/object:Gem::Version
77
- version: 2.4.0
94
+ version: 2.5.0
78
95
  required_rubygems_version: !ruby/object:Gem::Requirement
79
96
  requirements:
80
97
  - - ">="
81
98
  - !ruby/object:Gem::Version
82
99
  version: '0'
83
100
  requirements: []
84
- rubygems_version: 3.5.9
101
+ rubygems_version: 3.5.21
85
102
  signing_key:
86
103
  specification_version: 4
87
104
  summary: Determines the monospace display width of a string in Ruby.