unicode-display_width 2.6.0 → 3.1.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: eb663fb7dd6d3409dd3b21bd4a793d954ad8fd9b974593868292b5ec59ba7c01
4
- data.tar.gz: e960ab9c24135cb1d7872e84c4e3d7b24f83a5ae85b12a14af27312a90241597
3
+ metadata.gz: 9dadf5b8337ac74b8e5af2a6cd481708c39050506440ccf5d30c6cdc9eb5ade3
4
+ data.tar.gz: ae3bb12a0fabe7a53a1b533909f42a86f1fc7acecc33c9b57218ed3607f71d3c
5
5
  SHA512:
6
- metadata.gz: bd4fb14101159588eec1c2bf6871a94e297e6314317ee10ce320b09fc30e01f35d8610cdd3dfe32edb979f0ece6053914a23298d6b46732f22d16f642576aacf
7
- data.tar.gz: b02c66363a1740303715e30b8023e0cc2de99baf78f183ec2ad48b18ff26ac95da4e496f604aa4bda1dcf691ad4fdff644b9e1d7c41922ba078b6c35653debc6
6
+ metadata.gz: 414227480c3ae2ca0afcee225bb68b6506ac2f7dac630422db87c0a6d28d9d921bcc4645146ffa0a75ace5eb338dc71df2f051aeb0732824dbf1c7bb3117ba22
7
+ data.tar.gz: 1002b3752d47df6f3d416d378148af38ff7c92698a118aded36f55e3b6bbdc99a29552fdee792636d5edea1ce2104266b1dcf9198631a7669334a48fa425af29
data/CHANGELOG.md CHANGED
@@ -1,5 +1,78 @@
1
1
  # CHANGELOG
2
2
 
3
+ ## 3.1.4
4
+
5
+ - Fix that skin tone modifiers were ignored when used in a non-ZWJ sequence
6
+ context (= single emoji char + modifier) #29
7
+ - Add more docs and specs about modifier handling
8
+
9
+ ## 3.1.3
10
+
11
+ Better handling of non-UTF-8 strings, patch by @Earlopain:
12
+
13
+ - Data with *BINARY* encoding is interpreted as UTF-8, if possible
14
+ - Use `invalid: :replace` and `undef: :replace` options when converting to UTF-8
15
+
16
+ ## 3.1.2
17
+
18
+ - Performance improvements
19
+
20
+ ## 3.1.1
21
+
22
+ - Performance improvements
23
+
24
+ ## 3.1.0
25
+
26
+ **Improve Emoji support:**
27
+
28
+ - Emoji modes: Differentiate between well-formed Emoji (`:possible`) and any
29
+ ZWJ/modifier sequence (`:all`). The latter is more common and more efficient
30
+ to implement.
31
+ - Unify `:rgi_{fqe,mqe,uqe}` options to just `:rgi` to keep things simpler (corresponds to
32
+ the former `:rgi_uqe` option). Most terminals that want to support the RGI set
33
+ will probably want to catch Emoji sequences with missing VS16s.
34
+ - Add new `:all_no_vs16` and `:rgi_at` modes to be able to support some terminals
35
+ that need these quirks
36
+ - Add alias `emoji: :auto` for `emoji: true` and `emoji: :none` for `emoji: false`
37
+ - `:auto` mode: Only consider terminal cells when recommending Emoji support level
38
+ (Emoji themselves might display differently)
39
+ - `:auto` mode: Set default Emoji mode for unknown/unsupported terminals to `:none`
40
+ - Rename `:basic` mode to `:vs16`
41
+
42
+ ## 3.0.1
43
+
44
+
45
+ - Add WezTerm and foot as good Emoji terminals
46
+
47
+ ## 3.0.0
48
+
49
+ **Rework Emoji support:**
50
+
51
+ - Emoji widths are now enabled by default
52
+ - Only reduce Emoji width to 2 when RGI Emoji detected (configurable)
53
+ - VS16 turns Emoji characters of width 1 into full-width
54
+ - Please note that Emoji parsing has a notable impact on performance.
55
+ You can use the `emoji: false` option to disable Emoji adjustments
56
+ - Tries to detect terminal's Emoji support level automatically (from ENV vars)
57
+
58
+ **Index fixes and updates:**
59
+
60
+ - Private-use characters are considered ambiguous (were given width 1 before)
61
+ - Fix that a few zero-width ignorable codepoints from recent Unicode were missing
62
+ - Consider the following separators to be zero-width:
63
+ - U+2028 - LINE SEPARATOR - Zl
64
+ - U+2029 - PARAGRAPH SEPARATOR - Zp
65
+
66
+ **Other:**
67
+
68
+ - Add keyword arguments to `Unicode::DisplayWidth.of`. If you are using a hash
69
+ with overwrite values as third parameter, be sure to put it in curly braces.
70
+ - Using third parameter or explicit hash as fourth parameter is deprecated,
71
+ please migrate to the keyword arguments API
72
+ - Gem raises `ArgumentError` for ambiguous values other than 1 or 2
73
+ - Performance optimizations
74
+ - Require Ruby 2.5
75
+
3
76
  ## 2.6.0
4
77
 
5
78
  - Unicode 16
@@ -40,8 +113,26 @@ More performance improvements:
40
113
 
41
114
  ## 2.0.0
42
115
 
43
- - Release 2.0.0
44
- - Supports Ruby 3.0
116
+ Add Support for Ruby 3.0
117
+
118
+ ### Breaking Changes
119
+
120
+ Some features of this library were marked deprecated for a long time and have been removed with Version 2.0:
121
+
122
+ - Aliases of display\_width (…\_size, …\_length) have been removed
123
+ - Auto-loading of string core extension has been removed:
124
+
125
+ If you are relying on the `String#display_width` string extension to be automatically loaded (old behavior), please load it explicitly now:
126
+
127
+ ```ruby
128
+ require "unicode/display_width/string_ext"
129
+ ```
130
+
131
+ You could also change your `Gemfile` line to achieve this:
132
+
133
+ ```ruby
134
+ gem "unicode-display_width", require: "unicode/display_width/string_ext"
135
+ ```
45
136
 
46
137
  ## 2.0.0.pre2
47
138
 
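The 3.1.3 entry above describes how non-UTF-8 input is prepared before measuring. A minimal sketch of that idea in plain Ruby follows; the helper name `utf8_for_measuring` is hypothetical and not part of the gem, and unlike the library code further down in this diff it works on a copy instead of the caller's string.

```ruby
# Hypothetical helper (not the gem's API): returns a UTF-8 string that is
# safe to measure, following the rules listed in the 3.1.3 changelog entry.
def utf8_for_measuring(string)
  string = string.dup # work on a copy, leave the caller's string untouched

  # BINARY data: optimistically reinterpret the bytes as UTF-8 ...
  if string.encoding == Encoding::BINARY &&
     !string.force_encoding(Encoding::UTF_8).valid_encoding?
    # ... and go back to BINARY if the bytes are not valid UTF-8
    string.force_encoding(Encoding::BINARY)
  end

  return string if string.encoding == Encoding::UTF_8

  # Everything else is transcoded; unconvertible bytes become U+FFFD
  string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

utf8_for_measuring("一".b)                           # valid UTF-8 bytes are kept
utf8_for_measuring("\xFF".b)                         # invalid byte is replaced
utf8_for_measuring("é".encode(Encoding::ISO_8859_1)) # Latin-1 is transcoded
```

Replacing instead of raising means a width can always be computed, at the cost of measuring the replacement character rather than the original bytes.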
data/README.md CHANGED
@@ -1,39 +1,22 @@
1
- ## Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)
1
+ # Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)
2
2
 
3
- Determines the monospace display width of a string in Ruby. Useful for all kinds of terminal-based applications. Implementation based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt) and other data, 100% in Ruby. It does not rely on the OS vendor (like [wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width.
3
+ Determines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specification](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. It does not rely on the OS vendor ([wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals.
4
4
 
5
5
  Unicode version: **16.0.0** (September 2024)
6
6
 
7
- Supported Rubies: **3.3**, **3.2**, **3.1**, **3.0**
7
+ ## Gem Version 3 Improved Emoji Support
8
8
 
9
- Old Rubies which might still work: **2.7**, **2.6**, **2.5**, **2.4**, **2.3**
9
+ **Emoji support is now enabled by default.** See below for description and configuration possibilities.
10
10
 
11
- For even older Rubies, use version 2.3.0 of this gem: **2.3**, **2.2**, **2.1**, **2.0**, **1.9**
11
+ **Unicode::DisplayWidth.of now takes keyword arguments:** { ambiguous:, emoji:, overwrite: }
12
12
 
13
- ## Version 2.4.2 Performance Updates
13
+ See [CHANGELOG](/CHANGELOG.md) for details.
14
14
 
15
- **If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!**
16
-
17
- This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the charachter width lookup code has been optimized, so even when full-width characters are involved, the gem is much faster now.
18
-
19
- ## Version 2.0 — Breaking Changes
20
-
21
- Some features of this library were marked deprecated for a long time and have been removed with Version 2.0:
15
+ ## Gem Version 2.4.2 Performance Updates
22
16
 
23
- - Aliases of display_width (…\_size, …\_length) have been removed
24
- - Auto-loading of string core extension has been removed:
25
-
26
- If you are relying on the `String#display_width` string extension to be automatically loaded (old behavior), please load it explicitly now:
27
-
28
- ```ruby
29
- require "unicode/display_width/string_ext"
30
- ```
31
-
32
- You could also change your `Gemfile` line to achieve this:
17
+ **If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!**
33
18
 
34
- ```ruby
35
- gem "unicode-display_width", require: "unicode/display_width/string_ext"
36
- ```
19
+ This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the character width lookup code has been optimized, so even when the string involves full-width or ambiguous characters, the gem is much faster now.
37
20
 
38
21
  ## Introduction to Character Widths
39
22
 
@@ -45,15 +28,16 @@ Further at the top means higher precedence. Please expect changes to this algori
45
28
 
46
29
  Width | Characters | Comment
47
30
  -------|------------------------------|--------------------------------------------------
48
- X | (user defined) | Overwrites any other values
31
+ ? | (user defined) | Overwrites any other values
32
+ ? | Emoji | See "How this Library Handles Emoji Width" below
49
33
  -1 | `"\b"` | Backspace (total width never below 0)
50
34
  0 | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) which do not change horizontal width
51
35
  1 | `"\u{00AD}"` | SOFT HYPHEN
52
36
  2 | `"\u{2E3A}"` | TWO-EM DASH
53
37
  3 | `"\u{2E3B}"` | THREE-EM DASH
54
- 0 | General Categories: Mn, Me, Cf (non-arabic) | Excludes ARABIC format characters
55
- 0 | `"\u{1160}".."\u{11FF}"`, `"\u{D7B0}".."\u{D7FF}"` | HANGUL JUNGSEONG
56
- 0 | `"\u{2060}".."\u{206F}"`, `"\u{FFF0}".."\u{FFF8}"`, `"\u{E0000}".."\u{E0FFF}"` | Ignorable ranges
38
+ 0 | General Categories: Mn, Me, Zl, Zp, Cf (non-arabic)| Excludes ARABIC format characters
39
+ 0 | Derived Property: Default_Ignorable_Code_Point | Ignorable ranges
40
+ 0 | `"\u{1160}".."\u{11FF}"`, `"\u{D7B0}".."\u{D7FF}"` | HANGUL JUNGSEONG
57
41
  2 | East Asian Width: F, W | Full-width characters
58
42
  2 | `"\u{3400}".."\u{4DBF}"`, `"\u{4E00}".."\u{9FFF}"`, `"\u{F900}".."\u{FAFF}"`, `"\u{20000}".."\u{2FFFD}"`, `"\u{30000}".."\u{3FFFD}"` | Full-width ranges
59
43
  1 or 2 | East Asian Width: A | Ambiguous characters, user defined, default: 1
@@ -71,8 +55,6 @@ Or add to your Gemfile:
71
55
 
72
56
  ## Usage
73
57
 
74
- ### Classic API
75
-
76
58
  ```ruby
77
59
  require 'unicode/display_width'
78
60
 
@@ -80,7 +62,7 @@ Unicode::DisplayWidth.of("⚀") # => 1
80
62
  Unicode::DisplayWidth.of("一") # => 2
81
63
  ```
82
64
 
83
- #### Ambiguous Characters
65
+ ### Ambiguous Characters
84
66
 
85
67
  The second parameter defines the value returned by characters defined as ambiguous:
86
68
 
@@ -89,34 +71,75 @@ Unicode::DisplayWidth.of("·", 1) # => 1
89
71
  Unicode::DisplayWidth.of("·", 2) # => 2
90
72
  ```
91
73
 
92
- #### Custom Overwrites
74
+ ### Encoding Notes
75
+
76
+ - Data with *BINARY* encoding is interpreted as UTF-8, if possible
77
+ - Non-UTF-8 strings are converted to UTF-8 before measuring, using the [`{invalid: :replace, undef: :replace}` options](https://ruby-doc.org/3.3.5/encodings_rdoc.html#label-Encoding+Options)
78
+
79
+ ### Custom Overwrites
93
80
 
94
- You can overwrite how to handle specific code points by passing a hash (or even a proc) as third parameter:
81
+ You can overwrite how to handle specific code points by passing a hash (or even a proc) as `overwrite:` parameter:
95
82
 
96
83
  ```ruby
97
- Unicode::DisplayWidth.of("a\tb", 1, "\t".ord => 10)) # => tab counted as 10, so result is 12
84
+ Unicode::DisplayWidth.of("a\tb", 1, overwrite: { "\t".ord => 10 }) # => TAB counted as 10, result is 12
98
85
  ```
99
86
 
100
87
  Please note that using overwrites disables some performance optimizations of this gem.
101
88
 
89
+ ### Emoji
102
90
 
103
- #### Emoji Support
104
-
105
- Emoji width support is included, but in must be activated manually. It will adjust the string's size for modifier and zero-width joiner sequences. You also need to add the [unicode-emoji](https://github.com/janlelis/unicode-emoji) gem to your Gemfile:
91
+ If your terminal supports it, the gem detects Emoji and Emoji sequences and adjusts the width of the measured string. This can be disabled by passing `emoji: false` as an argument:
106
92
 
107
93
  ```ruby
108
- gem 'unicode-display_width'
109
- gem 'unicode-emoji'
94
+ Unicode::DisplayWidth.of "🤾🏽‍♀️", emoji: :all # => 2
95
+ Unicode::DisplayWidth.of "🤾🏽‍♀️", emoji: false # => 5
110
96
  ```
111
97
 
112
- Enable the emoji string width adjustments by passing `emoji: true` as fourth parameter:
98
+ #### How this Library Handles Emoji Width
113
99
 
114
- ```ruby
115
- Unicode::DisplayWidth.of "🤾🏽‍♀️" # => 5
116
- Unicode::DisplayWidth.of "🤾🏽‍♀️", 1, {}, emoji: true # => 2
117
- ```
100
+ There are many Emoji which get constructed by combining other Emoji in a sequence. This makes measuring the width complicated, since terminals might either display the combined Emoji or the separate parts of the Emoji individually.
101
+
102
+ Another aspect where terminals disagree is whether Emoji characters which have a text presentation by default (width 1) should be turned into full-width (width 2) when combined with Variation Selector 16 (*U+FE0F*).
103
+
104
+ Finally, it varies if Skin Tone Modifiers can be applied to all characters or just to those with the "Emoji Base" property.
105
+
106
+ Emoji Type | Width / Comment
107
+ ------------|----------------
108
+ Basic/Single Emoji character without Variation Selector | No special handling
109
+ Basic/Single Emoji character with VS15 (Text) | No special handling
110
+ Basic/Single Emoji character with VS16 (Emoji) | 2 or East Asian Width (see table below)
111
+ Single Emoji character with Skin Tone Modifier | 2
112
+ Skin Tone Modifier used in isolation or with invalid base | 2 if Emoji mode is configured to `:rgi` / `:rgi_at`
113
+ Emoji Sequence | 2 if Emoji belongs to configured Emoji set (see table below)
114
+
115
+ #### Emoji Modes
116
+
117
+ The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2 and if VS16-Emoji should be widened. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji settings can be used:
118
+
119
+ `emoji:` Option | VS16-Emoji Width | Emoji Sequences Width / Comment | Example Terminals
120
+ ----------------|------------------|---------------------------------|------------------
121
+ `true` or `:auto` | - | Automatically use recommended Emoji setting for your terminal | -
122
+ `:all` | 2 | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | iTerm, foot
123
+ `:all_no_vs16` | EAW (1 or 2) | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | WezTerm
124
+ `:possible`| 2 | 2 for all possible/well-formed Emoji sequences | ?
125
+ `:rgi` | 2 | 2 for all [RGI Emoji](https://www.unicode.org/reports/tr51/#def_rgi_set) sequences | ?
126
+ `:rgi_at` | EAW (1 or 2) | 1 or 2: Like `:rgi`, but Emoji sequences starting with a default-text Emoji have EAW | Apple Terminal
127
+ `:vs16` | 2 | 2 * number of partial Emoji (sequences never considered to represent a combined Emoji) | kitty?
128
+ `false` or `:none` | EAW (1 or 2) | No Emoji adjustments | gnome-terminal, many older terminals
129
+
130
+ - *EAW:* East Asian Width
131
+ - *RGI Emoji:* Emoji Recommended for General Interchange
132
+ - *ZWJ:* Zero-width Joiner: Codepoint `U+200D`, used in many Emoji sequences
133
+
134
+ #### Emoji Support in Terminals
135
+
136
+ Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget).
137
+
138
+ Please note that Emoji display and the number of terminal columns used might differ a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminals' Unicode/Emoji capabilities. You can visually check how your terminal renders different kinds of Emoji with the [terminal-emoji-width.rb script](https://github.com/janlelis/unicode-display_width/blob/main/misc/terminal-emoji-width.rb).
139
+
140
+ **To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwhile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought…
118
141
 
119
- #### Usage with String Extension
142
+ ### Usage with String Extension
120
143
 
121
144
  ```ruby
122
145
  require 'unicode/display_width/string_ext'
@@ -125,9 +148,9 @@ require 'unicode/display_width/string_ext'
125
148
  '一'.display_width # => 2
126
149
  ```
127
150
 
128
- ### Modern API: Keyword-arguments Based Config Object
151
+ ### Usage with Config Object
129
152
 
130
- Version 2.0 introduces a keyword-argument based API, which allows you to save your configuration for later-reuse. This requires an extra line of code, but has the advantage that you'll need to define your string-width options only once:
153
+ You can use a config object that allows you to save your configuration for later reuse. This requires an extra line of code, but has the advantage that you'll need to define your string-width options only once:
131
154
 
132
155
  ```ruby
133
156
  require 'unicode/display_width'
@@ -135,15 +158,15 @@ require 'unicode/display_width'
135
158
  display_width = Unicode::DisplayWidth.new(
136
159
  # ambiguous: 1,
137
160
  overwrite: { "A".ord => 100 },
138
- emoji: true,
161
+ emoji: :all,
139
162
  )
140
163
 
141
164
  display_width.of "⚀" # => 1
142
- display_width.of "🤾🏽‍♀️" # => 2
165
+ display_width.of "🤠‍🤢" # => 2
143
166
  display_width.of "A" # => 100
144
167
  ```
145
168
 
146
- ### Usage From the CLI
169
+ ### Usage from the Command-Line
147
170
 
148
171
  Use this one-liner to print out display widths for strings from the command-line:
149
172
 
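The `:all_no_vs16` row of the Emoji Modes table in the README above (every ZWJ/modifier sequence counts as 2 cells) can be illustrated without the gem. The sketch below is an assumption-laden simplification: `strip_emoji_sequences` is a hypothetical name, and the pattern covers ZWJ and skin-tone modifier sequences only, leaving out the keycap sequences the library also handles.

```ruby
# Hypothetical helper: counts 2 cells for every ZWJ/modifier sequence and
# returns that width together with the string that is left to measure.
def strip_emoji_sequences(string)
  # Simplified pattern: base (+ modifier or VS16) joined by one or more
  # ZWJ (U+200D) parts, or a single character followed by a skin tone modifier
  sequences = /.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+|.[\u{1F3FB}-\u{1F3FF}]/

  width = 0
  rest = string.gsub(sequences) { width += 2; "" }
  [width, rest]
end

# Person playing handball + skin tone + ZWJ + female sign + VS16
handball = "\u{1F93E}\u{1F3FD}\u{200D}\u{2640}\u{FE0F}"
strip_emoji_sequences("a#{handball}b") # => [2, "ab"]
```

Removing matched sequences from the string first means the remaining characters can be measured by the ordinary per-codepoint lookup without counting Emoji parts twice.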
Binary file
@@ -2,7 +2,7 @@
2
2
 
3
3
  module Unicode
4
4
  class DisplayWidth
5
- VERSION = "2.6.0"
5
+ VERSION = "3.1.4"
6
6
  UNICODE_VERSION = "16.0.0"
7
7
  DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/")
8
8
  INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz"
@@ -0,0 +1,52 @@
1
+ # require "rbconfig"
2
+ # RbConfig::CONFIG["host_os"] =~ /mswin|mingw/ # windows
3
+
4
+ module Unicode
5
+ class DisplayWidth
6
+ module EmojiSupport
7
+ # Tries to find out which terminal emulator is used to
8
+ # set emoji: config to best suiting value
9
+ #
10
+ # Please also see section in README.md and
11
+ # misc/terminal-emoji-width.rb
12
+ #
13
+ # Please note: Many terminals do not set any ENV vars,
14
+ # maybe CSI queries can help?
15
+ def self.recommended
16
+ if ENV["CI"]
17
+ return :rgi
18
+ end
19
+
20
+ case ENV["TERM_PROGRAM"]
21
+ when "iTerm.app"
22
+ return :all
23
+ when "Apple_Terminal"
24
+ return :rgi_at
25
+ when "WezTerm"
26
+ return :all_no_vs16
27
+ end
28
+
29
+ case ENV["TERM"]
30
+ when "contour","foot"
31
+ # konsole: all, how to detect?
32
+ return :all
33
+ when /kitty/
34
+ return :vs16
35
+ end
36
+
37
+ if ENV["WT_SESSION"] # Windows Terminal
38
+ return :vs16
39
+ end
40
+
41
+ # As of last time checked: gnome-terminal, vscode, alacritty
42
+ :none
43
+ end
44
+
45
+ # Maybe: Implement something like https://github.com/jquast/ucs-detect
46
+ # which uses the terminal cursor to check for best support level
47
+ # at runtime
48
+ # def self.detect!
49
+ # end
50
+ end
51
+ end
52
+ end
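Because the detection above reads `ENV` directly, it is awkward to exercise in specs. One possible variation is to pass the environment in as a parameter. The sketch below is not the gem's code: the method name `recommended_emoji_mode` is hypothetical, and it assumes the CI branch is meant to return the documented `:rgi` mode.

```ruby
# Hypothetical, testable variant of the detection: the environment is
# injected, so a plain Hash can stand in for ENV.
def recommended_emoji_mode(env = ENV)
  # Assumption: CI environments should use the documented :rgi mode
  return :rgi if env["CI"]

  case env["TERM_PROGRAM"]
  when "iTerm.app"      then return :all
  when "Apple_Terminal" then return :rgi_at
  when "WezTerm"        then return :all_no_vs16
  end

  case env["TERM"]
  when "contour", "foot" then return :all
  when /kitty/           then return :vs16
  end

  return :vs16 if env["WT_SESSION"] # Windows Terminal

  # Unknown or unsupported terminal: no Emoji adjustments
  :none
end

recommended_emoji_mode({ "TERM_PROGRAM" => "WezTerm" }) # => :all_no_vs16
recommended_emoji_mode({ "TERM" => "xterm-kitty" })     # => :vs16
recommended_emoji_mode({})                              # => :none
```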
@@ -0,0 +1,14 @@
1
+ # Experimental
2
+ # Patches Reline's get_mbchar_width to use Unicode::DisplayWidth
3
+
4
+ require "reline"
5
+ require "reline/unicode"
6
+
7
+ require_relative "../display_width"
8
+
9
+ class Reline::Unicode
10
+ def self.get_mbchar_width(mbchar)
11
+ Unicode::DisplayWidth.of(mbchar, Reline.ambiguous_width)
12
+ end
13
+ end
14
+
@@ -1,9 +1,9 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require_relative "../display_width" unless defined? Unicode::DisplayWidth
3
+ require_relative "../display_width"
4
4
 
5
5
  class String
6
- def display_width(ambiguous = 1, overwrite = {}, options = {})
7
- Unicode::DisplayWidth.of(self, ambiguous, overwrite, options)
6
+ def display_width(ambiguous = nil, overwrite = nil, old_options = {}, **options)
7
+ Unicode::DisplayWidth.of(self, ambiguous, overwrite, old_options, **options)
8
8
  end
9
9
  end
@@ -1,123 +1,247 @@
1
1
  # frozen_string_literal: true
2
2
 
3
+ require "unicode/emoji"
4
+
3
5
  require_relative "display_width/constants"
4
6
  require_relative "display_width/index"
7
+ require_relative "display_width/emoji_support"
5
8
 
6
9
  module Unicode
7
10
  class DisplayWidth
11
+ DEFAULT_AMBIGUOUS = 1
8
12
  INITIAL_DEPTH = 0x10000
9
- ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/
10
- FIRST_4096 = decompress_index(INDEX[0][0], 1)
11
-
12
- def self.of(string, ambiguous = 1, overwrite = {}, options = {})
13
- if overwrite.empty?
14
- # Optimization for ASCII-only strings without certain control symbols
15
- if string.ascii_only?
16
- if string.match?(ASCII_NON_ZERO_REGEX)
17
- res = string.gsub(ASCII_NON_ZERO_REGEX, "").size - string.count("\b")
18
- res < 0 ? 0 : res
19
- else
20
- string.size
21
- end
22
- else
23
- width_no_overwrite(string, ambiguous, options)
13
+ ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n-\x0F]/
14
+ ASCII_NON_ZERO_STRING = "\0\x05\a\b\n-\x0F"
15
+ ASCII_BACKSPACE = "\b"
16
+ AMBIGUOUS_MAP = {
17
+ 1 => :WIDTH_ONE,
18
+ 2 => :WIDTH_TWO,
19
+ }
20
+ FIRST_AMBIGUOUS = {
21
+ WIDTH_ONE: 768,
22
+ WIDTH_TWO: 161,
23
+ }
24
+ NOT_COMMON_NARROW_REGEX = {
25
+ WIDTH_ONE: /[^\u{10}-\u{2FF}]/m,
26
+ WIDTH_TWO: /[^\u{10}-\u{A1}]/m,
27
+ }
28
+ FIRST_4096 = {
29
+ WIDTH_ONE: decompress_index(INDEX[:WIDTH_ONE][0][0], 1),
30
+ WIDTH_TWO: decompress_index(INDEX[:WIDTH_TWO][0][0], 1),
31
+ }
32
+ EMOJI_SEQUENCES_REGEX_MAPPING = {
33
+ rgi: :REGEX_INCLUDE_MQE_UQE,
34
+ rgi_at: :REGEX_INCLUDE_MQE_UQE,
35
+ possible: :REGEX_WELL_FORMED,
36
+ }
37
+ REGEX_EMOJI_VS16 = Regexp.union(
38
+ Regexp.compile(
39
+ Unicode::Emoji::REGEX_TEXT_PRESENTATION.source +
40
+ "(?<![#*0-9])" +
41
+ "\u{FE0F}"
42
+ ),
43
+ Unicode::Emoji::REGEX_EMOJI_KEYCAP
44
+ )
45
+
46
+ # ebase = Unicode::Emoji::REGEX_PROP_MODIFIER_BASE.source
47
+ REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+|.[\u{1F3FB}-\u{1F3FF}]/, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
48
+ REGEX_EMOJI_ALL_SEQUENCES_AND_VS16 = Regexp.union(REGEX_EMOJI_ALL_SEQUENCES, REGEX_EMOJI_VS16)
49
+
50
+ # Returns monospace display width of string
51
+ def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
52
+ # Binary strings don't make much sense when calculating display width.
53
+ # Assume it's valid UTF-8
54
+ if string.encoding == Encoding::BINARY && !string.force_encoding(Encoding::UTF_8).valid_encoding?
55
+ # Didn't work out, go back to binary
56
+ string.force_encoding(Encoding::BINARY)
57
+ end
58
+
59
+ string = string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace) unless string.encoding == Encoding::UTF_8
60
+ options = normalize_options(string, ambiguous, overwrite, old_options, **options)
61
+
62
+ width = 0
63
+
64
+ unless options[:overwrite].empty?
65
+ width, string = width_custom(string, options[:overwrite])
66
+ end
67
+
68
+ if string.ascii_only?
69
+ return width + width_ascii(string)
70
+ end
71
+
72
+ ambiguous_index_name = AMBIGUOUS_MAP[options[:ambiguous]]
73
+
74
+ unless string.match?(NOT_COMMON_NARROW_REGEX[ambiguous_index_name])
75
+ return width + string.size
76
+ end
77
+
78
+ # Retrieve Emoji width
79
+ if options[:emoji] != :none
80
+ e_width, string = emoji_width(
81
+ string,
82
+ options[:emoji],
83
+ options[:ambiguous],
84
+ )
85
+ width += e_width
86
+
87
+ unless string.match?(NOT_COMMON_NARROW_REGEX[ambiguous_index_name])
88
+ return width + string.size
24
89
  end
25
- else
26
- width_all_features(string, ambiguous, overwrite, options)
27
90
  end
28
- end
29
91
 
30
- def self.width_no_overwrite(string, ambiguous, options = {})
31
- # Sum of all chars widths
32
- res = string.codepoints.sum{ |codepoint|
33
- if codepoint > 15 && codepoint < 161 # very common
34
- next 1
92
+ index_full = INDEX[ambiguous_index_name]
93
+ index_low = FIRST_4096[ambiguous_index_name]
94
+ first_ambiguous = FIRST_AMBIGUOUS[ambiguous_index_name]
95
+
96
+ string.each_codepoint{ |codepoint|
97
+ if codepoint > 15 && codepoint < first_ambiguous
98
+ width += 1
35
99
  elsif codepoint < 0x1001
36
- width = FIRST_4096[codepoint]
100
+ width += index_low[codepoint] || 1
37
101
  else
38
- width = INDEX
39
- depth = INITIAL_DEPTH
40
- while (width = width[codepoint / depth]).instance_of? Array
41
- codepoint %= depth
42
- depth /= 16
102
+ d = INITIAL_DEPTH
103
+ w = index_full[codepoint / d]
104
+ while w.instance_of? Array
105
+ w = w[(codepoint %= d) / (d /= 16)]
43
106
  end
44
- end
45
107
 
46
- width == :A ? ambiguous : (width || 1)
108
+ width += w || 1
109
+ end
47
110
  }
48
111
 
49
- # Substract emoji error
50
- res -= emoji_extra_width_of(string, ambiguous) if options[:emoji]
51
-
52
112
  # Return result + prevent negative lengths
53
- res < 0 ? 0 : res
113
+ width < 0 ? 0 : width
54
114
  end
55
115
 
56
- # Same as .width_no_overwrite - but with applying overwrites for each char
57
- def self.width_all_features(string, ambiguous, overwrite, options)
58
- # Sum of all chars widths
59
- res = string.codepoints.sum{ |codepoint|
60
- next overwrite[codepoint] if overwrite[codepoint]
116
+ # Returns width of custom overwrites and remaining string
117
+ def self.width_custom(string, overwrite)
118
+ width = 0
61
119
 
62
- if codepoint > 15 && codepoint < 161 # very common
63
- next 1
64
- elsif codepoint < 0x1001
65
- width = FIRST_4096[codepoint]
120
+ string = string.each_codepoint.select{ |codepoint|
121
+ if overwrite[codepoint]
122
+ width += overwrite[codepoint]
123
+ nil
66
124
  else
67
- width = INDEX
68
- depth = INITIAL_DEPTH
69
- while (width = width[codepoint / depth]).instance_of? Array
70
- codepoint %= depth
71
- depth /= 16
72
- end
125
+ codepoint
73
126
  end
127
+ }.pack("U*")
74
128
 
75
- width == :A ? ambiguous : (width || 1)
76
- }
129
+ [width, string]
130
+ end
77
131
 
78
- # Substract emoji error
79
- res -= emoji_extra_width_of(string, ambiguous, overwrite) if options[:emoji]
132
+ # Returns width for ASCII-only strings. Will consider zero-width control symbols.
133
+ def self.width_ascii(string)
134
+ if string.match?(ASCII_NON_ZERO_REGEX)
135
+ res = string.delete(ASCII_NON_ZERO_STRING).bytesize - string.count(ASCII_BACKSPACE)
136
+ return res < 0 ? 0 : res
137
+ end
80
138
 
81
- # Return result + prevent negative lengths
82
- res < 0 ? 0 : res
139
+ string.bytesize
83
140
  end
84
141
 
142
+ # Returns width of all considered Emoji and remaining string
143
+ def self.emoji_width(string, mode = :all, ambiguous = DEFAULT_AMBIGUOUS)
144
+ res = 0
85
145
 
86
- def self.emoji_extra_width_of(string, ambiguous = 1, overwrite = {}, _ = {})
87
- require "unicode/emoji"
146
+ if emoji_set_regex = EMOJI_SEQUENCES_REGEX_MAPPING[mode]
147
+ emoji_width_via_possible(
148
+ string,
149
+ Unicode::Emoji.const_get(emoji_set_regex),
150
+ mode == :rgi_at,
151
+ ambiguous,
152
+ )
88
153
 
89
- extra_width = 0
90
- modifier_regex = /[#{ Unicode::Emoji::EMOJI_MODIFIERS.pack("U*") }]/
91
- zwj_regex = /(?<=#{ [Unicode::Emoji::ZWJ].pack("U") })./
154
+ elsif mode == :all_no_vs16
155
+ no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES){ res += 2; "" }
156
+ [res, no_emoji_string]
92
157
 
93
- string.scan(Unicode::Emoji::REGEX){ |emoji|
94
- extra_width += 2 * emoji.scan(modifier_regex).size
158
+ elsif mode == :vs16
159
+ no_emoji_string = string.gsub(REGEX_EMOJI_VS16){ res += 2; "" }
160
+ [res, no_emoji_string]
95
161
 
96
- emoji.scan(zwj_regex){ |zwj_succ|
97
- extra_width += self.of(zwj_succ, ambiguous, overwrite)
98
- }
162
+ elsif mode == :all
163
+ no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES_AND_VS16){ res += 2; "" }
164
+ [res, no_emoji_string]
165
+
166
+ else
167
+ [0, string]
168
+
169
+ end
170
+ end
171
+
172
+ # Match possible Emoji first, then refine
173
+ def self.emoji_width_via_possible(string, emoji_set_regex, strict_eaw = false, ambiguous = DEFAULT_AMBIGUOUS)
174
+ res = 0
175
+
176
+ # For each string possibly an emoji
177
+ no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES_AND_VS16){ |emoji_candidate|
178
+ # Check if we have a combined Emoji with width 2 (or EAW on Apple Terminal)
179
+ if emoji_candidate == emoji_candidate[emoji_set_regex]
180
+ if strict_eaw
181
+ res += self.of(emoji_candidate[0], ambiguous, emoji: false)
182
+ else
183
+ res += 2
184
+ end
185
+ ""
186
+
187
+ # We are dealing with a default text presentation emoji or a well-formed sequence not matching the above Emoji set
188
+ else
189
+ if !strict_eaw
190
+ # Ensure all explicit VS16 sequences have width 2
191
+ emoji_candidate.gsub!(REGEX_EMOJI_VS16){ res += 2; "" }
192
+ end
193
+
194
+ emoji_candidate
195
+ end
99
196
  }
100
197
 
101
- extra_width
198
+ [res, no_emoji_string]
102
199
  end
103
200
 
104
- def initialize(ambiguous: 1, overwrite: {}, emoji: false)
201
+ def self.normalize_options(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
202
+ unless old_options.empty?
203
+ warn "Unicode::DisplayWidth: Please migrate to keyword arguments - #{old_options.inspect}"
204
+ options.merge! old_options
205
+ end
206
+
207
+ options[:ambiguous] = ambiguous if ambiguous
208
+ options[:ambiguous] ||= DEFAULT_AMBIGUOUS
209
+
210
+ if options[:ambiguous] != 1 && options[:ambiguous] != 2
211
+ raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2"
212
+ end
213
+
214
+ if overwrite && !overwrite.empty?
215
+ warn "Unicode::DisplayWidth: Please migrate to keyword arguments - overwrite: #{overwrite.inspect}"
216
+ options[:overwrite] = overwrite
217
+ end
218
+ options[:overwrite] ||= {}
219
+
220
+ if [nil, true, :auto].include?(options[:emoji])
221
+ options[:emoji] = EmojiSupport.recommended
222
+ elsif options[:emoji] == false
223
+ options[:emoji] = :none
224
+ end
225
+
226
+ options
227
+ end
228
+
229
+ def initialize(ambiguous: DEFAULT_AMBIGUOUS, overwrite: {}, emoji: true)
105
230
  @ambiguous = ambiguous
106
231
  @overwrite = overwrite
107
232
  @emoji = emoji
108
233
  end
109
234
 
110
235
  def get_config(**kwargs)
111
- [
112
- kwargs[:ambiguous] || @ambiguous,
113
- kwargs[:overwrite] || @overwrite,
114
- { emoji: kwargs[:emoji] || @emoji },
115
- ]
236
+ {
237
+ ambiguous: kwargs[:ambiguous] || @ambiguous,
238
+ overwrite: kwargs[:overwrite] || @overwrite,
239
+ emoji: kwargs[:emoji] || @emoji,
240
+ }
116
241
  end
117
242
 
118
243
  def of(string, **kwargs)
119
- self.class.of(string, *get_config(**kwargs))
244
+ self.class.of(string, **get_config(**kwargs))
120
245
  end
121
246
  end
122
247
  end
123
-
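The per-codepoint lookup in `self.of` above walks a nested index, dividing the depth by 16 at each level. The sketch below replays that walk on a tiny hand-made index. The method name `lookup_width` and the toy data are assumptions for illustration; the real index is loaded from `data/display_width.marshal.gz` and is far larger.

```ruby
# Hypothetical stand-alone version of the nested index walk.
# An inner Array means "descend one level", an Integer is the width,
# and nil means "no entry", which falls back to width 1.
def lookup_width(index, codepoint)
  depth = 0x10000 # top level: one slot per Unicode plane
  node = index[codepoint / depth]

  while node.instance_of?(Array)
    codepoint %= depth # keep only the offset within the current block
    depth /= 16        # each level splits a block into 16 sub-blocks
    node = node[codepoint / depth]
  end

  node || 1
end

# Toy index: 17 planes, almost everything unset
toy_index = Array.new(17)
toy_index[0] = Array.new(16) # plane 0 is split into 16 blocks of 0x1000
toy_index[0][4] = 2          # U+4000..U+4FFF: width 2
toy_index[2] = 2             # all of plane 2: width 2

lookup_width(toy_index, 0x4E00)  # => 2
lookup_width(toy_index, 0x41)    # => 1
lookup_width(toy_index, 0x20000) # => 2
```

Storing one value for a whole block keeps the index small when large ranges share a width, which fits ranges such as the CJK blocks listed in the README table.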
metadata CHANGED
@@ -1,15 +1,35 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: unicode-display_width
3
3
  version: !ruby/object:Gem::Version
4
- version: 2.6.0
4
+ version: 3.1.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jan Lelis
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2024-09-13 00:00:00.000000000 Z
11
+ date: 2025-01-13 00:00:00.000000000 Z
12
12
  dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: unicode-emoji
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '4.0'
20
+ - - ">="
21
+ - !ruby/object:Gem::Version
22
+ version: 4.0.4
23
+ type: :runtime
24
+ prerelease: false
25
+ version_requirements: !ruby/object:Gem::Requirement
26
+ requirements:
27
+ - - "~>"
28
+ - !ruby/object:Gem::Version
29
+ version: '4.0'
30
+ - - ">="
31
+ - !ruby/object:Gem::Version
32
+ version: 4.0.4
13
33
  - !ruby/object:Gem::Dependency
14
34
  name: rspec
15
35
  requirement: !ruby/object:Gem::Requirement
@@ -39,7 +59,8 @@ dependencies:
39
59
  - !ruby/object:Gem::Version
40
60
  version: '13.0'
41
61
  description: "[Unicode 16.0.0] Determines the monospace display width of a string
42
- using EastAsianWidth.txt, Unicode general category, and other data."
62
+ using EastAsianWidth.txt, Unicode general category, Emoji specification, and other
63
+ data."
43
64
  email:
44
65
  - hi@ruby.consulting
45
66
  executables: []
@@ -55,8 +76,10 @@ files:
55
76
  - data/display_width.marshal.gz
56
77
  - lib/unicode/display_width.rb
57
78
  - lib/unicode/display_width/constants.rb
79
+ - lib/unicode/display_width/emoji_support.rb
58
80
  - lib/unicode/display_width/index.rb
59
81
  - lib/unicode/display_width/no_string_ext.rb
82
+ - lib/unicode/display_width/reline_ext.rb
60
83
  - lib/unicode/display_width/string_ext.rb
61
84
  homepage: https://github.com/janlelis/unicode-display_width
62
85
  licenses:
@@ -74,14 +97,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
74
97
  requirements:
75
98
  - - ">="
76
99
  - !ruby/object:Gem::Version
77
- version: 2.4.0
100
+ version: 2.5.0
78
101
  required_rubygems_version: !ruby/object:Gem::Requirement
79
102
  requirements:
80
103
  - - ">="
81
104
  - !ruby/object:Gem::Version
82
105
  version: '0'
83
106
  requirements: []
84
- rubygems_version: 3.5.9
107
+ rubygems_version: 3.5.21
85
108
  signing_key:
86
109
  specification_version: 4
87
110
  summary: Determines the monospace display width of a string in Ruby.