unicode-display_width 3.0.0 → 3.1.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +22 -0
- data/README.md +27 -32
- data/lib/unicode/display_width/constants.rb +1 -1
- data/lib/unicode/display_width/emoji_support.rb +19 -8
- data/lib/unicode/display_width.rb +82 -43
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 01657362aaf60cf79bb03c63bb96e01914139c7bb965dc9bed18e7988b8c6709
|
4
|
+
data.tar.gz: 297cc1ab03e72a02e9f33eb4eec2dea2006f23987818083c4bc12aa168e437c3
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: a3878d504a273e44268762fca4857bf26a9322e0e54c0afc437d953dca675822262c9aec54cb5de3d23390b4b778403b36ce0f73ba5b0f1d2c8554a1f796d210
|
7
|
+
data.tar.gz: 00de0d22f3b245f16de15b3b4864ff754da04ea94eafdeaf06c0e38fec8cfb2559fbeaafc17f165534e70a386e154f70d7b071f5a226c9f64d7088bbb408cabb
|
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,27 @@
|
|
1
1
|
# CHANGELOG
|
2
2
|
|
3
|
+
## 3.1.0
|
4
|
+
|
5
|
+
**Improve Emoji support:**
|
6
|
+
|
7
|
+
- Emoji modes: Differentiate between well-formed Emoji (`:possible`) and any
|
8
|
+
ZWJ/modifier sequence (`:all`). The latter is more common and more efficient
|
9
|
+
to implement.
|
10
|
+
- Unify `rgi_{fqe,mqe,uqe}` options to just `:rgi` to keep things simpler (corresponds to
|
11
|
+
the former `:rgi_uqe` option). Most terminals that want to support the RGI set
|
12
|
+
will probably want to catch Emoji sequences with missing VS16s.
|
13
|
+
- Add new `:all_no_vs16` and `:rgi_at` modes to be able to support some terminals
|
14
|
+
that needs these quirks
|
15
|
+
- Add alias `emoji: :auto` for `emoji: true` and `emoji: :none` for `emoji: false`
|
16
|
+
- `:auto` mode: Only consider terminal cells when recommending Emoji support level
|
17
|
+
(Emoji themselves might display differently)
|
18
|
+
- `:auto` mode: Set default Emoji mode for unknown/unsupported terminals to `:none`
|
19
|
+
- Rename `:basic` mode to `:vs16`
|
20
|
+
|
21
|
+
## 3.0.1
|
22
|
+
|
23
|
+
- Add WezTerm and foot as good Emoji terminals
|
24
|
+
|
3
25
|
## 3.0.0
|
4
26
|
|
5
27
|
**Rework Emoji support:**
|
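The 3.1.0 changelog entries above rename or merge several `emoji:` modes. A minimal migration sketch follows, assuming a hypothetical helper `migrate_emoji_option` that is not part of the gem. The `:all` to `:possible` mapping is an inference from the `EMOJI_SEQUENCES_REGEX_MAPPING` change in this diff, where `REGEX_WELL_FORMED` moves from the `all:` key to the `possible:` key. Mapping `:rgi_fqe` and `:rgi_mqe` to `:rgi` is only approximate, since the changelog says `:rgi` corresponds to the former `:rgi_uqe`.

```ruby
# Hypothetical helper, not part of unicode-display_width.
# Maps emoji: values used with 3.0.x to the closest 3.1.0 mode name.
LEGACY_EMOJI_MODES = {
  basic:   :vs16,      # renamed in 3.1.0
  rgi_fqe: :rgi,       # unified; :rgi behaves like the former :rgi_uqe
  rgi_mqe: :rgi,
  rgi_uqe: :rgi,
  all:     :possible,  # assumption: 3.0 :all matched well-formed sequences
}.freeze

# Returns the 3.1.0 name for a legacy mode; other values pass through as-is.
def migrate_emoji_option(value)
  LEGACY_EMOJI_MODES.fetch(value, value)
end
```

Values such as `true`, `false`, `:auto` and `:none` pass through unchanged, since 3.1.0 accepts all of them.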
data/README.md
CHANGED
@@ -4,7 +4,7 @@ Determines the monospace display width of a string in Ruby, which is useful for
|
|
4
4
|
|
5
5
|
Unicode version: **16.0.0** (September 2024)
|
6
6
|
|
7
|
-
## Gem Version 3
|
7
|
+
## Gem Version 3 — Improved Emoji Support
|
8
8
|
|
9
9
|
**Emoji support is now enabled by default.** See below for description and configuration possibilities.
|
10
10
|
|
@@ -81,58 +81,53 @@ Unicode::DisplayWidth.of("a\tb", 1, overwrite: { "\t".ord => 10 })) # => TAB cou
|
|
81
81
|
|
82
82
|
Please note that using overwrites disables some perfomance optimizations of this gem.
|
83
83
|
|
84
|
-
### Emoji
|
84
|
+
### Emoji
|
85
85
|
|
86
|
-
|
86
|
+
If your terminal supports it, the gem detects Emoji and Emoji sequences and adjusts the width of the measured string. This can be disabled by passing `emoji: false` as an argument:
|
87
87
|
|
88
88
|
```ruby
|
89
|
-
Unicode::DisplayWidth.of "🤾🏽♀️" # => 2
|
89
|
+
Unicode::DisplayWidth.of "🤾🏽♀️", emoji: :all # => 2
|
90
90
|
Unicode::DisplayWidth.of "🤾🏽♀️", emoji: false # => 5
|
91
91
|
```
|
92
92
|
|
93
|
-
Disabling Emoji support yields wrong results, as illustrated in the example above, but increases performance of display width calculation. You can configure [the Emoji set to match for](https://www.unicode.org/reports/tr51/#def_rgi_set) by passing a symbol as value:
|
94
|
-
|
95
|
-
```ruby
|
96
|
-
Unicode::DisplayWidth.of "🐻❄", emoji: :rgi_mqe # => 3
|
97
|
-
Unicode::DisplayWidth.of "🐻❄", emoji: :rgi_uqe # => 2
|
98
|
-
```
|
99
|
-
|
100
93
|
#### How this Library Handles Emoji Width
|
101
94
|
|
102
95
|
There are many Emoji which get constructed by combining other Emoji in a sequence. This makes measuring the width complicated, since terminals might either display the combined Emoji or the separate parts of the Emoji individually.
|
103
96
|
|
97
|
+
Another aspect where terminals disagree is whether Emoji characters which have a text presentation by default (width 1) should be turned into full-width (width 2) when combined with Variation Selector 16 (*U+FE0F*).
|
98
|
+
|
104
99
|
Emoji Type | Width / Comment
|
105
100
|
------------|----------------
|
106
|
-
Basic/Single Emoji character without Variation Selector | No special handling
|
107
|
-
Basic/Single Emoji character with VS15 (Text) | No special handling
|
108
|
-
Basic/Single Emoji character with VS16 (Emoji) | 2
|
109
|
-
Emoji Sequence | 2
|
110
|
-
|
111
|
-
The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji sets can be used:
|
101
|
+
Basic/Single Emoji character without Variation Selector | No special handling
|
102
|
+
Basic/Single Emoji character with VS15 (Text) | No special handling
|
103
|
+
Basic/Single Emoji character with VS16 (Emoji) | 2 or East Asian Width (see table below)
|
104
|
+
Emoji Sequence | 2 if Emoji belongs to configured Emoji set (see table below)
|
112
105
|
|
113
|
-
|
114
|
-
-------|-------------
|
115
|
-
`emoji: true` | Use recommended Emoji set on your platform, see section below
|
116
|
-
`emoji: :basic` | No width adjustments for Emoji sequences: all partial Emoji treated separately
|
117
|
-
`emoji: :rgi_fqe` | All fully-qualified RGI Emoji sequences are considered to have a width of 2
|
118
|
-
`emoji: :rgi_mqe` | All fully- and minimally-qualified RGI Emoji sequences are considered to have a width of 2
|
119
|
-
`emoji: :rgi_uqe` | All RGI Emoji sequences, regardless of qualification status are considered to have a width of 2
|
120
|
-
`emoji: :all` | All possible/well-formed Emoji sequences are considered to have a width of 2
|
121
|
-
`emoji: false` | No Emoji adjustments, Emoji characters with VS16 not handled
|
106
|
+
#### Emoji Modes
|
122
107
|
|
123
|
-
|
108
|
+
The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2 and if VS16-Emoji should be widened. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji settings can be used:
|
124
109
|
|
125
|
-
|
110
|
+
`emoji:` Option | VS16-Emoji Width | Emoji Sequences Width / Comment | Example Terminals
|
111
|
+
----------------|------------------|---------------------------------|------------------
|
112
|
+
`true` or `:auto` | - | Automatically use recommended Emoji setting for your terminal | -
|
113
|
+
`:all` | 2 | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | iTerm, foot
|
114
|
+
`:all_no_vs16` | EAW (1 or 2) | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | WezTerm
|
115
|
+
`:possible`| 2 | 2 for all possible/well-formed Emoji sequences | ?
|
116
|
+
`:rgi` | 2 | 2 for all [RGI Emoji](https://www.unicode.org/reports/tr51/#def_rgi_set) sequences | ?
|
117
|
+
`:rgi_at` | EAW (1 or 2) | 1 or 2: Like `:rgi`, but Emoji sequences starting with a default-text Emoji have width 1 | Apple Terminal
|
118
|
+
`:vs16` | 2 | 2 * number of partial Emoji (sequences never considered to represent a combined Emoji) | kitty?
|
119
|
+
`false` or `:none` | EAW (1 or 2) | No Emoji adjustments | gnome-terminal, many older terminals
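The "VS16-Emoji Width" column of the table above can be read as a lookup. The sketch below transcribes that column into a hash and resolves the `true`/`:auto` and `false`/`:none` aliases. The names `VS16_WIDTH_BY_MODE`, `resolve_emoji_mode` and `vs16_width` are hypothetical and not part of the gem, and the `recommended:` parameter stands in for whatever the gem's terminal detection would return.

```ruby
# Data transcribed from the README table in this diff. The constant and
# method names are hypothetical; they are not part of the gem.
VS16_WIDTH_BY_MODE = {
  all:         2,
  all_no_vs16: :eaw, # East Asian Width, 1 or 2
  possible:    2,
  rgi:         2,
  rgi_at:      :eaw,
  vs16:        2,
  none:        :eaw,
}.freeze

# Resolve aliases: nil/true/:auto defer to a terminal recommendation,
# false is the same as :none, everything else is used as given.
def resolve_emoji_mode(option, recommended: :none)
  case option
  when nil, true, :auto then recommended
  when false            then :none
  else option
  end
end

# Width of a default-text Emoji followed by VS16 under the given option.
# eaw_width is the character's plain East Asian Width (1 or 2).
def vs16_width(option, eaw_width:, recommended: :none)
  mode = resolve_emoji_mode(option, recommended: recommended)
  width = VS16_WIDTH_BY_MODE.fetch(mode)
  width == :eaw ? eaw_width : width
end
```

An unknown mode raises `KeyError` here, which is a choice made for the sketch rather than documented gem behavior.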
|
126
120
|
|
127
|
-
|
121
|
+
- *RGI Emoji:* Emoji Recommended for General Interchange
|
122
|
+
- *ZWJ:* Zero-width Joiner: Codepoint `U+200D`, used in many Emoji sequences
|
128
123
|
|
129
124
|
#### Emoji Support in Terminals
|
130
125
|
|
131
|
-
Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` is used, the gem will attempt to set the best fitting Emoji
|
126
|
+
Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget).
|
132
127
|
|
133
|
-
Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value.
|
128
|
+
Please note that Emoji display and the number of terminal columns used might differ a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminals' Unicode/Emoji capabilities.
|
134
129
|
|
135
|
-
|
130
|
+
**To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwhile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought…
|
136
131
|
|
137
132
|
### Usage with String Extension
|
138
133
|
|
data/lib/unicode/display_width/emoji_support.rb
CHANGED
@@ -7,28 +7,39 @@ module Unicode
|
|
7
7
|
# Tries to find out which terminal emulator is used to
|
8
8
|
# set emoji: config to best suiting value
|
9
9
|
#
|
10
|
-
# Please
|
10
|
+
# Please also see section in README.md and
|
11
|
+
# misc/terminal-emoji-width.rb
|
12
|
+
#
|
13
|
+
# Please note: Many terminals do not set any ENV vars,
|
14
|
+
# maybe CSI queries can help?
|
11
15
|
def self.recommended
|
12
16
|
if ENV["CI"]
|
13
|
-
return :
|
17
|
+
return :rqi
|
14
18
|
end
|
15
19
|
|
16
20
|
case ENV["TERM_PROGRAM"]
|
17
21
|
when "iTerm.app"
|
18
22
|
return :all
|
19
23
|
when "Apple_Terminal"
|
20
|
-
return :
|
24
|
+
return :rgi_at
|
25
|
+
when "WezTerm"
|
26
|
+
return :all_no_vs16
|
21
27
|
end
|
22
28
|
|
23
29
|
case ENV["TERM"]
|
24
|
-
when "contour"
|
25
|
-
|
30
|
+
when "contour","foot"
|
31
|
+
# konsole: all, how to detect?
|
32
|
+
return :all
|
26
33
|
when /kitty/
|
27
|
-
return :
|
34
|
+
return :vs16
|
35
|
+
end
|
36
|
+
|
37
|
+
if ENV["WT_SESSION"] # Windows Terminal
|
38
|
+
return :vs16
|
28
39
|
end
|
29
40
|
|
30
|
-
# As of last time checked: gnome-terminal, vscode, alacritty
|
31
|
-
:
|
41
|
+
# As of last time checked: gnome-terminal, vscode, alacritty
|
42
|
+
:none
|
32
43
|
end
|
33
44
|
|
34
45
|
# Maybe: Implement something like https://github.com/jquast/ucs-detect
|
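The detection order in the hunk above can be sketched as a standalone method. This is a minimal sketch, not the gem's code: the name `recommended_emoji_mode` and the `env` parameter are assumptions made here so the logic can be exercised with plain hashes, whereas the gem reads `ENV` directly. The diff shows `:rqi` as the return value under `ENV["CI"]`; the sketch assumes `:rgi` was intended, since `:rqi` is not one of the documented modes.

```ruby
# Standalone sketch of the detection order shown in the diff.
# `env` is injectable for testing (an assumption; the gem uses ENV).
def recommended_emoji_mode(env = ENV)
  # The diff returns :rqi here; this sketch assumes :rgi was meant.
  return :rgi if env["CI"]

  case env["TERM_PROGRAM"]
  when "iTerm.app"      then return :all
  when "Apple_Terminal" then return :rgi_at
  when "WezTerm"        then return :all_no_vs16
  end

  case env["TERM"]
  when "contour", "foot" then return :all
  when /kitty/           then return :vs16
  end

  return :vs16 if env["WT_SESSION"] # Windows Terminal

  :none # unknown or unsupported terminals
end
```

For example, `recommended_emoji_mode("TERM" => "xterm-kitty")` falls through the `TERM_PROGRAM` check and matches the `/kitty/` branch, while an empty hash ends at the `:none` default.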
data/lib/unicode/display_width.rb
CHANGED
@@ -8,6 +8,7 @@ require_relative "display_width/emoji_support"
|
|
8
8
|
|
9
9
|
module Unicode
|
10
10
|
class DisplayWidth
|
11
|
+
DEFAULT_AMBIGUOUS = 1
|
11
12
|
INITIAL_DEPTH = 0x10000
|
12
13
|
ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/
|
13
14
|
ASCII_NON_ZERO_STRING = "\0\x05\a\b\n\v\f\r\x0E\x0F"
|
@@ -25,13 +26,13 @@ module Unicode
|
|
25
26
|
WIDTH_TWO: decompress_index(INDEX[:WIDTH_TWO][0][0], 1),
|
26
27
|
}
|
27
28
|
EMOJI_SEQUENCES_REGEX_MAPPING = {
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
all: :REGEX_WELL_FORMED,
|
29
|
+
rgi: :REGEX_INCLUDE_MQE_UQE,
|
30
|
+
rgi_at: :REGEX_INCLUDE_MQE_UQE,
|
31
|
+
possible: :REGEX_WELL_FORMED,
|
32
32
|
}
|
33
|
-
EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/
|
34
33
|
REGEX_EMOJI_BASIC_OR_KEYCAP = Regexp.union(Unicode::Emoji::REGEX_BASIC, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
|
34
|
+
REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
|
35
|
+
REGEX_EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/
|
35
36
|
|
36
37
|
# Returns monospace display width of string
|
37
38
|
def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
|
@@ -41,7 +42,7 @@ module Unicode
|
|
41
42
|
end
|
42
43
|
|
43
44
|
options[:ambiguous] = ambiguous if ambiguous
|
44
|
-
options[:ambiguous] ||=
|
45
|
+
options[:ambiguous] ||= DEFAULT_AMBIGUOUS
|
45
46
|
|
46
47
|
if options[:ambiguous] != 1 && options[:ambiguous] != 2
|
47
48
|
raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2"
|
@@ -53,7 +54,7 @@ module Unicode
|
|
53
54
|
end
|
54
55
|
options[:overwrite] ||= {}
|
55
56
|
|
56
|
-
if
|
57
|
+
if [nil, true, :auto].include?(options[:emoji])
|
57
58
|
options[:emoji] = EmojiSupport.recommended
|
58
59
|
end
|
59
60
|
|
@@ -87,12 +88,13 @@ module Unicode
|
|
87
88
|
|
88
89
|
def self.width_frame(string, options)
|
89
90
|
# Retrieve Emoji width
|
90
|
-
if
|
91
|
+
if options[:emoji] == false || options[:emoji] == :none
|
91
92
|
res = 0
|
92
|
-
else
|
93
|
+
else
|
93
94
|
res, string = emoji_width(
|
94
95
|
string,
|
95
96
|
options[:emoji],
|
97
|
+
options[:ambiguous],
|
96
98
|
)
|
97
99
|
end
|
98
100
|
|
@@ -163,32 +165,81 @@ module Unicode
|
|
163
165
|
end
|
164
166
|
|
165
167
|
|
166
|
-
def self.emoji_width(string,
|
168
|
+
def self.emoji_width(string, mode = :all, ambiguous = DEFAULT_AMBIGUOUS)
|
167
169
|
res = 0
|
168
170
|
|
169
|
-
|
170
|
-
|
171
|
-
|
172
|
-
|
171
|
+
string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8"
|
172
|
+
|
173
|
+
if emoji_set_regex = EMOJI_SEQUENCES_REGEX_MAPPING[mode]
|
174
|
+
emoji_width_via_possible(
|
175
|
+
string,
|
176
|
+
Unicode::Emoji.const_get(emoji_set_regex),
|
177
|
+
mode == :rgi_at,
|
178
|
+
ambiguous,
|
179
|
+
)
|
180
|
+
elsif mode == :all_no_vs16
|
181
|
+
emoji_width_all(string)
|
182
|
+
elsif mode == :vs16
|
183
|
+
emoji_width_basic(string)
|
184
|
+
elsif mode == :all
|
185
|
+
res_all, string = emoji_width_all(string)
|
186
|
+
res_basic, string = emoji_width_basic(string)
|
187
|
+
[res_all + res_basic, string]
|
188
|
+
else
|
189
|
+
[0, string]
|
173
190
|
end
|
191
|
+
end
|
174
192
|
|
175
|
-
|
176
|
-
|
193
|
+
# Ensure all explicit VS16 sequences have width 2
|
194
|
+
def self.emoji_width_basic(string)
|
195
|
+
res = 0
|
177
196
|
|
178
|
-
|
179
|
-
|
180
|
-
|
181
|
-
|
182
|
-
|
183
|
-
|
197
|
+
no_emoji_string = string.gsub(REGEX_EMOJI_BASIC_OR_KEYCAP){ |basic_emoji|
|
198
|
+
if basic_emoji.size >= 2 # VS16 present
|
199
|
+
res += 2
|
200
|
+
""
|
201
|
+
else
|
202
|
+
basic_emoji
|
203
|
+
end
|
204
|
+
}
|
184
205
|
|
185
|
-
|
186
|
-
|
187
|
-
|
188
|
-
|
206
|
+
[res, no_emoji_string]
|
207
|
+
end
|
208
|
+
|
209
|
+
# Use simplistic ZWJ/modifier/kecap sequence matching
|
210
|
+
def self.emoji_width_all(string)
|
211
|
+
res = 0
|
212
|
+
|
213
|
+
no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES){
|
214
|
+
res += 2
|
215
|
+
""
|
216
|
+
}
|
217
|
+
|
218
|
+
[res, no_emoji_string]
|
219
|
+
end
|
220
|
+
|
221
|
+
# Match possible Emoji first, then refine
|
222
|
+
def self.emoji_width_via_possible(string, emoji_set_regex, strict_eaw = false, ambiguous = DEFAULT_AMBIGUOUS)
|
223
|
+
res = 0
|
224
|
+
|
225
|
+
# For each string possibly an emoji
|
226
|
+
no_emoji_string = string.gsub(Unicode::Emoji::REGEX_POSSIBLE){ |emoji_candidate|
|
227
|
+
# Skip notorious false positives
|
228
|
+
if REGEX_EMOJI_NOT_POSSIBLE.match?(emoji_candidate)
|
229
|
+
emoji_candidate
|
189
230
|
|
190
|
-
|
231
|
+
# Check if we have a combined Emoji with width 2 (or EAW an Apple Terminal)
|
232
|
+
elsif emoji_candidate == emoji_candidate[emoji_set_regex]
|
233
|
+
if strict_eaw
|
234
|
+
res += self.of(emoji_candidate[0], ambiguous, emoji: false)
|
191
235
|
else
|
236
|
+
res += 2
|
237
|
+
end
|
238
|
+
""
|
239
|
+
|
240
|
+
# We are dealing with a default text presentation emoji or a well-formed sequence not matching the above Emoji set
|
241
|
+
else
|
242
|
+
if !strict_eaw
|
192
243
|
# Ensure all explicit VS16 sequences have width 2
|
193
244
|
emoji_candidate.gsub!(Unicode::Emoji::REGEX_BASIC){ |basic_emoji|
|
194
245
|
if basic_emoji.size == 2 # VS16 present
|
@@ -198,28 +249,16 @@ module Unicode
|
|
198
249
|
basic_emoji
|
199
250
|
end
|
200
251
|
}
|
201
|
-
|
202
|
-
emoji_candidate
|
203
252
|
end
|
204
|
-
}
|
205
|
-
else
|
206
|
-
# Only consider basic emoji
|
207
253
|
|
208
|
-
|
209
|
-
|
210
|
-
|
211
|
-
res += 2
|
212
|
-
""
|
213
|
-
else
|
214
|
-
basic_emoji
|
215
|
-
end
|
216
|
-
}
|
217
|
-
end
|
254
|
+
emoji_candidate
|
255
|
+
end
|
256
|
+
}
|
218
257
|
|
219
258
|
[res, no_emoji_string]
|
220
259
|
end
|
221
260
|
|
222
|
-
def initialize(ambiguous:
|
261
|
+
def initialize(ambiguous: DEFAULT_AMBIGUOUS, overwrite: {}, emoji: true)
|
223
262
|
@ambiguous = ambiguous
|
224
263
|
@overwrite = overwrite
|
225
264
|
@emoji = emoji
|
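The "simplistic" matching used by `emoji_width_all` in the diff above can be tried in isolation. The sketch below copies only the ZWJ/modifier half of `REGEX_EMOJI_ALL_SEQUENCES`; the keycap alternative (`Unicode::Emoji::REGEX_EMOJI_KEYCAP`) is deliberately left out so the example runs without the unicode-emoji gem. The names `ZWJ_SEQUENCE` and `zwj_sequences_width` are hypothetical.

```ruby
# ZWJ/modifier part of REGEX_EMOJI_ALL_SEQUENCES as shown in the diff.
# Keycap sequences are NOT covered here (assumption made for a
# dependency-free sketch).
ZWJ_SEQUENCE = /.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/

# Returns [width_of_removed_sequences, remaining_string], the same shape
# as the [res, no_emoji_string] pairs returned in the diff.
def zwj_sequences_width(string)
  res = 0
  rest = string.gsub(ZWJ_SEQUENCE) do
    res += 2 # every matched sequence counts as two cells
    ""       # strip it so later width passes do not count it again
  end
  [res, rest]
end

# U+1F469 WOMAN, ZWJ, U+1F680 ROCKET between two ASCII letters
zwj_sequences_width("a\u{1F469}\u{200D}\u{1F680}b") # => [2, "ab"]
```

Any two characters joined by `U+200D` match, whether or not they form a real Emoji, which is the trade-off the changelog describes for the `:all` mode. In the diff, `:all` additionally runs `emoji_width_basic` over the remaining string to handle VS16 sequences.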
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: unicode-display_width
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 3.
|
4
|
+
version: 3.1.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Jan Lelis
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2024-11-
|
11
|
+
date: 2024-11-18 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: unicode-emoji
|