unicode-display_width 3.1.1 → 3.1.3
- checksums.yaml +4 -4
- data/CHANGELOG.md +13 -1
- data/README.md +6 -1
- data/data/display_width.marshal.gz +0 -0
- data/lib/unicode/display_width/constants.rb +1 -1
- data/lib/unicode/display_width.rb +92 -106
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4b0b5fe12467a22c6b21ad6dfb8dc422eb547e252690e432afc7504e8dae641c
+  data.tar.gz: a2c0e4c856034b1ef64946861d33845615cd8c950da462441faea2f900c14502
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8a9499ffcdc0f6def0ac88fc13aaaaea0e46031a63e05c00c32872e1f066d7550abd8e0cb3efaea084a07ebc168e87ed2a3a32effa040c1a651c3288157704f1
+  data.tar.gz: f6a6c7e002476db323d52073ef00567c6eafb564864f1b9c6b72ba7228c5f1a0325e12a07445e0d7a7af66ff988ccd05e89d268898c78c2eaaf97735f79b3f90
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,16 @@
 # CHANGELOG
 
+## 3.1.3
+
+Better handling of non-UTF-8 strings, patch by @Earlopain:
+
+- Data with *BINARY* encoding is interpreted as UTF-8, if possible
+- Use `invalid: :replace` and `undef: :replace` options when converting to UTF-8
+
+## 3.1.2
+
+- Performance improvements
+
 ## 3.1.1
 
 - Performance improvements
@@ -11,7 +22,7 @@
 - Emoji modes: Differentiate between well-formed Emoji (`:possible`) and any
 ZWJ/modifier sequence (`:all`). The latter is more common and more efficient
 to implement.
-- Unify
+- Unify `:rgi_{fqe,mqe,uqe}` options to just `:rgi` to keep things simpler (corresponds to
 the former `:rgi_uqe` option). Most terminals that want to support the RGI set
 will probably want to catch Emoji sequences with missing VS16s.
 - Add new `:all_no_vs16` and `:rgi_at` modes to be able to support some terminals
@@ -24,6 +35,7 @@
 
 ## 3.0.1
 
+
 - Add WezTerm and foot as good Emoji terminals
 
 ## 3.0.0
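The first 3.1.3 changelog entry (BINARY data is interpreted as UTF-8 if possible) comes down to a byte-level validity check. Below is a minimal stdlib-only sketch; `reinterpret_binary` is a hypothetical name, not part of the gem. Unlike the gem's `of` method later in this diff, which calls `force_encoding` on the caller's string itself (and so mutates it), the sketch works on a copy, which also lets it accept frozen strings.

```ruby
# Hypothetical helper, not part of unicode-display_width's API.
# BINARY (ASCII-8BIT) strings whose bytes form valid UTF-8 are returned as a
# UTF-8 copy; every other string is returned unchanged.
def reinterpret_binary(string)
  return string unless string.encoding == Encoding::BINARY

  # force_encoding mutates its receiver, so operate on a copy.
  candidate = string.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : string
end

raw = "caf\xC3\xA9".b # UTF-8 bytes of "caf\u00E9", tagged as BINARY
p reinterpret_binary(raw) == "caf\u00E9"                     # => true
p reinterpret_binary("\xFF".b).encoding == Encoding::BINARY  # => true
p raw.encoding == Encoding::BINARY                           # => true (input untouched)
```

Strings that are not tagged BINARY pass through unchanged here; converting them is the job of the second changelog entry.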
data/README.md
CHANGED
@@ -71,6 +71,11 @@ Unicode::DisplayWidth.of("·", 1) # => 1
 Unicode::DisplayWidth.of("·", 2) # => 2
 ```
 
+### Encoding Notes
+
+- Data with *BINARY* encoding is interpreted as UTF-8, if possible
+- Non-UTF-8 strings are converted to UTF-8 before measuring, using the [`{invalid: :replace, undef: :replace}`) options](https://ruby-doc.org/3.3.5/encodings_rdoc.html#label-Encoding+Options)
+
 ### Custom Overwrites
 
 You can overwrite how to handle specific code points by passing a hash (or even a proc) as `overwrite:` parameter:
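The second encoding note can be illustrated in isolation. `to_utf8_for_measuring` is a hypothetical helper whose body mirrors the conversion line that 3.1.3 adds to `Unicode::DisplayWidth.of`: with `invalid: :replace` and `undef: :replace`, malformed input and characters without a UTF-8 mapping become U+FFFD instead of raising `Encoding::InvalidByteSequenceError` or `Encoding::UndefinedConversionError`.

```ruby
# Hypothetical helper mirroring the conversion step 3.1.3 adds to `of`:
# strings already tagged UTF-8 are returned as-is; anything else is
# transcoded, with invalid or unmappable input replaced by U+FFFD.
def to_utf8_for_measuring(string)
  return string if string.encoding == Encoding::UTF_8

  string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

latin1 = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)
p to_utf8_for_measuring(latin1) == "caf\u00E9"       # => true

# 0xFF has no UTF-8 mapping when the source encoding is BINARY
p to_utf8_for_measuring("ab\xFF".b) == "ab\u{FFFD}"  # => true
```

Because the conversion is skipped for strings already tagged UTF-8, a UTF-8 string that contains invalid bytes is passed on unchanged; Ruby's `String#scrub` is the standard tool if such input needs cleaning beforehand.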
@@ -126,7 +131,7 @@ The `emoji:` option can be used to configure which type of Emoji should be consi
 
 Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget).
 
-Please note that Emoji display and number of terminal columns used might differs a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminal's Unicode/Emoji capabilities.
+Please note that Emoji display and number of terminal columns used might differs a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminal's Unicode/Emoji capabilities. You can checkout how your terminals renders different kind of Emoji types with this [terminal-emoji-width.rb script](https://github.com/janlelis/unicode-display_width/blob/main/misc/terminal-emoji-width.rb).
 
 **To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought…
 
data/data/display_width.marshal.gz
CHANGED
Binary file
data/lib/unicode/display_width.rb
CHANGED
@@ -10,8 +10,8 @@ module Unicode
   class DisplayWidth
     DEFAULT_AMBIGUOUS = 1
     INITIAL_DEPTH = 0x10000
-    ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n
-    ASCII_NON_ZERO_STRING = "\0\x05\a\b\n
+    ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n-\x0F]/
+    ASCII_NON_ZERO_STRING = "\0\x05\a\b\n-\x0F"
     ASCII_BACKSPACE = "\b"
     AMBIGUOUS_MAP = {
       1 => :WIDTH_ONE,
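The widened `ASCII_NON_ZERO_*` constants are consumed by `width_ascii`, which 3.1.3 adds as a separate method further down in this diff. The sketch below copies those constants and that method body into a hypothetical `AsciiWidthSketch` module so the behavior can be tried without the gem: NUL, U+0005, BEL, backspace, and U+000A through U+000F are dropped, each backspace also takes away one column, and the result never goes below 0. Tab (U+0009) is not in the list, so it counts as one column.

```ruby
# Hypothetical stand-alone copy of the ASCII-only path from the 3.1.3 diff.
module AsciiWidthSketch
  ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n-\x0F]/
  ASCII_NON_ZERO_STRING = "\0\x05\a\b\n-\x0F" # tr-style set; "\n-\x0F" is a range
  ASCII_BACKSPACE = "\b"

  # Width of an ASCII-only string: zero-width controls are removed and every
  # backspace moves the cursor back one column (clamped at 0).
  def self.width_ascii(string)
    if string.match?(ASCII_NON_ZERO_REGEX)
      res = string.delete(ASCII_NON_ZERO_STRING).bytesize - string.count(ASCII_BACKSPACE)
      return res < 0 ? 0 : res
    end

    string.bytesize # equals string.size for ASCII-only input
  end
end

p AsciiWidthSketch.width_ascii("hello")   # => 5
p AsciiWidthSketch.width_ascii("ab\ncd")  # => 4
p AsciiWidthSketch.width_ascii("abc\b")   # => 2
p AsciiWidthSketch.width_ascii("\b\b")    # => 0
p AsciiWidthSketch.width_ascii("a\tb")    # => 3
```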
@@ -21,6 +21,10 @@ module Unicode
       WIDTH_ONE: 768,
       WIDTH_TWO: 161,
     }
+    NOT_COMMON_NARROW_REGEX = {
+      WIDTH_ONE: /[^\u{10}-\u{2FF}]/m,
+      WIDTH_TWO: /[^\u{10}-\u{A1}]/m,
+    }
     FIRST_4096 = {
       WIDTH_ONE: decompress_index(INDEX[:WIDTH_ONE][0][0], 1),
       WIDTH_TWO: decompress_index(INDEX[:WIDTH_TWO][0][0], 1),
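`NOT_COMMON_NARROW_REGEX` enables a shortcut in the reworked `of`: when a string that is not ASCII-only contains no code point outside U+0010..U+02FF (ambiguous width 1) or U+0010..U+00A1 (ambiguous width 2), its width is taken to be `string.size`. A minimal sketch of that check, with the ranges copied from the diff; `narrow_fast_path_width` is a hypothetical name that returns `nil` wherever the gem would go on to Emoji handling and the index lookup.

```ruby
# Ranges copied from NOT_COMMON_NARROW_REGEX in the 3.1.3 diff.
NOT_COMMON_NARROW_REGEX = {
  WIDTH_ONE: /[^\u{10}-\u{2FF}]/m,
  WIDTH_TWO: /[^\u{10}-\u{A1}]/m,
}
AMBIGUOUS_MAP = { 1 => :WIDTH_ONE, 2 => :WIDTH_TWO }

# Hypothetical helper: the width if every code point lies in the "common
# narrow" range for the chosen ambiguous width, otherwise nil.
def narrow_fast_path_width(string, ambiguous = 1)
  regex = NOT_COMMON_NARROW_REGEX.fetch(AMBIGUOUS_MAP.fetch(ambiguous))
  string.match?(regex) ? nil : string.size
end

p narrow_fast_path_width("d\u00E9j\u00E0 vu")  # => 7
p narrow_fast_path_width("\u00B7", 1)          # => 1
p narrow_fast_path_width("\u00B7", 2)          # => nil
p narrow_fast_path_width("a\nb")               # => nil (U+000A is below U+0010)
```

The `"\u00B7"` results line up with the README examples `Unicode::DisplayWidth.of("·", 1) # => 1` and `Unicode::DisplayWidth.of("·", 2) # => 2`: with ambiguous width 2 the shortcut does not apply, so the full lookup decides.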
@@ -30,7 +34,6 @@ module Unicode
       rgi_at: :REGEX_INCLUDE_MQE_UQE,
       possible: :REGEX_WELL_FORMED,
     }
-    REGEX_EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/
     REGEX_EMOJI_VS16 = Regexp.union(
       Regexp.compile(
         Unicode::Emoji::REGEX_TEXT_PRESENTATION.source +
@@ -44,120 +47,55 @@ module Unicode
 
     # Returns monospace display width of string
     def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
-
-
-
+      # Binary strings don't make much sense when calculating display width.
+      # Assume it's valid UTF-8
+      if string.encoding == Encoding::BINARY && !string.force_encoding(Encoding::UTF_8).valid_encoding?
+        # Didn't work out, go back to binary
+        string.force_encoding(Encoding::BINARY)
       end
 
-
-      options
+      string = string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace) unless string.encoding == Encoding::UTF_8
+      options = normalize_options(string, ambiguous, overwrite, old_options, **options)
 
-
-        raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2"
-      end
+      width = 0
 
-
-
-        options[:overwrite] = overwrite
+      unless options[:overwrite].empty?
+        width, string = width_custom(string, options[:overwrite])
       end
-      options[:overwrite] ||= {}
 
-      if
-
+      if string.ascii_only?
+        return width + width_ascii(string)
       end
 
-
-
-      if !options[:overwrite].empty?
-        return width_frame(string, options) do |string, index_full, index_low, first_ambiguous|
-          width_all_features(string, index_full, index_low, first_ambiguous, options[:overwrite])
-        end
-      end
-
-      if !string.ascii_only?
-        return width_frame(string, options) do |string, index_full, index_low, first_ambiguous|
-          width_no_overwrite(string, index_full, index_low, first_ambiguous)
-        end
-      end
-
-      width_ascii(string)
-    end
+      ambiguous_index_name = AMBIGUOUS_MAP[options[:ambiguous]]
 
-
-
-      if string.match?(ASCII_NON_ZERO_REGEX)
-        res = string.delete(ASCII_NON_ZERO_STRING).size - string.count(ASCII_BACKSPACE)
-        return res < 0 ? 0 : res
+      unless string.match?(NOT_COMMON_NARROW_REGEX[ambiguous_index_name])
+        return width + string.size
       end
 
-      # Pure ASCII
-      string.size
-    end
-
-    def self.width_frame(string, options)
       # Retrieve Emoji width
-      if options[:emoji]
-
-      else
-        res, string = emoji_width(
+      if options[:emoji] != :none
+        e_width, string = emoji_width(
           string,
           options[:emoji],
           options[:ambiguous],
         )
-
-
-      # Prepare indexes
-      ambiguous_index_name = AMBIGUOUS_MAP[options[:ambiguous]]
-
-      # Get general width
-      res += yield(string, INDEX[ambiguous_index_name], FIRST_4096[ambiguous_index_name], FIRST_AMBIGUOUS[ambiguous_index_name])
+        width += e_width
 
-
-
-    end
-
-    def self.width_no_overwrite(string, index_full, index_low, first_ambiguous, _ = {})
-      res = 0
-
-      # Make sure we have UTF-8
-      string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8"
-
-      string.scan(/.{,80}/m){ |batch|
-        if batch.ascii_only?
-          res += batch.size
-        else
-          batch.each_codepoint{ |codepoint|
-            if codepoint > 15 && codepoint < first_ambiguous
-              res += 1
-            elsif codepoint < 0x1001
-              res += index_low[codepoint] || 1
-            else
-              d = INITIAL_DEPTH
-              w = index_full[codepoint / d]
-              while w.instance_of? Array
-                w = w[(codepoint %= d) / (d /= 16)]
-              end
-
-              res += w || 1
-            end
-          }
+        unless string.match?(NOT_COMMON_NARROW_REGEX[ambiguous_index_name])
+          return width + string.size
         end
-
+      end
 
-
-
-
-    # Same as .width_no_overwrite - but with applying overwrites for each char
-    def self.width_all_features(string, index_full, index_low, first_ambiguous, overwrite)
-      res = 0
+      index_full = INDEX[ambiguous_index_name]
+      index_low = FIRST_4096[ambiguous_index_name]
+      first_ambiguous = FIRST_AMBIGUOUS[ambiguous_index_name]
 
       string.each_codepoint{ |codepoint|
-        if
-
-        elsif codepoint > 15 && codepoint < first_ambiguous
-          res += 1
+        if codepoint > 15 && codepoint < first_ambiguous
+          width += 1
         elsif codepoint < 0x1001
-
+          width += index_low[codepoint] || 1
         else
           d = INITIAL_DEPTH
           w = index_full[codepoint / d]
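For code points that miss the fast paths, the loop kept in this hunk walks a nested index: code points above 15 and below the first ambiguous code point count as 1 directly, those below 0x1001 are read from `FIRST_4096`, and everything else starts at `index_full[codepoint / INITIAL_DEPTH]` and narrows the block by a factor of 16 per level until it reaches a non-Array leaf, where `nil` means width 1. The sketch below reproduces only that last traversal over a tiny hand-made index. The toy data and the `lookup_width` name are invented for illustration; the gem's real `INDEX` comes from packaged data (presumably `data/display_width.marshal.gz`).

```ruby
INITIAL_DEPTH = 0x10000

# Toy nested index with invented data (NOT the gem's real width tables).
# Top level: one slot per 0x10000 code points; every deeper level splits its
# block into 16 equal parts. An Integer leaf is the width for its whole block,
# nil means "default width 1".
b_2000_200f = Array.new(16)
b_2000_200f[0xB] = 0                 # U+200B => 0
b_2000_20ff = Array.new(16)
b_2000_20ff[0x0] = b_2000_200f       # U+2000..U+200F
b_2000_2fff = Array.new(16)
b_2000_2fff[0x0] = b_2000_20ff       # U+2000..U+20FF
plane_0 = Array.new(16)
plane_0[0x2] = b_2000_2fff           # U+2000..U+2FFF

b_1f000_1ffff = Array.new(16)
b_1f000_1ffff[0x3] = 2               # U+1F300..U+1F3FF => 2
plane_1 = Array.new(16)
plane_1[0xF] = b_1f000_1ffff         # U+1F000..U+1FFFF

TOY_INDEX = [plane_0, plane_1]       # planes 2 and up are absent => nil

# Same traversal as the loop in the diff. Ruby evaluates the left operand
# (codepoint %= d) before the right one (d /= 16).
def lookup_width(codepoint, index_full)
  d = INITIAL_DEPTH
  w = index_full[codepoint / d]
  while w.instance_of?(Array)
    w = w[(codepoint %= d) / (d /= 16)]
  end
  w || 1
end

p lookup_width(0x200B, TOY_INDEX)    # => 0
p lookup_width(0x1F389, TOY_INDEX)   # => 2
p lookup_width(0x1F600, TOY_INDEX)   # => 1
p lookup_width(0x20000, TOY_INDEX)   # => 1
```

A leaf can sit at any level, so a whole block with uniform width costs a single Integer instead of one entry per code point.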
@@ -165,19 +103,44 @@ module Unicode
             w = w[(codepoint %= d) / (d /= 16)]
           end
 
-
+          width += w || 1
         end
       }
 
-
+      # Return result + prevent negative lengths
+      width < 0 ? 0 : width
     end
 
+    # Returns width of custom overwrites and remaining string
+    def self.width_custom(string, overwrite)
+      width = 0
+
+      string = string.each_codepoint.select{ |codepoint|
+        if overwrite[codepoint]
+          width += overwrite[codepoint]
+          nil
+        else
+          codepoint
+        end
+      }.pack("U*")
+
+      [width, string]
+    end
+
+    # Returns width for ASCII-only strings. Will consider zero-width control symbols.
+    def self.width_ascii(string)
+      if string.match?(ASCII_NON_ZERO_REGEX)
+        res = string.delete(ASCII_NON_ZERO_STRING).bytesize - string.count(ASCII_BACKSPACE)
+        return res < 0 ? 0 : res
+      end
+
+      string.bytesize
+    end
 
+    # Returns width of all considered Emoji and remaining string
     def self.emoji_width(string, mode = :all, ambiguous = DEFAULT_AMBIGUOUS)
       res = 0
 
-      string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8"
-
       if emoji_set_regex = EMOJI_SEQUENCES_REGEX_MAPPING[mode]
         emoji_width_via_possible(
           string,
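`width_custom` now runs before any other measurement: each code point that has an entry in the `overwrite:` mapping contributes that width and is removed, and the remaining code points are packed back into a UTF-8 string for the regular lookup. A stand-alone sketch of the same idea follows; the name `split_overwrites` is hypothetical. Because it only calls `overwrite[codepoint]`, it works with a Hash or a Proc (both respond to `[]`), and a width of `0` still counts as an overwrite since `0` is truthy in Ruby.

```ruby
# Hypothetical stand-alone version of the overwrite step.
# Returns [width contributed by overwritten code points, remaining string].
def split_overwrites(string, overwrite)
  width = 0
  remaining = string.each_codepoint.reject { |codepoint|
    custom = overwrite[codepoint] # Hash#[] or Proc#[]; nil = no overwrite
    width += custom if custom
    custom                        # truthy (including 0) => drop the code point
  }.pack("U*")                    # back to a UTF-8 string
  [width, remaining]
end

p split_overwrites("ABA", { 0x41 => 2 })                    # => [4, "B"]
p split_overwrites("a\u200Bb", { 0x200B => 0 })             # => [0, "ab"]
p split_overwrites("xyz", ->(cp) { cp == 0x79 ? 3 : nil })  # => [3, "xz"]
```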
@@ -209,13 +172,9 @@ module Unicode
       res = 0
 
       # For each string possibly an emoji
-      no_emoji_string = string.gsub(
-        # Skip notorious false positives
-        if REGEX_EMOJI_NOT_POSSIBLE.match?(emoji_candidate)
-          emoji_candidate
-
+      no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES_AND_VS16){ |emoji_candidate|
         # Check if we have a combined Emoji with width 2 (or EAW an Apple Terminal)
-
+        if emoji_candidate == emoji_candidate[emoji_set_regex]
           if strict_eaw
             res += self.of(emoji_candidate[0], ambiguous, emoji: false)
           else
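The Emoji step follows a measure-and-strip pattern: `String#gsub` with a block visits every candidate sequence and adds its width to a running total, and, judging by the `[res, no_emoji_string]` return value, each measured sequence is dropped so only the non-Emoji remainder reaches the general lookup (the block's return value for matches is not visible in this hunk). The sketch below shows just that pattern with a deliberately simplified regex that treats a pair of regional indicator symbols (a flag) as width 2. `FLAG_REGEX` and `strip_flags` are hypothetical; this is not the gem's `REGEX_EMOJI_ALL_SEQUENCES_AND_VS16` and it ignores every other Emoji form.

```ruby
# Simplified stand-in regex (NOT the gem's): two regional indicator
# symbols U+1F1E6..U+1F1FF form one flag.
FLAG_REGEX = /[\u{1F1E6}-\u{1F1FF}]{2}/

# Hypothetical helper: returns [width of flags, string with flags removed].
def strip_flags(string)
  res = 0
  no_flag_string = string.gsub(FLAG_REGEX) { |_flag|
    res += 2 # count the sequence as two columns...
    ""       # ...and remove it from the string
  }
  [res, no_flag_string]
end

de = "\u{1F1E9}\u{1F1EA}"  # regional indicators D + E
p strip_flags("a#{de}b")   # => [2, "ab"]
p strip_flags("no flags")  # => [0, "no flags"]
```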
@@ -237,6 +196,34 @@ module Unicode
       [res, no_emoji_string]
     end
 
+    def self.normalize_options(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
+      unless old_options.empty?
+        warn "Unicode::DisplayWidth: Please migrate to keyword arguments - #{old_options.inspect}"
+        options.merge! old_options
+      end
+
+      options[:ambiguous] = ambiguous if ambiguous
+      options[:ambiguous] ||= DEFAULT_AMBIGUOUS
+
+      if options[:ambiguous] != 1 && options[:ambiguous] != 2
+        raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2"
+      end
+
+      if overwrite && !overwrite.empty?
+        warn "Unicode::DisplayWidth: Please migrate to keyword arguments - overwrite: #{overwrite.inspect}"
+        options[:overwrite] = overwrite
+      end
+      options[:overwrite] ||= {}
+
+      if [nil, true, :auto].include?(options[:emoji])
+        options[:emoji] = EmojiSupport.recommended
+      elsif options[:emoji] == false
+        options[:emoji] = :none
+      end
+
+      options
+    end
+
     def initialize(ambiguous: DEFAULT_AMBIGUOUS, overwrite: {}, emoji: true)
       @ambiguous = ambiguous
       @overwrite = overwrite
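`normalize_options` collects the argument handling that previously sat at the top of `of`, and it now also maps `emoji: false` to `:none`, the value the new `of` tests with `options[:emoji] != :none`. Below is a condensed sketch of those rules without the deprecation warnings for positional arguments. `normalize` is a hypothetical name, and `RECOMMENDED_EMOJI` is a hypothetical placeholder for `EmojiSupport.recommended`, whose real result depends on the detected terminal.

```ruby
DEFAULT_AMBIGUOUS = 1
RECOMMENDED_EMOJI = :rgi # hypothetical placeholder for EmojiSupport.recommended

# Condensed, hypothetical version of the 3.1.3 normalization rules
# (keyword arguments only, no deprecation warnings).
def normalize(ambiguous: nil, overwrite: nil, emoji: nil)
  ambiguous ||= DEFAULT_AMBIGUOUS
  unless [1, 2].include?(ambiguous)
    raise ArgumentError, "Ambiguous width must be 1 or 2"
  end

  if [nil, true, :auto].include?(emoji)
    emoji = RECOMMENDED_EMOJI
  elsif emoji == false
    emoji = :none
  end

  { ambiguous: ambiguous, overwrite: overwrite || {}, emoji: emoji }
end

p normalize[:emoji]                                      # => :rgi
p normalize(emoji: false)[:emoji]                        # => :none
p normalize(ambiguous: 2, emoji: :possible)[:ambiguous]  # => 2
```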
@@ -256,4 +243,3 @@ module Unicode
     end
   end
 end
-
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: unicode-display_width
 version: !ruby/object:Gem::Version
-  version: 3.1.
+  version: 3.1.3
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-
+date: 2024-12-26 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: unicode-emoji
@@ -104,7 +104,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.1.6
 signing_key:
 specification_version: 4
 summary: Determines the monospace display width of a string in Ruby.