unicode-display_width 3.1.1 → 3.1.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 1886b39340f01645fd4e64d032ab1611e471e60c4dbd46e3d8867125ef45232d
- data.tar.gz: dc452df48efa0f7f9cd862b0390f611df83e4dcd82c640f4faa151b05022596f
+ metadata.gz: 4b0b5fe12467a22c6b21ad6dfb8dc422eb547e252690e432afc7504e8dae641c
+ data.tar.gz: a2c0e4c856034b1ef64946861d33845615cd8c950da462441faea2f900c14502
  SHA512:
- metadata.gz: 85dfef303836ba1c13271144ad24f89dbc40591d9056ee187c9ee5b7b6ff1f19d8d9ebd4e21108f78a61d2e3d3c6ce44f560005e3216cf3f8fa595466c50dfc7
- data.tar.gz: eae08ff81ed83a3965820aaf09ce10fd028428adf6653807d3fb87b0c96b9ae7be757d38ffa90a8a27b39a8454e4b2694109a8aaddcc75d21d0e449ce8f7f628
+ metadata.gz: 8a9499ffcdc0f6def0ac88fc13aaaaea0e46031a63e05c00c32872e1f066d7550abd8e0cb3efaea084a07ebc168e87ed2a3a32effa040c1a651c3288157704f1
+ data.tar.gz: f6a6c7e002476db323d52073ef00567c6eafb564864f1b9c6b72ba7228c5f1a0325e12a07445e0d7a7af66ff988ccd05e89d268898c78c2eaaf97735f79b3f90
data/CHANGELOG.md CHANGED
@@ -1,5 +1,16 @@
  # CHANGELOG
 
+ ## 3.1.3
+
+ Better handling of non-UTF-8 strings, patch by @Earlopain:
+
+ - Data with *BINARY* encoding is interpreted as UTF-8, if possible
+ - Use `invalid: :replace` and `undef: :replace` options when converting to UTF-8
+
+ ## 3.1.2
+
+ - Performance improvements
+
  ## 3.1.1
 
  - Performance improvements
@@ -11,7 +22,7 @@
  - Emoji modes: Differentiate between well-formed Emoji (`:possible`) and any
  ZWJ/modifier sequence (`:all`). The latter is more common and more efficient
  to implement.
- - Unify `rgi_{fqe,mqe,uqe}` options to just `:rgi` to keep things simpler (corresponds to
+ - Unify `:rgi_{fqe,mqe,uqe}` options to just `:rgi` to keep things simpler (corresponds to
  the former `:rgi_uqe` option). Most terminals that want to support the RGI set
  will probably want to catch Emoji sequences with missing VS16s.
  - Add new `:all_no_vs16` and `:rgi_at` modes to be able to support some terminals
@@ -24,6 +35,7 @@
 
  ## 3.0.1
 
+
  - Add WezTerm and foot as good Emoji terminals
 
  ## 3.0.0
data/README.md CHANGED
@@ -71,6 +71,11 @@ Unicode::DisplayWidth.of("·", 1) # => 1
  Unicode::DisplayWidth.of("·", 2) # => 2
  ```
 
+ ### Encoding Notes
+
+ - Data with *BINARY* encoding is interpreted as UTF-8, if possible
+ - Non-UTF-8 strings are converted to UTF-8 before measuring, using the [`{invalid: :replace, undef: :replace}`) options](https://ruby-doc.org/3.3.5/encodings_rdoc.html#label-Encoding+Options)
+
  ### Custom Overwrites
 
  You can overwrite how to handle specific code points by passing a hash (or even a proc) as `overwrite:` parameter:
@@ -126,7 +131,7 @@ The `emoji:` option can be used to configure which type of Emoji should be consi
 
  Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget).
 
- Please note that Emoji display and number of terminal columns used might differs a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminal's Unicode/Emoji capabilities.
+ Please note that Emoji display and number of terminal columns used might differs a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminal's Unicode/Emoji capabilities. You can checkout how your terminals renders different kind of Emoji types with this [terminal-emoji-width.rb script](https://github.com/janlelis/unicode-display_width/blob/main/misc/terminal-emoji-width.rb).
 
  **To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought…
 
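The two rules from the new "Encoding Notes" README section can be tried out with core Ruby alone. What follows is a minimal sketch, not the gem's code: `to_measurable_utf8` is a hypothetical helper name, and it works on a copy of the input, whereas `Unicode::DisplayWidth.of` in this release calls `force_encoding` on the string it is given.

```ruby
# Hypothetical helper (not part of the gem): mirrors the two encoding rules
# from the README using only core String methods.
def to_measurable_utf8(string)
  string = string.dup # work on a copy so the caller's string keeps its encoding

  if string.encoding == Encoding::BINARY
    # Rule 1: BINARY data is interpreted as UTF-8, if possible
    string.force_encoding(Encoding::UTF_8)
    # Not valid UTF-8 after all: go back to BINARY
    string.force_encoding(Encoding::BINARY) unless string.valid_encoding?
  end

  return string if string.encoding == Encoding::UTF_8

  # Rule 2: everything else is transcoded; bytes that cannot be converted
  # are replaced instead of raising an exception
  string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

utf8_bytes   = "caf\xC3\xA9".b # valid UTF-8 bytes tagged as BINARY
broken_bytes = "a\xFFb".b      # 0xFF never occurs in valid UTF-8
latin1       = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)

to_measurable_utf8(utf8_bytes)   # "caf\u00E9", re-tagged as UTF-8
to_measurable_utf8(broken_bytes) # "a\uFFFDb", the stray byte becomes U+FFFD
to_measurable_utf8(latin1)       # "caf\u00E9", transcoded from ISO-8859-1
```

The replacement options matter mostly for the second case: without them, `String#encode` would raise on the byte that has no UTF-8 equivalent instead of returning something measurable.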
Binary file
@@ -2,7 +2,7 @@
 
  module Unicode
  class DisplayWidth
- VERSION = "3.1.1"
+ VERSION = "3.1.3"
  UNICODE_VERSION = "16.0.0"
  DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/")
  INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz"
@@ -10,8 +10,8 @@ module Unicode
  class DisplayWidth
  DEFAULT_AMBIGUOUS = 1
  INITIAL_DEPTH = 0x10000
- ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/
- ASCII_NON_ZERO_STRING = "\0\x05\a\b\n\v\f\r\x0E\x0F"
+ ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n-\x0F]/
+ ASCII_NON_ZERO_STRING = "\0\x05\a\b\n-\x0F"
  ASCII_BACKSPACE = "\b"
  AMBIGUOUS_MAP = {
  1 => :WIDTH_ONE,
@@ -21,6 +21,10 @@ module Unicode
  WIDTH_ONE: 768,
  WIDTH_TWO: 161,
  }
+ NOT_COMMON_NARROW_REGEX = {
+ WIDTH_ONE: /[^\u{10}-\u{2FF}]/m,
+ WIDTH_TWO: /[^\u{10}-\u{A1}]/m,
+ }
  FIRST_4096 = {
  WIDTH_ONE: decompress_index(INDEX[:WIDTH_ONE][0][0], 1),
  WIDTH_TWO: decompress_index(INDEX[:WIDTH_TWO][0][0], 1),
@@ -30,7 +34,6 @@ module Unicode
  rgi_at: :REGEX_INCLUDE_MQE_UQE,
  possible: :REGEX_WELL_FORMED,
  }
- REGEX_EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/
  REGEX_EMOJI_VS16 = Regexp.union(
  Regexp.compile(
  Unicode::Emoji::REGEX_TEXT_PRESENTATION.source +
@@ -44,120 +47,55 @@ module Unicode
 
  # Returns monospace display width of string
  def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
- unless old_options.empty?
- warn "Unicode::DisplayWidth: Please migrate to keyword arguments - #{old_options.inspect}"
- options.merge! old_options
+ # Binary strings don't make much sense when calculating display width.
+ # Assume it's valid UTF-8
+ if string.encoding == Encoding::BINARY && !string.force_encoding(Encoding::UTF_8).valid_encoding?
+ # Didn't work out, go back to binary
+ string.force_encoding(Encoding::BINARY)
  end
 
- options[:ambiguous] = ambiguous if ambiguous
- options[:ambiguous] ||= DEFAULT_AMBIGUOUS
+ string = string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace) unless string.encoding == Encoding::UTF_8
+ options = normalize_options(string, ambiguous, overwrite, old_options, **options)
 
- if options[:ambiguous] != 1 && options[:ambiguous] != 2
- raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2"
- end
+ width = 0
 
- if overwrite && !overwrite.empty?
- warn "Unicode::DisplayWidth: Please migrate to keyword arguments - overwrite: #{overwrite.inspect}"
- options[:overwrite] = overwrite
+ unless options[:overwrite].empty?
+ width, string = width_custom(string, options[:overwrite])
  end
- options[:overwrite] ||= {}
 
- if [nil, true, :auto].include?(options[:emoji])
- options[:emoji] = EmojiSupport.recommended
+ if string.ascii_only?
+ return width + width_ascii(string)
  end
 
- # # #
-
- if !options[:overwrite].empty?
- return width_frame(string, options) do |string, index_full, index_low, first_ambiguous|
- width_all_features(string, index_full, index_low, first_ambiguous, options[:overwrite])
- end
- end
-
- if !string.ascii_only?
- return width_frame(string, options) do |string, index_full, index_low, first_ambiguous|
- width_no_overwrite(string, index_full, index_low, first_ambiguous)
- end
- end
-
- width_ascii(string)
- end
+ ambiguous_index_name = AMBIGUOUS_MAP[options[:ambiguous]]
 
- def self.width_ascii(string)
- # Optimization for ASCII-only strings without certain control symbols
- if string.match?(ASCII_NON_ZERO_REGEX)
- res = string.delete(ASCII_NON_ZERO_STRING).size - string.count(ASCII_BACKSPACE)
- return res < 0 ? 0 : res
+ unless string.match?(NOT_COMMON_NARROW_REGEX[ambiguous_index_name])
+ return width + string.size
  end
 
- # Pure ASCII
- string.size
- end
-
- def self.width_frame(string, options)
  # Retrieve Emoji width
- if options[:emoji] == false || options[:emoji] == :none
- res = 0
- else
- res, string = emoji_width(
+ if options[:emoji] != :none
+ e_width, string = emoji_width(
  string,
  options[:emoji],
  options[:ambiguous],
  )
- end
-
- # Prepare indexes
- ambiguous_index_name = AMBIGUOUS_MAP[options[:ambiguous]]
-
- # Get general width
- res += yield(string, INDEX[ambiguous_index_name], FIRST_4096[ambiguous_index_name], FIRST_AMBIGUOUS[ambiguous_index_name])
+ width += e_width
 
- # Return result + prevent negative lengths
- res < 0 ? 0 : res
- end
-
- def self.width_no_overwrite(string, index_full, index_low, first_ambiguous, _ = {})
- res = 0
-
- # Make sure we have UTF-8
- string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8"
-
- string.scan(/.{,80}/m){ |batch|
- if batch.ascii_only?
- res += batch.size
- else
- batch.each_codepoint{ |codepoint|
- if codepoint > 15 && codepoint < first_ambiguous
- res += 1
- elsif codepoint < 0x1001
- res += index_low[codepoint] || 1
- else
- d = INITIAL_DEPTH
- w = index_full[codepoint / d]
- while w.instance_of? Array
- w = w[(codepoint %= d) / (d /= 16)]
- end
-
- res += w || 1
- end
- }
+ unless string.match?(NOT_COMMON_NARROW_REGEX[ambiguous_index_name])
+ return width + string.size
  end
- }
+ end
 
- res
- end
-
- # Same as .width_no_overwrite - but with applying overwrites for each char
- def self.width_all_features(string, index_full, index_low, first_ambiguous, overwrite)
- res = 0
+ index_full = INDEX[ambiguous_index_name]
+ index_low = FIRST_4096[ambiguous_index_name]
+ first_ambiguous = FIRST_AMBIGUOUS[ambiguous_index_name]
 
  string.each_codepoint{ |codepoint|
- if overwrite[codepoint]
- res += overwrite[codepoint]
- elsif codepoint > 15 && codepoint < first_ambiguous
- res += 1
+ if codepoint > 15 && codepoint < first_ambiguous
+ width += 1
  elsif codepoint < 0x1001
- res += index_low[codepoint] || 1
+ width += index_low[codepoint] || 1
  else
  d = INITIAL_DEPTH
  w = index_full[codepoint / d]
@@ -165,19 +103,44 @@ module Unicode
  w = w[(codepoint %= d) / (d /= 16)]
  end
 
- res += w || 1
+ width += w || 1
  end
  }
 
- res
+ # Return result + prevent negative lengths
+ width < 0 ? 0 : width
  end
 
+ # Returns width of custom overwrites and remaining string
+ def self.width_custom(string, overwrite)
+ width = 0
+
+ string = string.each_codepoint.select{ |codepoint|
+ if overwrite[codepoint]
+ width += overwrite[codepoint]
+ nil
+ else
+ codepoint
+ end
+ }.pack("U*")
+
+ [width, string]
+ end
+
+ # Returns width for ASCII-only strings. Will consider zero-width control symbols.
+ def self.width_ascii(string)
+ if string.match?(ASCII_NON_ZERO_REGEX)
+ res = string.delete(ASCII_NON_ZERO_STRING).bytesize - string.count(ASCII_BACKSPACE)
+ return res < 0 ? 0 : res
+ end
+
+ string.bytesize
+ end
 
+ # Returns width of all considered Emoji and remaining string
  def self.emoji_width(string, mode = :all, ambiguous = DEFAULT_AMBIGUOUS)
  res = 0
 
- string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8"
-
  if emoji_set_regex = EMOJI_SEQUENCES_REGEX_MAPPING[mode]
  emoji_width_via_possible(
  string,
@@ -209,13 +172,9 @@ module Unicode
  res = 0
 
  # For each string possibly an emoji
- no_emoji_string = string.gsub(Unicode::Emoji::REGEX_POSSIBLE){ |emoji_candidate|
- # Skip notorious false positives
- if REGEX_EMOJI_NOT_POSSIBLE.match?(emoji_candidate)
- emoji_candidate
-
+ no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES_AND_VS16){ |emoji_candidate|
  # Check if we have a combined Emoji with width 2 (or EAW an Apple Terminal)
- elsif emoji_candidate == emoji_candidate[emoji_set_regex]
+ if emoji_candidate == emoji_candidate[emoji_set_regex]
  if strict_eaw
  res += self.of(emoji_candidate[0], ambiguous, emoji: false)
  else
@@ -237,6 +196,34 @@ module Unicode
  [res, no_emoji_string]
  end
 
+ def self.normalize_options(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
+ unless old_options.empty?
+ warn "Unicode::DisplayWidth: Please migrate to keyword arguments - #{old_options.inspect}"
+ options.merge! old_options
+ end
+
+ options[:ambiguous] = ambiguous if ambiguous
+ options[:ambiguous] ||= DEFAULT_AMBIGUOUS
+
+ if options[:ambiguous] != 1 && options[:ambiguous] != 2
+ raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2"
+ end
+
+ if overwrite && !overwrite.empty?
+ warn "Unicode::DisplayWidth: Please migrate to keyword arguments - overwrite: #{overwrite.inspect}"
+ options[:overwrite] = overwrite
+ end
+ options[:overwrite] ||= {}
+
+ if [nil, true, :auto].include?(options[:emoji])
+ options[:emoji] = EmojiSupport.recommended
+ elsif options[:emoji] == false
+ options[:emoji] = :none
+ end
+
+ options
+ end
+
  def initialize(ambiguous: DEFAULT_AMBIGUOUS, overwrite: {}, emoji: true)
  @ambiguous = ambiguous
  @overwrite = overwrite
@@ -256,4 +243,3 @@ module Unicode
  end
  end
  end
-
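In the refactored `self.of`, custom overwrites are now resolved first: `width_custom` sums the widths of overwritten code points and hands back the remaining string for the ASCII, Emoji and index-based steps. A minimal standalone sketch of that split is shown here, assuming a Hash of integer widths; `split_overwrites` is a hypothetical name, and it uses `reject` where the diff uses `select` with a `nil` result.

```ruby
# Hypothetical standalone version of the overwrite step (not the gem's API).
# Returns [width of overwritten code points, string without them].
def split_overwrites(string, overwrite)
  width = 0

  rest = string.each_codepoint.reject { |codepoint|
    custom = overwrite[codepoint] # nil when there is no overwrite
    width += custom if custom
    custom                        # truthy => drop the code point from the rest
  }.pack("U*")                    # rebuild a UTF-8 string from what is left

  [width, rest]
end

width, rest = split_overwrites("a\u00B7b", { 0xB7 => 2 })
# width is 2 (from the overwritten middle dot), rest is "ab"
```

The total display width is then the returned `width` plus whatever the later steps measure for `rest`, which is why the new code can take the ASCII fast path even when overwrites are present.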
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: unicode-display_width
  version: !ruby/object:Gem::Version
- version: 3.1.1
+ version: 3.1.3
  platform: ruby
  authors:
  - Jan Lelis
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2024-11-19 00:00:00.000000000 Z
+ date: 2024-12-26 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: unicode-emoji
@@ -104,7 +104,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.5.21
+ rubygems_version: 3.1.6
  signing_key:
  specification_version: 4
  summary: Determines the monospace display width of a string in Ruby.