unicode-display_width 3.0.1 → 3.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 2d2f36aeac48ce4e9e4fb84f578bbf652d59ee81cca87f19608d302a0cf175bc
-   data.tar.gz: d7df0d9fe9d2cc82a615aaa2ceb0d149606d0992aeb9bc886f1ec565ed19ae5c
+   metadata.gz: 01657362aaf60cf79bb03c63bb96e01914139c7bb965dc9bed18e7988b8c6709
+   data.tar.gz: 297cc1ab03e72a02e9f33eb4eec2dea2006f23987818083c4bc12aa168e437c3
  SHA512:
-   metadata.gz: 5d5ec1bea71952674bb8d868d06eaf0d634869acbd57cf3bb2119e7b0b24b719ee193ac43cdc7b36146c35398b13e0e6567eea120ec7bc6abd91c68192786532
-   data.tar.gz: f6f94791603e6eb719e02e0054eb8a7b539747859ab460ecee61b4e383107594d1b9dc2aacae947542c4a8c8826e47e07659e3883e2e32051a81054f3bc8c75e
+   metadata.gz: a3878d504a273e44268762fca4857bf26a9322e0e54c0afc437d953dca675822262c9aec54cb5de3d23390b4b778403b36ce0f73ba5b0f1d2c8554a1f796d210
+   data.tar.gz: 00de0d22f3b245f16de15b3b4864ff754da04ea94eafdeaf06c0e38fec8cfb2559fbeaafc17f165534e70a386e154f70d7b071f5a226c9f64d7088bbb408cabb
data/CHANGELOG.md CHANGED
@@ -1,5 +1,23 @@
  # CHANGELOG

+ ## 3.1.0
+
+ **Improve Emoji support:**
+
+ - Emoji modes: Differentiate between well-formed Emoji (`:possible`) and any
+   ZWJ/modifier sequence (`:all`). The latter is more common and more efficient
+   to implement.
+ - Unify `rgi_{fqe,mqe,uqe}` options to just `:rgi` to keep things simpler (corresponds to
+   the former `:rgi_uqe` option). Most terminals that want to support the RGI set
+   will probably want to catch Emoji sequences with missing VS16s.
+ - Add new `:all_no_vs16` and `:rgi_at` modes to be able to support some terminals
+   that need these quirks
+ - Add alias `emoji: :auto` for `emoji: true` and `emoji: :none` for `emoji: false`
+ - `:auto` mode: Only consider terminal cells when recommending Emoji support level
+   (Emoji themselves might display differently)
+ - `:auto` mode: Set default Emoji mode for unknown/unsupported terminals to `:none`
+ - Rename `:basic` mode to `:vs16`
+
  ## 3.0.1

  - Add WezTerm and foot as good Emoji terminals
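
For callers upgrading from 3.0.x, the renames in the changelog above can be read as a lookup table. The following is only a sketch: `LEGACY_EMOJI_OPTIONS` and `migrate_emoji_option` are hypothetical names that do not exist in the gem, and the `:all` to `:possible` row is an inference from both names pointing at `REGEX_WELL_FORMED` in this diff.

```ruby
# Hypothetical helper (not part of unicode-display_width): translate a 3.0.x
# `emoji:` value into its closest 3.1.0 equivalent.
LEGACY_EMOJI_OPTIONS = {
  basic:   :vs16,     # renamed
  rgi_fqe: :rgi,      # unified; note :rgi behaves like the former :rgi_uqe
  rgi_mqe: :rgi,
  rgi_uqe: :rgi,
  all:     :possible, # the old :all matched well-formed sequences only
}.freeze

# Values that were not renamed (true, false, :auto, :none, ...) pass through.
def migrate_emoji_option(value)
  LEGACY_EMOJI_OPTIONS.fetch(value, value)
end
```

Because `:rgi` matches the former `:rgi_uqe` set, code that previously passed `:rgi_fqe` or `:rgi_mqe` may see width 2 for sequences that used to be measured part by part.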
data/README.md CHANGED
@@ -4,7 +4,7 @@ Determines the monospace display width of a string in Ruby, which is useful for

  Unicode version: **16.0.0** (September 2024)

- ## Gem Version 3.0 — Improved Emoji Support
+ ## Gem Version 3 — Improved Emoji Support

  **Emoji support is now enabled by default.** See below for description and configuration possibilities.

@@ -81,58 +81,53 @@ Unicode::DisplayWidth.of("a\tb", 1, overwrite: { "\t".ord => 10 })) # => TAB cou

  Please note that using overwrites disables some performance optimizations of this gem.

- ### Emoji Option
+ ### Emoji

- The gem detects Emoji and Emoji sequences and adjusts the width of the measured string. This can be disabled by passing `emoji: false` as an argument:
+ If your terminal supports it, the gem detects Emoji and Emoji sequences and adjusts the width of the measured string. This can be disabled by passing `emoji: false` as an argument:

  ```ruby
- Unicode::DisplayWidth.of "🤾🏽‍♀️" # => 2
+ Unicode::DisplayWidth.of "🤾🏽‍♀️", emoji: :all # => 2
  Unicode::DisplayWidth.of "🤾🏽‍♀️", emoji: false # => 5
  ```

- Disabling Emoji support yields wrong results, as illustrated in the example above, but increases performance of display width calculation. You can configure [the Emoji set to match for](https://www.unicode.org/reports/tr51/#def_rgi_set) by passing a symbol as value:
-
- ```ruby
- Unicode::DisplayWidth.of "🐻‍❄", emoji: :rgi_mqe # => 3
- Unicode::DisplayWidth.of "🐻‍❄", emoji: :rgi_uqe # => 2
- ```
-
  #### How this Library Handles Emoji Width

  There are many Emoji which get constructed by combining other Emoji in a sequence. This makes measuring the width complicated, since terminals might either display the combined Emoji or the separate parts of the Emoji individually.

+ Another aspect where terminals disagree is whether Emoji characters which have a text presentation by default (width 1) should be turned into full-width (width 2) when combined with Variation Selector 16 (*U+FE0F*).
+
  Emoji Type | Width / Comment
  ------------|----------------
- Basic/Single Emoji character without Variation Selector | No special handling, uses mechanism from table above
- Basic/Single Emoji character with VS15 (Text) | No special handling, uses mechanism from table above
- Basic/Single Emoji character with VS16 (Emoji) | 2
- Emoji Sequence | 2 (only if sequence belongs to configured Emoji set)
-
- The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji sets can be used:
+ Basic/Single Emoji character without Variation Selector | No special handling
+ Basic/Single Emoji character with VS15 (Text) | No special handling
+ Basic/Single Emoji character with VS16 (Emoji) | 2 or East Asian Width (see table below)
+ Emoji Sequence | 2 if Emoji belongs to configured Emoji set (see table below)

- Option | Descriptions
- -------|-------------
- `emoji: true` | Use recommended Emoji set on your platform, see section below
- `emoji: :basic` | No width adjustments for Emoji sequences: all partial Emoji treated separately
- `emoji: :rgi_fqe` | All fully-qualified RGI Emoji sequences are considered to have a width of 2
- `emoji: :rgi_mqe` | All fully- and minimally-qualified RGI Emoji sequences are considered to have a width of 2
- `emoji: :rgi_uqe` | All RGI Emoji sequences, regardless of qualification status are considered to have a width of 2
- `emoji: :all` | All possible/well-formed Emoji sequences are considered to have a width of 2
- `emoji: false` | No Emoji adjustments, Emoji characters with VS16 not handled
+ #### Emoji Modes

- *RGI Emoji:* Emoji Recommended for General Interchange
+ The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2 and whether VS16-Emoji should be widened. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji settings can be used:

- *Qualification:* Whether an Emoji sequence has all required VS16 codepoints
+ `emoji:` Option | VS16-Emoji Width | Emoji Sequences Width / Comment | Example Terminals
+ ----------------|------------------|---------------------------------|------------------
+ `true` or `:auto` | - | Automatically use recommended Emoji setting for your terminal | -
+ `:all` | 2 | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | iTerm, foot
+ `:all_no_vs16` | EAW (1 or 2) | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | WezTerm
+ `:possible` | 2 | 2 for all possible/well-formed Emoji sequences | ?
+ `:rgi` | 2 | 2 for all [RGI Emoji](https://www.unicode.org/reports/tr51/#def_rgi_set) sequences | ?
+ `:rgi_at` | EAW (1 or 2) | 1 or 2: Like `:rgi`, but Emoji sequences starting with a default-text Emoji have width 1 | Apple Terminal
+ `:vs16` | 2 | 2 * number of partial Emoji (sequences never considered to represent a combined Emoji) | kitty?
+ `false` or `:none` | EAW (1 or 2) | No Emoji adjustments | gnome-terminal, many older terminals

- See [emoji-test.txt](https://www.unicode.org/Public/emoji/16.0/emoji-test.txt), the [unicode-emoji gem](https://github.com/janlelis/unicode-emoji) and [UTS-51](https://www.unicode.org/reports/tr51/#def_qualified_emoji_character) for more details about qualified and unqualified Emoji sequences.
+ - *RGI Emoji:* Emoji Recommended for General Interchange
+ - *ZWJ:* Zero-width Joiner: Codepoint `U+200D`, used in many Emoji sequences

  #### Emoji Support in Terminals

- Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` is used, the gem will attempt to set the best fitting Emoji set for you (e.g. `:rgi_uqe` on "Apple_Terminal" or `:basic` on Gnome's terminal widget).
+ Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget).

- Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value.
+ Please note that Emoji display and the number of terminal columns used might differ a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminals' Unicode/Emoji capabilities.

- You are encouraged to give your users the option to configure the level of Emoji support in your library or application and for the best developer experience in their terminals. (same is true for ambigouos width).
+ **To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwhile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought…

  ### Usage with String Extension

@@ -2,7 +2,7 @@

  module Unicode
    class DisplayWidth
-     VERSION = "3.0.1"
+     VERSION = "3.1.0"
      UNICODE_VERSION = "16.0.0"
      DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/")
      INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz"
@@ -7,30 +7,39 @@ module Unicode
        # Tries to find out which terminal emulator is used to
        # set emoji: config to best suiting value
        #
-       # Please note: Many terminals do not set any ENV vars
+       # Please also see section in README.md and
+       # misc/terminal-emoji-width.rb
+       #
+       # Please note: Many terminals do not set any ENV vars,
+       # maybe CSI queries can help?
        def self.recommended
          if ENV["CI"]
-           return :rqi_uqe
+           return :rgi
          end

          case ENV["TERM_PROGRAM"]
-         when "iTerm.app", "WezTerm"
+         when "iTerm.app"
            return :all
          when "Apple_Terminal"
-           return :rgi_uqe
+           return :rgi_at
+         when "WezTerm"
+           return :all_no_vs16
          end

          case ENV["TERM"]
-         when "foot"
+         when "contour", "foot"
+           # konsole: all, how to detect?
            return :all
-         when "contour"
-           return :rgi_uqe
          when /kitty/
-           return :rgi_fqe
+           return :vs16
+         end
+
+         if ENV["WT_SESSION"] # Windows Terminal
+           return :vs16
          end

-         # As of last time checked: gnome-terminal, vscode, alacritty, konsole
-         :basic
+         # As of last time checked: gnome-terminal, vscode, alacritty
+         :none
        end

        # Maybe: Implement something like https://github.com/jquast/ucs-detect
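
The detection order in the hunk above can be tried out in isolation with a small stand-in that takes the environment as an argument. This is a sketch, not the gem's code: `recommended_emoji_mode` is a hypothetical name, and it assumes the `CI` branch is meant to yield the `:rgi` mode.

```ruby
# Hypothetical stand-in for the detection logic, taking the environment as a
# parameter so it can be exercised without modifying the real ENV.
def recommended_emoji_mode(env = ENV)
  # Assumption: CI environments should use the :rgi mode
  return :rgi if env["CI"]

  case env["TERM_PROGRAM"]
  when "iTerm.app"      then return :all
  when "Apple_Terminal" then return :rgi_at
  when "WezTerm"        then return :all_no_vs16
  end

  case env["TERM"]
  when "contour", "foot" then return :all
  when /kitty/           then return :vs16
  end

  return :vs16 if env["WT_SESSION"] # Windows Terminal

  # Unknown or unsupported terminal: no Emoji adjustments
  :none
end

p recommended_emoji_mode({ "TERM_PROGRAM" => "WezTerm" }) # => :all_no_vs16
p recommended_emoji_mode({ "TERM" => "xterm-256color" })  # => :none
```

Checks run in order, so `CI` wins over `TERM_PROGRAM`, which wins over `TERM` and `WT_SESSION`.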
@@ -8,6 +8,7 @@ require_relative "display_width/emoji_support"

  module Unicode
    class DisplayWidth
+     DEFAULT_AMBIGUOUS = 1
      INITIAL_DEPTH = 0x10000
      ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/
      ASCII_NON_ZERO_STRING = "\0\x05\a\b\n\v\f\r\x0E\x0F"
@@ -25,13 +26,13 @@ module Unicode
        WIDTH_TWO: decompress_index(INDEX[:WIDTH_TWO][0][0], 1),
      }
      EMOJI_SEQUENCES_REGEX_MAPPING = {
-       rgi_fqe: :REGEX,
-       rgi_mqe: :REGEX_INCLUDE_MQE,
-       rgi_uqe: :REGEX_INCLUDE_MQE_UQE,
-       all: :REGEX_WELL_FORMED,
+       rgi: :REGEX_INCLUDE_MQE_UQE,
+       rgi_at: :REGEX_INCLUDE_MQE_UQE,
+       possible: :REGEX_WELL_FORMED,
      }
-     EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/
      REGEX_EMOJI_BASIC_OR_KEYCAP = Regexp.union(Unicode::Emoji::REGEX_BASIC, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
+     REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
+     REGEX_EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/

      # Returns monospace display width of string
      def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
@@ -41,7 +42,7 @@ module Unicode
        end

        options[:ambiguous] = ambiguous if ambiguous
-       options[:ambiguous] ||= 1
+       options[:ambiguous] ||= DEFAULT_AMBIGUOUS

        if options[:ambiguous] != 1 && options[:ambiguous] != 2
          raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2"
@@ -53,7 +54,7 @@ module Unicode
        end
        options[:overwrite] ||= {}

-       if options[:emoji] == nil || options[:emoji] == true
+       if [nil, true, :auto].include?(options[:emoji])
          options[:emoji] = EmojiSupport.recommended
        end

@@ -87,12 +88,13 @@ module Unicode

      def self.width_frame(string, options)
        # Retrieve Emoji width
-       if !options[:emoji]
+       if options[:emoji] == false || options[:emoji] == :none
          res = 0
-       else options[:emoji]
+       else
          res, string = emoji_width(
            string,
            options[:emoji],
+           options[:ambiguous],
          )
        end

@@ -163,32 +165,81 @@ module Unicode
      end


-     def self.emoji_width(string, sequences = :rgi_fqe)
+     def self.emoji_width(string, mode = :all, ambiguous = DEFAULT_AMBIGUOUS)
        res = 0

-       if regex = EMOJI_SEQUENCES_REGEX_MAPPING[sequences]
-         emoji_sequence_regex = Unicode::Emoji.const_get(regex)
-       else # sequences == :basic
-         emoji_sequence_regex = nil
+       string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8"
+
+       if emoji_set_regex = EMOJI_SEQUENCES_REGEX_MAPPING[mode]
+         emoji_width_via_possible(
+           string,
+           Unicode::Emoji.const_get(emoji_set_regex),
+           mode == :rgi_at,
+           ambiguous,
+         )
+       elsif mode == :all_no_vs16
+         emoji_width_all(string)
+       elsif mode == :vs16
+         emoji_width_basic(string)
+       elsif mode == :all
+         res_all, string = emoji_width_all(string)
+         res_basic, string = emoji_width_basic(string)
+         [res_all + res_basic, string]
+       else
+         [0, string]
        end
+     end

-       # Make sure we have UTF-8
-       string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8"
+     # Ensure all explicit VS16 sequences have width 2
+     def self.emoji_width_basic(string)
+       res = 0

-       if emoji_sequence_regex
-         # For each string possibly an emoji
-         no_emoji_string = string.gsub(Unicode::Emoji::REGEX_POSSIBLE){ |emoji_candidate|
-           # Skip notorious false positives
-           if EMOJI_NOT_POSSIBLE.match?(emoji_candidate)
-             emoji_candidate
+       no_emoji_string = string.gsub(REGEX_EMOJI_BASIC_OR_KEYCAP){ |basic_emoji|
+         if basic_emoji.size >= 2 # VS16 present
+           res += 2
+           ""
+         else
+           basic_emoji
+         end
+       }

-           # Check if we have a combined Emoji with width 2
-           elsif emoji_candidate == emoji_candidate[emoji_sequence_regex]
-             res += 2
-             ""
+       [res, no_emoji_string]
+     end
+
+     # Use simplistic ZWJ/modifier/keycap sequence matching
+     def self.emoji_width_all(string)
+       res = 0
+
+       no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES){
+         res += 2
+         ""
+       }
+
+       [res, no_emoji_string]
+     end
+
+     # Match possible Emoji first, then refine
+     def self.emoji_width_via_possible(string, emoji_set_regex, strict_eaw = false, ambiguous = DEFAULT_AMBIGUOUS)
+       res = 0
+
+       # For each string possibly an emoji
+       no_emoji_string = string.gsub(Unicode::Emoji::REGEX_POSSIBLE){ |emoji_candidate|
+         # Skip notorious false positives
+         if REGEX_EMOJI_NOT_POSSIBLE.match?(emoji_candidate)
+           emoji_candidate

-           # We are dealing with a default text presentation emoji or a well-formed sequence not matching the above Emoji set
+         # Check if we have a combined Emoji with width 2 (or EAW on Apple Terminal)
+         elsif emoji_candidate == emoji_candidate[emoji_set_regex]
+           if strict_eaw
+             res += self.of(emoji_candidate[0], ambiguous, emoji: false)
            else
+             res += 2
+           end
+           ""
+
+         # We are dealing with a default text presentation emoji or a well-formed sequence not matching the above Emoji set
+         else
+           if !strict_eaw
              # Ensure all explicit VS16 sequences have width 2
              emoji_candidate.gsub!(Unicode::Emoji::REGEX_BASIC){ |basic_emoji|
                if basic_emoji.size == 2 # VS16 present
@@ -198,28 +249,16 @@ module Unicode
                  basic_emoji
                end
              }
-
-             emoji_candidate
            end
-         }
-       else
-         # Only consider basic emoji

-         # Ensure all explicit VS16 sequences have width 2
-         no_emoji_string = string.gsub(REGEX_EMOJI_BASIC_OR_KEYCAP){ |basic_emoji|
-           if basic_emoji.size >= 2 # VS16 present
-             res += 2
-             ""
-           else
-             basic_emoji
-           end
-         }
-       end
+           emoji_candidate
+         end
+       }

        [res, no_emoji_string]
      end

-     def initialize(ambiguous: 1, overwrite: {}, emoji: true)
+     def initialize(ambiguous: DEFAULT_AMBIGUOUS, overwrite: {}, emoji: true)
        @ambiguous = ambiguous
        @overwrite = overwrite
        @emoji = emoji
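
To see what the simplistic sequence matching introduced above does, the ZWJ part of `REGEX_EMOJI_ALL_SEQUENCES` can be run without the gem. The snippet is a reduced sketch under stated assumptions: `ZWJ_SEQUENCE` and `sequence_width` are hypothetical names, and keycap sequences are left out because the gem matches them through `Unicode::Emoji::REGEX_EMOJI_KEYCAP` from the unicode-emoji gem.

```ruby
# ZWJ part of the pattern from the diff: any character, optionally followed by
# a skin tone modifier or VS16, joined one or more times via U+200D (ZWJ).
ZWJ_SEQUENCE = /.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/

# Hypothetical helper: counts 2 cells per matched sequence and returns the
# accumulated width together with the string stripped of those sequences.
def sequence_width(string)
  width = 0
  rest = string.gsub(ZWJ_SEQUENCE) do
    width += 2
    ""
  end
  [width, rest]
end

# Woman playing handball with medium skin tone:
# U+1F93E U+1F3FD U+200D U+2640 U+FE0F
handball = "\u{1F93E}\u{1F3FD}\u{200D}\u{2640}\u{FE0F}"
p sequence_width("a#{handball}b") # => [2, "ab"]
```

A modifier sequence without any ZWJ (for example U+1F44D followed by U+1F3FD) is not matched by this part of the pattern and is returned unchanged with a width of 0.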
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: unicode-display_width
  version: !ruby/object:Gem::Version
-   version: 3.0.1
+   version: 3.1.0
  platform: ruby
  authors:
  - Jan Lelis
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2024-11-13 00:00:00.000000000 Z
+ date: 2024-11-18 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: unicode-emoji