unicode-display_width 2.6.0 → 3.1.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +79 -2
- data/README.md +66 -52
- data/data/display_width.marshal.gz +0 -0
- data/lib/unicode/display_width/constants.rb +1 -1
- data/lib/unicode/display_width/emoji_support.rb +52 -0
- data/lib/unicode/display_width/reline_ext.rb +14 -0
- data/lib/unicode/display_width/string_ext.rb +3 -3
- data/lib/unicode/display_width.rb +190 -74
- metadata +28 -5
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a85ca57ca5e291c17993e526d222dda44b884286484b3831bb8173ce92aafb1a
+  data.tar.gz: d1036dfc6464459de04a713e273d09dea767a3b9a9629d9e491052c2ffe97c23
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d669e8a2866b56a78bafb3fff6d2d6430fab6bb1ca2633aeaac68e0634ca14374ac0b325bc7159ef90afe0bdffd9c154700cae1fc3183b1d74281ff4b5024e1b
+  data.tar.gz: 5f319484d27dad70b3851398e11cd3cb93b5c4f41a6c3a76c958d505d8357f9e303b661fd7a0339262d1458b82cb8619e6682ee2dbf8c583d33fbde4fd1a8680
```
data/CHANGELOG.md
CHANGED
````diff
@@ -1,5 +1,64 @@
 # CHANGELOG
 
+## 3.1.2
+
+- Performance improvements
+
+## 3.1.1
+
+- Performance improvements
+
+## 3.1.0
+
+**Improve Emoji support:**
+
+- Emoji modes: Differentiate between well-formed Emoji (`:possible`) and any
+  ZWJ/modifier sequence (`:all`). The latter is more common and more efficient
+  to implement.
+- Unify `:rgi_{fqe,mqe,uqe}` options to just `:rgi` to keep things simpler (corresponds to
+  the former `:rgi_uqe` option). Most terminals that want to support the RGI set
+  will probably want to catch Emoji sequences with missing VS16s.
+- Add new `:all_no_vs16` and `:rgi_at` modes to be able to support some terminals
+  that needs these quirks
+- Add alias `emoji: :auto` for `emoji: true` and `emoji: :none` for `emoji: false`
+- `:auto` mode: Only consider terminal cells when recommending Emoji support level
+  (Emoji themselves might display differently)
+- `:auto` mode: Set default Emoji mode for unknown/unsupported terminals to `:none`
+- Rename `:basic` mode to `:vs16`
+
+## 3.0.1
+
+- Add WezTerm and foot as good Emoji terminals
+
+## 3.0.0
+
+**Rework Emoji support:**
+
+- Emoji widths are now enabled by default
+- Only reduce Emoji width to 2 when RGI Emoji detected (configurable)
+- VS16 turns Emoji characters of width 1 into full-width
+- Please note that Emoji parsing has a notable impact on performance.
+  You can use the `emoji: false` option to disable Emoji adjustments
+- Tries to detect terminal's Emoji support level automatically (from ENV vars)
+
+**Index fixes and updates:**
+
+- Private-use characters are considered ambiguous (were given width 1 before)
+- Fix that a few zero-width ignorable codepoints from recent Unicode were missing
+- Consider the following separators to be zero-width:
+  - U+2028 - LINE SEPARATOR - Zl
+  - U+2029 - PARAGRAPH SEPARATOR - Zp
+
+**Other:**
+
+- Add keyword arguments to `Unicode::DisplayWidth.of`. If you are using a hash
+  with overwrite values as third parameter, be sure to put it in curly braces.
+- Using third parameter or explicit hash as fourth parameter is deprecated,
+  please migrate to the keyword arguments API
+- Gem raises `ArgumentError` for ambiguous values other than 1 or 2
+- Performance optimizations
+- Require Ruby 2.5
+
 ## 2.6.0
 
 - Unicode 16
````
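The 3.0.0 notes above ask callers who still pass an overwrite hash as the third positional argument to wrap it in curly braces. A minimal sketch of why, assuming Ruby 3.x keyword-argument rules: `demo_of` is a hypothetical stand-in that only copies the parameter list of the new `Unicode::DisplayWidth.of` (from the display_width.rb diff) and reports where each argument ended up.

```ruby
# Hypothetical stand-in: same parameter list as the new
# Unicode::DisplayWidth.of, but it only reports what it received.
def demo_of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
  { overwrite: overwrite, options: options }
end

# Without braces, the trailing `key => value` pair is collected into
# **options (Ruby 3 accepts non-Symbol keys there), so `overwrite` stays nil.
without_braces = demo_of("a\tb", 1, "\t".ord => 10)

# With braces, the hash is an ordinary positional argument and fills `overwrite`.
with_braces = demo_of("a\tb", 1, { "\t".ord => 10 })
```

Under these assumptions, the unbraced call leaves `overwrite` as `nil` and puts `{9 => 10}` into `**options`, while the braced call fills `overwrite` and leaves `**options` empty.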
````diff
@@ -40,8 +99,26 @@ More performance improvements:
 
 ## 2.0.0
 
-
-
+Add Support for Ruby 3.0
+
+### Breaking Changes
+
+Some features of this library were marked deprecated for a long time and have been removed with Version 2.0:
+
+- Aliases of display\_width (…\_size, …\_length) have been removed
+- Auto-loading of string core extension has been removed:
+
+If you are relying on the `String#display_width` string extension to be automatically loaded (old behavior), please load it explicitly now:
+
+```ruby
+require "unicode/display_width/string_ext"
+```
+
+You could also change your `Gemfile` line to achieve this:
+
+```ruby
+gem "unicode-display_width", require: "unicode/display_width/string_ext"
+```
 
 ## 2.0.0.pre2
 
````
data/README.md
CHANGED
````diff
@@ -1,39 +1,22 @@
-
+# Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)
 
-Determines the monospace display width of a string in Ruby
+Determines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specfication](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. It does not rely on the OS vendor ([wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals.
 
 Unicode version: **16.0.0** (September 2024)
 
-
+## Gem Version 3 — Improved Emoji Support
 
-
+**Emoji support is now enabled by default.** See below for description and configuration possibilities.
 
-
+**Unicode::DisplayWidth.of now takes keyword arguments:** { ambiguous:, emoji:, overwrite: }
 
-
+See [CHANGELOG](/CHANGELOG.md) for details.
 
-
-
-This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the charachter width lookup code has been optimized, so even when full-width characters are involved, the gem is much faster now.
-
-## Version 2.0 — Breaking Changes
-
-Some features of this library were marked deprecated for a long time and have been removed with Version 2.0:
+## Gem Version 2.4.2 — Performance Updates
 
-
-- Auto-loading of string core extension has been removed:
-
-If you are relying on the `String#display_width` string extension to be automatically loaded (old behavior), please load it explicitly now:
-
-```ruby
-require "unicode/display_width/string_ext"
-```
-
-You could also change your `Gemfile` line to achieve this:
+**If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!**
 
-
-gem "unicode-display_width", require: "unicode/display_width/string_ext"
-```
+This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the character width lookup code has been optimized, so even when the string involves full-width or ambiguous characters, the gem is much faster now.
 
 ## Introduction to Character Widths
 
````
````diff
@@ -45,15 +28,16 @@ Further at the top means higher precedence. Please expect changes to this algori
 
 Width | Characters | Comment
 -------|------------------------------|--------------------------------------------------
-
+? | (user defined) | Overwrites any other values
+? | Emoji | See "How this Library Handles Emoji Width" below
 -1 | `"\b"` | Backspace (total width never below 0)
 0 | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) which do not change horizontal width
 1 | `"\u{00AD}"` | SOFT HYPHEN
 2 | `"\u{2E3A}"` | TWO-EM DASH
 3 | `"\u{2E3B}"` | THREE-EM DASH
-0 | General Categories: Mn, Me, Cf (non-arabic)
-0 |
-0 | `"\u{
+0 | General Categories: Mn, Me, Zl, Zp, Cf (non-arabic)| Excludes ARABIC format characters
+0 | Derived Property: Default_Ignorable_Code_Point | Ignorable ranges
+0 | `"\u{1160}".."\u{11FF}"`, `"\u{D7B0}".."\u{D7FF}"` | HANGUL JUNGSEONG
 2 | East Asian Width: F, W | Full-width characters
 2 | `"\u{3400}".."\u{4DBF}"`, `"\u{4E00}".."\u{9FFF}"`, `"\u{F900}".."\u{FAFF}"`, `"\u{20000}".."\u{2FFFD}"`, `"\u{30000}".."\u{3FFFD}"` | Full-width ranges
 1 or 2 | East Asian Width: A | Ambiguous characters, user defined, default: 1
````
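The backspace and C0-control rows of the width table (backspace counts as -1, the listed control codes as 0, total never below 0) are what the new ASCII fast path in display_width.rb applies to ASCII-only strings. A minimal standalone sketch of that path, reusing the constant values from the diff; `ascii_width` is a hypothetical name for illustration (the gem's own helper in the diff is `width_ascii`).

```ruby
# Constant values copied from the display_width.rb diff. In the character
# class and in String#delete, "\n-\x0F" is a range: \n \v \f \r \x0E \x0F.
ASCII_NON_ZERO_REGEX  = /[\0\x05\a\b\n-\x0F]/
ASCII_NON_ZERO_STRING = "\0\x05\a\b\n-\x0F"
ASCII_BACKSPACE       = "\b"

# Same steps as the new width_ascii: drop the zero-width control codes
# (backspace included), subtract one column per backspace, clamp at 0.
def ascii_width(string)
  if string.match?(ASCII_NON_ZERO_REGEX)
    res = string.delete(ASCII_NON_ZERO_STRING).bytesize - string.count(ASCII_BACKSPACE)
    return res < 0 ? 0 : res
  end

  string.bytesize
end

ascii_width("hello") # => 5
ascii_width("a\nb")  # => 2
ascii_width("ab\bc") # => 2
ascii_width("\b\bx") # => 0
ascii_width("a\tb")  # => 3 (TAB is not in the zero-width set)
```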
````diff
@@ -71,8 +55,6 @@ Or add to your Gemfile:
 
 ## Usage
 
-### Classic API
-
 ```ruby
 require 'unicode/display_width'
 
````
````diff
@@ -80,7 +62,7 @@ Unicode::DisplayWidth.of("⚀") # => 1
 Unicode::DisplayWidth.of("一") # => 2
 ```
 
-
+### Ambiguous Characters
 
 The second parameter defines the value returned by characters defined as ambiguous:
 
````
````diff
@@ -89,34 +71,66 @@ Unicode::DisplayWidth.of("·", 1) # => 1
 Unicode::DisplayWidth.of("·", 2) # => 2
 ```
 
-
+### Custom Overwrites
 
-You can overwrite how to handle specific code points by passing a hash (or even a proc) as
+You can overwrite how to handle specific code points by passing a hash (or even a proc) as `overwrite:` parameter:
 
 ```ruby
-Unicode::DisplayWidth.of("a\tb", 1, "\t".ord => 10)) # =>
+Unicode::DisplayWidth.of("a\tb", 1, overwrite: { "\t".ord => 10 })) # => TAB counted as 10, result is 12
 ```
 
 Please note that using overwrites disables some perfomance optimizations of this gem.
 
+### Emoji
 
-
-
-Emoji width support is included, but in must be activated manually. It will adjust the string's size for modifier and zero-width joiner sequences. You also need to add the [unicode-emoji](https://github.com/janlelis/unicode-emoji) gem to your Gemfile:
+If your terminal supports it, the gem detects Emoji and Emoji sequences and adjusts the width of the measured string. This can be disabled by passing `emoji: false` as an argument:
 
 ```ruby
-
-
+Unicode::DisplayWidth.of "🤾🏽♀️", emoji: :all # => 2
+Unicode::DisplayWidth.of "🤾🏽♀️", emoji: false # => 5
 ```
 
-
+#### How this Library Handles Emoji Width
 
-
-
-
-
+There are many Emoji which get constructed by combining other Emoji in a sequence. This makes measuring the width complicated, since terminals might either display the combined Emoji or the separate parts of the Emoji individually.
+
+Another aspect where terminals disagree is whether Emoji characters which have a text presentation by default (width 1) should be turned into full-width (width 2) when combined with Variation Selector 16 (*U+FEOF*).
+
+Emoji Type | Width / Comment
+------------|----------------
+Basic/Single Emoji character without Variation Selector | No special handling
+Basic/Single Emoji character with VS15 (Text) | No special handling
+Basic/Single Emoji character with VS16 (Emoji) | 2 or East Asian Width (see table below)
+Emoji Sequence | 2 if Emoji belongs to configured Emoji set (see table below)
+
+#### Emoji Modes
+
+The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2 and if VS16-Emoji should be widened. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji settings can be used:
+
+`emoji:` Option | VS16-Emoji Width | Emoji Sequences Width / Comment | Example Terminals
+----------------|------------------|---------------------------------|------------------
+`true` or `:auto` | - | Automatically use recommended Emoji setting for your terminal | -
+`:all` | 2 | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | iTerm, foot
+`:all_no_vs16` | EAW (1 or 2) | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | WezTerm
+`:possible`| 2 | 2 for all possible/well-formed Emoji sequences | ?
+`:rgi` | 2 | 2 for all [RGI Emoji](https://www.unicode.org/reports/tr51/#def_rgi_set) sequences | ?
+`:rgi_at` | EAW (1 or 2) | 1 or 2: Like `:rgi`, but Emoji sequences starting with a default-text Emoji have EAW | Apple Terminal
+`:vs16` | 2 | 2 * number of partial Emoji (sequences never considered to represent a combined Emoji) | kitty?
+`false` or `:none` | EAW (1 or 2) | No Emoji adjustments | gnome-terminal, many older terminals
+
+- *EAW:* East Asian Width
+- *RGI Emoji:* Emoji Recommended for General Interchange
+- *ZWJ:* Zero-width Joiner: Codepoint `U+200D`,used in many Emoji sequences
+
+#### Emoji Support in Terminals
+
+Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget).
+
+Please note that Emoji display and number of terminal columns used might differs a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminal's Unicode/Emoji capabilities.
+
+**To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought…
 
-
+### Usage with String Extension
 
 ```ruby
 require 'unicode/display_width/string_ext'
````
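In the `:all`-style modes described above, a ZWJ sequence counts as one width-2 Emoji. A minimal sketch of that counting step, reusing the first alternative of `REGEX_EMOJI_ALL_SEQUENCES` from the display_width.rb diff. The keycap alternative (which comes from the unicode-emoji gem) is left out, and `zwj_sequences_width` is a hypothetical helper. Like the `gsub` blocks in the diff, it adds 2 per match and returns the unmatched remainder so the rest of the string can be measured separately.

```ruby
# First alternative of REGEX_EMOJI_ALL_SEQUENCES from the diff: any character,
# an optional skin-tone modifier or VS16, then one or more ZWJ-joined parts.
ZWJ_SEQUENCE =
  /.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/

# Hypothetical helper: returns [width of matched sequences, remaining string].
def zwj_sequences_width(string)
  width = 0
  rest = string.gsub(ZWJ_SEQUENCE) { width += 2; "" }
  [width, rest]
end

# PERSON PLAYING HANDBALL + MEDIUM SKIN TONE + ZWJ + FEMALE SIGN + VS16
handball = "\u{1F93E}\u{1F3FD}\u{200D}\u{2640}\u{FE0F}"

zwj_sequences_width(handball)        # => [2, ""]
zwj_sequences_width("a#{handball}b") # => [2, "ab"]
```

This pattern only matches when at least one ZWJ is present, so a lone base-plus-skin-tone pair such as U+1F44D U+1F3FD is left unmatched by it.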
````diff
@@ -125,9 +139,9 @@ require 'unicode/display_width/string_ext'
 '一'.display_width # => 2
 ```
 
-###
+### Usage with Config Object
 
-
+You can use a config object that allows you to save your configuration for later-reuse. This requires an extra line of code, but has the advantage that you'll need to define your string-width options only once:
 
 ```ruby
 require 'unicode/display_width'
````
````diff
@@ -135,15 +149,15 @@ require 'unicode/display_width'
 display_width = Unicode::DisplayWidth.new(
   # ambiguous: 1,
   overwrite: { "A".ord => 100 },
-  emoji:
+  emoji: :all,
 )
 
 display_width.of "⚀" # => 1
-display_width.of "
+display_width.of "🤠🤢" # => 2
 display_width.of "A" # => 100
 ```
 
-### Usage
+### Usage from the Command-Line
 
 Use this one-liner to print out display widths for strings from the command-line:
 
````
data/data/display_width.marshal.gz
CHANGED
Binary file
data/lib/unicode/display_width/emoji_support.rb
ADDED
```diff
@@ -0,0 +1,52 @@
+# require "rbconfig"
+# RbConfig::CONFIG["host_os"] =~ /mswin|mingw/ # windows
+
+module Unicode
+  class DisplayWidth
+    module EmojiSupport
+      # Tries to find out which terminal emulator is used to
+      # set emoji: config to best suiting value
+      #
+      # Please also see section in README.md and
+      # misc/terminal-emoji-width.rb
+      #
+      # Please note: Many terminals do not set any ENV vars,
+      # maybe CSI queries can help?
+      def self.recommended
+        if ENV["CI"]
+          return :rqi
+        end
+
+        case ENV["TERM_PROGRAM"]
+        when "iTerm.app"
+          return :all
+        when "Apple_Terminal"
+          return :rgi_at
+        when "WezTerm"
+          return :all_no_vs16
+        end
+
+        case ENV["TERM"]
+        when "contour","foot"
+          # konsole: all, how to detect?
+          return :all
+        when /kitty/
+          return :vs16
+        end
+
+        if ENV["WT_SESSION"] # Windows Terminal
+          return :vs16
+        end
+
+        # As of last time checked: gnome-terminal, vscode, alacritty
+        :none
+      end
+
+      # Maybe: Implement something like https://github.com/jquast/ucs-detect
+      # which uses the terminal cursor to check for best support level
+      # at runtime
+      # def self.detect!
+      # end
+    end
+  end
+end
```
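`EmojiSupport.recommended` above checks environment variables in a fixed order: `TERM_PROGRAM` first, then `TERM`, then `WT_SESSION`, falling back to `:none`. A minimal sketch of the same cascade as a pure function of an env hash, which makes the precedence easy to test without touching the real `ENV`. `recommended_emoji_mode` is a hypothetical name, and the diff's `CI` branch is left out of this sketch.

```ruby
# Hypothetical pure-function version of the lookup order in
# EmojiSupport.recommended (CI branch omitted). `env` is any Hash-like
# object with String keys, e.g. ENV itself or a test fixture.
def recommended_emoji_mode(env)
  case env["TERM_PROGRAM"]
  when "iTerm.app"      then return :all
  when "Apple_Terminal" then return :rgi_at
  when "WezTerm"        then return :all_no_vs16
  end

  case env["TERM"]
  when "contour", "foot" then return :all
  when /kitty/           then return :vs16
  end

  return :vs16 if env["WT_SESSION"] # Windows Terminal

  :none # unknown terminals: no Emoji adjustments
end

recommended_emoji_mode("TERM_PROGRAM" => "Apple_Terminal") # => :rgi_at
recommended_emoji_mode("TERM" => "xterm-kitty")            # => :vs16
recommended_emoji_mode("TERM" => "xterm-256color")         # => :none
```

Because `TERM_PROGRAM` is checked first, an iTerm session that also sets a kitty-like `TERM` still resolves to `:all`.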
data/lib/unicode/display_width/reline_ext.rb
ADDED
```diff
@@ -0,0 +1,14 @@
+# Experimental
+# Patches Reline's get_mbchar_width to use Unicode::DisplayWidth
+
+require "reline"
+require "reline/unicode"
+
+require_relative "../display_width"
+
+class Reline::Unicode
+  def self.get_mbchar_width(mbchar)
+    Unicode::DisplayWidth.of(mbchar, Reline.ambiguous_width)
+  end
+end
+
```
data/lib/unicode/display_width/string_ext.rb
CHANGED
```diff
@@ -1,9 +1,9 @@
 # frozen_string_literal: true
 
-require_relative "../display_width"
+require_relative "../display_width"
 
 class String
-  def display_width(ambiguous =
-    Unicode::DisplayWidth.of(self, ambiguous, overwrite, options)
+  def display_width(ambiguous = nil, overwrite = nil, old_options = {}, **options)
+    Unicode::DisplayWidth.of(self, ambiguous, overwrite, old_options = {}, **options)
   end
 end
```
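In the new `String#display_width`, the call passes `old_options = {}` as an argument. Inside an argument list, Ruby treats that as an assignment expression rather than a default value: it rebinds the local variable and passes the assigned `{}`. A standalone sketch with hypothetical method names (`receiver`, `forwarder`) that only mirror the call shape:

```ruby
# Hypothetical callee that just returns what it was given.
def receiver(old_options = {})
  old_options
end

# Mirrors the call shape in the new string_ext.rb: the argument
# `old_options = {}` is an assignment evaluated at call time, so the
# value forwarded is always the fresh {} and not the caller's hash.
def forwarder(old_options = {})
  receiver(old_options = {})
end

forwarder({ emoji: false }) # => {}
```

Applied to the diff, a third positional hash given to `String#display_width` is replaced by `{}` before it reaches `Unicode::DisplayWidth.of`, while keyword arguments still pass through `**options`.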
data/lib/unicode/display_width.rb
CHANGED
```diff
@@ -1,122 +1,238 @@
 # frozen_string_literal: true
 
+require "unicode/emoji"
+
 require_relative "display_width/constants"
 require_relative "display_width/index"
+require_relative "display_width/emoji_support"
 
 module Unicode
   class DisplayWidth
+    DEFAULT_AMBIGUOUS = 1
     INITIAL_DEPTH = 0x10000
-    ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n-\x0F]/
+    ASCII_NON_ZERO_STRING = "\0\x05\a\b\n-\x0F"
+    ASCII_BACKSPACE = "\b"
+    AMBIGUOUS_MAP = {
+      1 => :WIDTH_ONE,
+      2 => :WIDTH_TWO,
+    }
+    FIRST_AMBIGUOUS = {
+      WIDTH_ONE: 768,
+      WIDTH_TWO: 161,
+    }
+    NOT_COMMON_NARROW_REGEX = {
+      WIDTH_ONE: /[^\u{10}-\u{2FF}]/m,
+      WIDTH_TWO: /[^\u{10}-\u{A1}]/m,
+    }
+    FIRST_4096 = {
+      WIDTH_ONE: decompress_index(INDEX[:WIDTH_ONE][0][0], 1),
+      WIDTH_TWO: decompress_index(INDEX[:WIDTH_TWO][0][0], 1),
+    }
+    EMOJI_SEQUENCES_REGEX_MAPPING = {
+      rgi: :REGEX_INCLUDE_MQE_UQE,
+      rgi_at: :REGEX_INCLUDE_MQE_UQE,
+      possible: :REGEX_WELL_FORMED,
+    }
+    REGEX_EMOJI_VS16 = Regexp.union(
+      Regexp.compile(
+        Unicode::Emoji::REGEX_TEXT_PRESENTATION.source +
+        "(?<![#*0-9])" +
+        "\u{FE0F}"
+      ),
+      Unicode::Emoji::REGEX_EMOJI_KEYCAP
+    )
+    REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
+    REGEX_EMOJI_ALL_SEQUENCES_AND_VS16 = Regexp.union(REGEX_EMOJI_ALL_SEQUENCES, REGEX_EMOJI_VS16)
+
+    # Returns monospace display width of string
+    def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
+      string = string.encode(Encoding::UTF_8) unless string.encoding == Encoding::UTF_8
+      options = normalize_options(string, ambiguous, overwrite, old_options, **options)
+
+      width = 0
+
+      unless options[:overwrite].empty?
+        width, string = width_custom(string, options[:overwrite])
+      end
+
+      if string.ascii_only?
+        return width + width_ascii(string)
+      end
+
+      ambiguous_index_name = AMBIGUOUS_MAP[options[:ambiguous]]
+
+      unless string.match?(NOT_COMMON_NARROW_REGEX[ambiguous_index_name])
+        return width + string.size
+      end
+
+      # Retrieve Emoji width
+      if options[:emoji] != :none
+        e_width, string = emoji_width(
+          string,
+          options[:emoji],
+          options[:ambiguous],
+        )
+        width += e_width
+
+        unless string.match?(NOT_COMMON_NARROW_REGEX[ambiguous_index_name])
+          return width + string.size
         end
-      else
-        width_all_features(string, ambiguous, overwrite, options)
       end
-    end
 
-
-
-
-
-
+      index_full = INDEX[ambiguous_index_name]
+      index_low = FIRST_4096[ambiguous_index_name]
+      first_ambiguous = FIRST_AMBIGUOUS[ambiguous_index_name]
+
+      string.each_codepoint{ |codepoint|
+        if codepoint > 15 && codepoint < first_ambiguous
+          width += 1
         elsif codepoint < 0x1001
-          width
+          width += index_low[codepoint] || 1
         else
-
-
-          while
-            codepoint %=
-            depth /= 16
+          d = INITIAL_DEPTH
+          w = index_full[codepoint / d]
+          while w.instance_of? Array
+            w = w[(codepoint %= d) / (d /= 16)]
           end
-        end
 
-
+          width += w || 1
+        end
       }
 
-      # Substract emoji error
-      res -= emoji_extra_width_of(string, ambiguous) if options[:emoji]
-
       # Return result + prevent negative lengths
-
+      width < 0 ? 0 : width
     end
 
-    #
-    def self.
-
-      res = string.codepoints.sum{ |codepoint|
-        next overwrite[codepoint] if overwrite[codepoint]
+    # Returns width of custom overwrites and remaining string
+    def self.width_custom(string, overwrite)
+      width = 0
 
-
-
-
-
+      string = string.each_codepoint.select{ |codepoint|
+        if overwrite[codepoint]
+          width += overwrite[codepoint]
+          nil
         else
-
-          depth = INITIAL_DEPTH
-          while (width = width[codepoint / depth]).instance_of? Array
-            codepoint %= depth
-            depth /= 16
-          end
+          codepoint
         end
+      }.pack("U*")
 
-
-
+      [width, string]
+    end
 
-
-
+    # Returns width for ASCII-only strings. Will consider zero-width control symbols.
+    def self.width_ascii(string)
+      if string.match?(ASCII_NON_ZERO_REGEX)
+        res = string.delete(ASCII_NON_ZERO_STRING).bytesize - string.count(ASCII_BACKSPACE)
+        return res < 0 ? 0 : res
+      end
 
-
-      res < 0 ? 0 : res
+      string.bytesize
     end
 
+    # Returns width of all considered Emoji and remaining string
+    def self.emoji_width(string, mode = :all, ambiguous = DEFAULT_AMBIGUOUS)
+      res = 0
+
+      if emoji_set_regex = EMOJI_SEQUENCES_REGEX_MAPPING[mode]
+        emoji_width_via_possible(
+          string,
+          Unicode::Emoji.const_get(emoji_set_regex),
+          mode == :rgi_at,
+          ambiguous,
+        )
 
-
-
+      elsif mode == :all_no_vs16
+        no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES){ res += 2; "" }
+        [res, no_emoji_string]
 
-
-
-
+      elsif mode == :vs16
+        no_emoji_string = string.gsub(REGEX_EMOJI_VS16){ res += 2; "" }
+        [res, no_emoji_string]
 
-
-
+      elsif mode == :all
+        no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES_AND_VS16){ res += 2; "" }
+        [res, no_emoji_string]
 
-
-
-
+      else
+        [0, string]
+
+      end
+    end
+
+    # Match possible Emoji first, then refine
+    def self.emoji_width_via_possible(string, emoji_set_regex, strict_eaw = false, ambiguous = DEFAULT_AMBIGUOUS)
+      res = 0
+
+      # For each string possibly an emoji
+      no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES_AND_VS16){ |emoji_candidate|
+        # Check if we have a combined Emoji with width 2 (or EAW an Apple Terminal)
+        if emoji_candidate == emoji_candidate[emoji_set_regex]
+          if strict_eaw
+            res += self.of(emoji_candidate[0], ambiguous, emoji: false)
+          else
+            res += 2
+          end
+          ""
+
+        # We are dealing with a default text presentation emoji or a well-formed sequence not matching the above Emoji set
+        else
+          if !strict_eaw
+            # Ensure all explicit VS16 sequences have width 2
+            emoji_candidate.gsub!(REGEX_EMOJI_VS16){ res += 2; "" }
+          end
+
+          emoji_candidate
+        end
       }
 
-
+      [res, no_emoji_string]
     end
 
-    def
+    def self.normalize_options(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
+      unless old_options.empty?
+        warn "Unicode::DisplayWidth: Please migrate to keyword arguments - #{old_options.inspect}"
+        options.merge! old_options
+      end
+
+      options[:ambiguous] = ambiguous if ambiguous
+      options[:ambiguous] ||= DEFAULT_AMBIGUOUS
+
+      if options[:ambiguous] != 1 && options[:ambiguous] != 2
+        raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2"
+      end
+
+      if overwrite && !overwrite.empty?
+        warn "Unicode::DisplayWidth: Please migrate to keyword arguments - overwrite: #{overwrite.inspect}"
+        options[:overwrite] = overwrite
+      end
+      options[:overwrite] ||= {}
+
+      if [nil, true, :auto].include?(options[:emoji])
+        options[:emoji] = EmojiSupport.recommended
+      elsif options[:emoji] == false
+        options[:emoji] = :none
+      end
+
+      options
+    end
+
+    def initialize(ambiguous: DEFAULT_AMBIGUOUS, overwrite: {}, emoji: true)
       @ambiguous = ambiguous
       @overwrite = overwrite
       @emoji = emoji
     end
 
     def get_config(**kwargs)
-
-        kwargs[:ambiguous] || @ambiguous,
-        kwargs[:overwrite] || @overwrite,
-
-
+      {
+        ambiguous: kwargs[:ambiguous] || @ambiguous,
+        overwrite: kwargs[:overwrite] || @overwrite,
+        emoji: kwargs[:emoji] || @emoji,
+      }
     end
 
     def of(string, **kwargs)
-      self.class.of(string,
+      self.class.of(string, **get_config(**kwargs))
     end
   end
 end
```
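For codepoints outside the fast paths, the new `of` descends a nested index: the first level is picked by `codepoint / INITIAL_DEPTH`, and each nested Array divides its block into 16 sub-blocks until an entry that is not an Array is reached (`nil` falls back to width 1). A minimal sketch of that descent on a tiny hand-built index. The index contents are invented for illustration and are not the gem's data, `lookup_width` is a hypothetical name, and the `FIRST_4096` / `first_ambiguous` fast paths are omitted. The loop body is the same expression as in the diff.

```ruby
INITIAL_DEPTH = 0x10000

# Toy index, NOT real Unicode data: only U+1100..U+115F are marked
# as width 2; every nil entry falls back to width 1.
toy_index = Array.new(17)                     # one slot per 0x10000 plane
toy_index[0] = Array.new(16)                  # plane 0, in 0x1000 blocks
toy_index[0][1] = Array.new(16)               # U+1000..U+1FFF, in 0x100 blocks
toy_index[0][1][1] = Array.new(16)            # U+1100..U+11FF, in 0x10 blocks
(0..5).each { |i| toy_index[0][1][1][i] = 2 } # U+1100..U+115F

def lookup_width(index, codepoint)
  d = INITIAL_DEPTH
  w = index[codepoint / d]
  # Same step as the diff: reduce the codepoint to its offset in the current
  # block, shrink the block size by 16, and index into the nested Array.
  while w.instance_of? Array
    w = w[(codepoint %= d) / (d /= 16)]
  end
  w || 1
end

lookup_width(toy_index, 0x1100)  # => 2
lookup_width(toy_index, 0x1160)  # => 1
lookup_width(toy_index, "A".ord) # => 1
lookup_width(toy_index, 0x1F600) # => 1
```

Ruby evaluates `(codepoint %= d)` before `(d /= 16)`, so each step uses the old block size for the remainder and the new one for the index.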
metadata
CHANGED
```diff
@@ -1,15 +1,35 @@
 --- !ruby/object:Gem::Specification
 name: unicode-display_width
 version: !ruby/object:Gem::Version
-  version:
+  version: 3.1.2
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-
+date: 2024-11-20 00:00:00.000000000 Z
 dependencies:
+- !ruby/object:Gem::Dependency
+  name: unicode-emoji
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.0'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 4.0.4
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '4.0'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 4.0.4
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
```
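The new runtime dependency on unicode-emoji combines two constraints, `~> 4.0` and `>= 4.0.4`. One way to see which versions satisfy both is RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The versions checked below are arbitrary examples, not a statement about which unicode-emoji releases exist.

```ruby
require "rubygems"

# The two constraints from the new unicode-emoji dependency above.
requirement = Gem::Requirement.new("~> 4.0", ">= 4.0.4")

# "~> 4.0" allows >= 4.0 and < 5.0; ">= 4.0.4" raises the lower bound.
%w[4.0.3 4.0.4 4.1.0 5.0.0].each do |v|
  puts "#{v}: #{requirement.satisfied_by?(Gem::Version.new(v))}"
end
# 4.0.3: false
# 4.0.4: true
# 4.1.0: true
# 5.0.0: false
```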
```diff
@@ -39,7 +59,8 @@ dependencies:
       - !ruby/object:Gem::Version
         version: '13.0'
 description: "[Unicode 16.0.0] Determines the monospace display width of a string
-  using EastAsianWidth.txt, Unicode general category, and other
+  using EastAsianWidth.txt, Unicode general category, Emoji specification, and other
+  data."
 email:
 - hi@ruby.consulting
 executables: []
```
```diff
@@ -55,8 +76,10 @@ files:
 - data/display_width.marshal.gz
 - lib/unicode/display_width.rb
 - lib/unicode/display_width/constants.rb
+- lib/unicode/display_width/emoji_support.rb
 - lib/unicode/display_width/index.rb
 - lib/unicode/display_width/no_string_ext.rb
+- lib/unicode/display_width/reline_ext.rb
 - lib/unicode/display_width/string_ext.rb
 homepage: https://github.com/janlelis/unicode-display_width
 licenses:
```
```diff
@@ -74,14 +97,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: 2.
+      version: 2.5.0
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.5.
+rubygems_version: 3.5.21
 signing_key:
 specification_version: 4
 summary: Determines the monospace display width of a string in Ruby.
```