unicode-display_width 2.6.0 → 3.0.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +49 -2
- data/README.md +68 -50
- data/data/display_width.marshal.gz +0 -0
- data/lib/unicode/display_width/constants.rb +1 -1
- data/lib/unicode/display_width/emoji_support.rb +41 -0
- data/lib/unicode/display_width/reline_ext.rb +14 -0
- data/lib/unicode/display_width/string_ext.rb +3 -3
- data/lib/unicode/display_width.rb +191 -73
- metadata +22 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 8ee4f0ac31dae0855f4de659fac788fb36a7298cc5a69cd2b4e104a709bab351
|
4
|
+
data.tar.gz: 5928bdbfd92df1baba4249fca92302240dfb4ad90579248085c1f5e103c3fc0d
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: e5af487be1d49d54f383cd8fc5cc0ea384714297537f5e23ef37f457bea5ff80a469e954ecdbf345eb6e966b29aad2f9557af0986a11642d66e21e9bf8309603
|
7
|
+
data.tar.gz: 7e6441597613b829540389e36b6d51415858ba05222ba14c33dec37c883d125a96d6566f3e4f65b666b7de89e9833e568fcb33e8a1a5f2f9ba5cb69616e63b0f
|
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,34 @@
|
|
1
1
|
# CHANGELOG
|
2
2
|
|
3
|
+
## 3.0.0
|
4
|
+
|
5
|
+
**Rework Emoji support:**
|
6
|
+
|
7
|
+
- Emoji widths are now enabled by default
|
8
|
+
- Only reduce Emoji width to 2 when RGI Emoji detected (configurable)
|
9
|
+
- VS16 turns Emoji characters of width 1 into full-width
|
10
|
+
- Please note that Emoji parsing has a notable impact on performance.
|
11
|
+
You can use the `emoji: false` option to disable Emoji adjustments
|
12
|
+
- Tries to detect terminal's Emoji support level automatically (from ENV vars)
|
13
|
+
|
14
|
+
**Index fixes and updates:**
|
15
|
+
|
16
|
+
- Private-use characters are considered ambiguous (were given width 1 before)
|
17
|
+
- Fix that a few zero-width ignorable codepoints from recent Unicode were missing
|
18
|
+
- Consider the following separators to be zero-width:
|
19
|
+
- U+2028 - LINE SEPARATOR - Zl
|
20
|
+
- U+2029 - PARAGRAPH SEPARATOR - Zp
|
21
|
+
|
22
|
+
**Other:**
|
23
|
+
|
24
|
+
- Add keyword arguments to `Unicode::DisplayWidth.of`. If you are using a hash
|
25
|
+
with overwrite values as third parameter, be sure to put it in curly braces.
|
26
|
+
- Using third parameter or explicit hash as fourth parameter is deprecated,
|
27
|
+
please migrate to the keyword arguments API
|
28
|
+
- Gem raises `ArgumentError` for ambiguous values other than 1 or 2
|
29
|
+
- Performance optimizations
|
30
|
+
- Require Ruby 2.5
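The keyword-argument migration and the stricter `ambiguous` check listed above can be sketched without the gem. The helper below is hypothetical (`normalize_width_options` is not part of the library) and only mirrors the behaviour the notes describe: a positional `ambiguous`, a braced hash as the legacy third parameter, keyword arguments as the preferred style, and an `ArgumentError` for values other than 1 or 2.

```ruby
# Hypothetical helper, not part of unicode-display_width: it only illustrates
# how legacy positional arguments and keyword arguments can coexist.
def normalize_width_options(ambiguous = nil, overwrite = nil, **options)
  # A positional ambiguous value wins over the keyword; the default is 1
  options[:ambiguous] = ambiguous if ambiguous
  options[:ambiguous] ||= 1

  unless [1, 2].include?(options[:ambiguous])
    raise ArgumentError, "Ambiguous width must be 1 or 2"
  end

  # Legacy third parameter: it has to be an explicit hash in curly braces.
  # Under Ruby 3, a brace-less `"\t".ord => 10` would land in **options instead.
  if overwrite && !overwrite.empty?
    warn "Please migrate to keyword arguments - overwrite: #{overwrite.inspect}"
    options[:overwrite] = overwrite
  end
  options[:overwrite] ||= {}

  options
end

normalize_width_options(2)                                # positional ambiguous only
normalize_width_options(overwrite: { "\t".ord => 10 })    # preferred keyword style
normalize_width_options(1, { "\t".ord => 10 })            # legacy style, prints a warning
```

Keeping the legacy path behind a warning lets existing callers keep working while they move to keywords.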
|
31
|
+
|
3
32
|
## 2.6.0
|
4
33
|
|
5
34
|
- Unicode 16
|
@@ -40,8 +69,26 @@ More performance improvements:
|
|
40
69
|
|
41
70
|
## 2.0.0
|
42
71
|
|
43
|
-
|
44
|
-
|
72
|
+
Add Support for Ruby 3.0
|
73
|
+
|
74
|
+
### Breaking Changes
|
75
|
+
|
76
|
+
Some features of this library were marked deprecated for a long time and have been removed with Version 2.0:
|
77
|
+
|
78
|
+
- Aliases of display\_width (…\_size, …\_length) have been removed
|
79
|
+
- Auto-loading of string core extension has been removed:
|
80
|
+
|
81
|
+
If you are relying on the `String#display_width` string extension to be automatically loaded (old behavior), please load it explicitly now:
|
82
|
+
|
83
|
+
```ruby
|
84
|
+
require "unicode/display_width/string_ext"
|
85
|
+
```
|
86
|
+
|
87
|
+
You could also change your `Gemfile` line to achieve this:
|
88
|
+
|
89
|
+
```ruby
|
90
|
+
gem "unicode-display_width", require: "unicode/display_width/string_ext"
|
91
|
+
```
|
45
92
|
|
46
93
|
## 2.0.0.pre2
|
47
94
|
|
data/README.md
CHANGED
@@ -1,39 +1,22 @@
|
|
1
|
-
|
1
|
+
# Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)
|
2
2
|
|
3
|
-
Determines the monospace display width of a string in Ruby
|
3
|
+
Determines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specification](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. It does not rely on the OS vendor ([wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals.
|
4
4
|
|
5
5
|
Unicode version: **16.0.0** (September 2024)
|
6
6
|
|
7
|
-
|
7
|
+
## Gem Version 3.0 — Improved Emoji Support
|
8
8
|
|
9
|
-
|
9
|
+
**Emoji support is now enabled by default.** See below for description and configuration possibilities.
|
10
10
|
|
11
|
-
|
11
|
+
**Unicode::DisplayWidth.of now takes keyword arguments:** { ambiguous:, emoji:, overwrite: }
|
12
12
|
|
13
|
-
|
13
|
+
See [CHANGELOG](/CHANGELOG.md) for details.
|
14
14
|
|
15
|
-
|
16
|
-
|
17
|
-
This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the charachter width lookup code has been optimized, so even when full-width characters are involved, the gem is much faster now.
|
18
|
-
|
19
|
-
## Version 2.0 — Breaking Changes
|
20
|
-
|
21
|
-
Some features of this library were marked deprecated for a long time and have been removed with Version 2.0:
|
15
|
+
## Gem Version 2.4.2 — Performance Updates
|
22
16
|
|
23
|
-
|
24
|
-
- Auto-loading of string core extension has been removed:
|
25
|
-
|
26
|
-
If you are relying on the `String#display_width` string extension to be automatically loaded (old behavior), please load it explicitly now:
|
27
|
-
|
28
|
-
```ruby
|
29
|
-
require "unicode/display_width/string_ext"
|
30
|
-
```
|
31
|
-
|
32
|
-
You could also change your `Gemfile` line to achieve this:
|
17
|
+
**If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!**
|
33
18
|
|
34
|
-
|
35
|
-
gem "unicode-display_width", require: "unicode/display_width/string_ext"
|
36
|
-
```
|
19
|
+
This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the character width lookup code has been optimized, so even when the string involves full-width or ambiguous characters, the gem is much faster now.
|
37
20
|
|
38
21
|
## Introduction to Character Widths
|
39
22
|
|
@@ -45,15 +28,16 @@ Further at the top means higher precedence. Please expect changes to this algori
|
|
45
28
|
|
46
29
|
Width | Characters | Comment
|
47
30
|
-------|------------------------------|--------------------------------------------------
|
48
|
-
|
31
|
+
? | (user defined) | Overwrites any other values
|
32
|
+
? | Emoji | See "How this Library Handles Emoji Width" below
|
49
33
|
-1 | `"\b"` | Backspace (total width never below 0)
|
50
34
|
0 | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) which do not change horizontal width
|
51
35
|
1 | `"\u{00AD}"` | SOFT HYPHEN
|
52
36
|
2 | `"\u{2E3A}"` | TWO-EM DASH
|
53
37
|
3 | `"\u{2E3B}"` | THREE-EM DASH
|
54
|
-
0 | General Categories: Mn, Me, Cf (non-arabic)
|
55
|
-
0 |
|
56
|
-
0 | `"\u{
|
38
|
+
0 | General Categories: Mn, Me, Zl, Zp, Cf (non-arabic)| Excludes ARABIC format characters
|
39
|
+
0 | Derived Property: Default_Ignorable_Code_Point | Ignorable ranges
|
40
|
+
0 | `"\u{1160}".."\u{11FF}"`, `"\u{D7B0}".."\u{D7FF}"` | HANGUL JUNGSEONG
|
57
41
|
2 | East Asian Width: F, W | Full-width characters
|
58
42
|
2 | `"\u{3400}".."\u{4DBF}"`, `"\u{4E00}".."\u{9FFF}"`, `"\u{F900}".."\u{FAFF}"`, `"\u{20000}".."\u{2FFFD}"`, `"\u{30000}".."\u{3FFFD}"` | Full-width ranges
|
59
43
|
1 or 2 | East Asian Width: A | Ambiguous characters, user defined, default: 1
|
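The first rows of the table above (backspace counts as -1, the listed C0 control codes count as 0, the total never drops below 0) can be tried out on ASCII-only input with a few lines of plain Ruby. This is a minimal sketch under those stated rules; `ascii_display_width` is a hypothetical name and not the gem's public API.

```ruby
# Sketch of the ASCII-only rules from the table (hypothetical helper).
def ascii_display_width(string)
  raise ArgumentError, "ASCII-only strings expected" unless string.ascii_only?

  zero_width = "\0\x05\a\n\v\f\r\x0E\x0F" # C0 codes that do not change the width
  backspace  = "\b"

  # Every remaining ASCII character occupies one column
  columns = string.delete(zero_width + backspace).size
  # Each backspace takes one column away again
  width = columns - string.count(backspace)
  # The total width never drops below 0
  width < 0 ? 0 : width
end

ascii_display_width("abc")    # plain text
ascii_display_width("ab\b")   # backspace removes one column
ascii_display_width("\b\b")   # clamped at 0
```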
@@ -71,8 +55,6 @@ Or add to your Gemfile:
|
|
71
55
|
|
72
56
|
## Usage
|
73
57
|
|
74
|
-
### Classic API
|
75
|
-
|
76
58
|
```ruby
|
77
59
|
require 'unicode/display_width'
|
78
60
|
|
@@ -80,7 +62,7 @@ Unicode::DisplayWidth.of("⚀") # => 1
|
|
80
62
|
Unicode::DisplayWidth.of("一") # => 2
|
81
63
|
```
|
82
64
|
|
83
|
-
|
65
|
+
### Ambiguous Characters
|
84
66
|
|
85
67
|
The second parameter defines the value returned by characters defined as ambiguous:
|
86
68
|
|
@@ -89,34 +71,70 @@ Unicode::DisplayWidth.of("·", 1) # => 1
|
|
89
71
|
Unicode::DisplayWidth.of("·", 2) # => 2
|
90
72
|
```
|
91
73
|
|
92
|
-
|
74
|
+
### Custom Overwrites
|
93
75
|
|
94
|
-
You can overwrite how to handle specific code points by passing a hash (or even a proc) as
|
76
|
+
You can overwrite how to handle specific code points by passing a hash (or even a proc) as `overwrite:` parameter:
|
95
77
|
|
96
78
|
```ruby
|
97
|
-
Unicode::DisplayWidth.of("a\tb", 1, "\t".ord => 10)) # =>
|
79
|
+
Unicode::DisplayWidth.of("a\tb", 1, overwrite: { "\t".ord => 10 }) # => TAB counted as 10, result is 12
|
98
80
|
```
|
99
81
|
|
100
82
|
Please note that using overwrites disables some performance optimizations of this gem.
|
101
83
|
|
84
|
+
### Emoji Option
|
102
85
|
|
103
|
-
|
104
|
-
|
105
|
-
Emoji width support is included, but in must be activated manually. It will adjust the string's size for modifier and zero-width joiner sequences. You also need to add the [unicode-emoji](https://github.com/janlelis/unicode-emoji) gem to your Gemfile:
|
86
|
+
The gem detects Emoji and Emoji sequences and adjusts the width of the measured string. This can be disabled by passing `emoji: false` as an argument:
|
106
87
|
|
107
88
|
```ruby
|
108
|
-
|
109
|
-
|
89
|
+
Unicode::DisplayWidth.of "🤾🏽♀️" # => 2
|
90
|
+
Unicode::DisplayWidth.of "🤾🏽♀️", emoji: false # => 5
|
110
91
|
```
|
111
92
|
|
112
|
-
|
93
|
+
Disabling Emoji support yields wrong results, as illustrated in the example above, but increases the performance of the display width calculation. You can configure [which Emoji set to match](https://www.unicode.org/reports/tr51/#def_rgi_set) by passing a symbol as value:
|
113
94
|
|
114
95
|
```ruby
|
115
|
-
Unicode::DisplayWidth.of "
|
116
|
-
Unicode::DisplayWidth.of "
|
96
|
+
Unicode::DisplayWidth.of "🐻❄", emoji: :rgi_mqe # => 3
|
97
|
+
Unicode::DisplayWidth.of "🐻❄", emoji: :rgi_uqe # => 2
|
117
98
|
```
|
118
99
|
|
119
|
-
####
|
100
|
+
#### How this Library Handles Emoji Width
|
101
|
+
|
102
|
+
There are many Emoji which get constructed by combining other Emoji in a sequence. This makes measuring the width complicated, since terminals might either display the combined Emoji or the separate parts of the Emoji individually.
|
103
|
+
|
104
|
+
Emoji Type | Width / Comment
|
105
|
+
------------|----------------
|
106
|
+
Basic/Single Emoji character without Variation Selector | No special handling, uses mechanism from table above
|
107
|
+
Basic/Single Emoji character with VS15 (Text) | No special handling, uses mechanism from table above
|
108
|
+
Basic/Single Emoji character with VS16 (Emoji) | 2
|
109
|
+
Emoji Sequence | 2 (only if sequence belongs to configured Emoji set)
|
110
|
+
|
111
|
+
The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji sets can be used:
|
112
|
+
|
113
|
+
Option | Descriptions
|
114
|
+
-------|-------------
|
115
|
+
`emoji: true` | Use recommended Emoji set on your platform, see section below
|
116
|
+
`emoji: :basic` | No width adjustments for Emoji sequences: all partial Emoji treated separately
|
117
|
+
`emoji: :rgi_fqe` | All fully-qualified RGI Emoji sequences are considered to have a width of 2
|
118
|
+
`emoji: :rgi_mqe` | All fully- and minimally-qualified RGI Emoji sequences are considered to have a width of 2
|
119
|
+
`emoji: :rgi_uqe` | All RGI Emoji sequences, regardless of qualification status are considered to have a width of 2
|
120
|
+
`emoji: :all` | All possible/well-formed Emoji sequences are considered to have a width of 2
|
121
|
+
`emoji: false` | No Emoji adjustments, Emoji characters with VS16 not handled
|
122
|
+
|
123
|
+
*RGI Emoji:* Emoji Recommended for General Interchange
|
124
|
+
|
125
|
+
*Qualification:* Whether an Emoji sequence has all required VS16 codepoints
|
126
|
+
|
127
|
+
See [emoji-test.txt](https://www.unicode.org/Public/emoji/16.0/emoji-test.txt), the [unicode-emoji gem](https://github.com/janlelis/unicode-emoji) and [UTS-51](https://www.unicode.org/reports/tr51/#def_qualified_emoji_character) for more details about qualified and unqualified Emoji sequences.
|
128
|
+
|
129
|
+
#### Emoji Support in Terminals
|
130
|
+
|
131
|
+
Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` is used, the gem will attempt to set the best fitting Emoji set for you (e.g. `:rgi_uqe` on "Apple_Terminal" or `:basic` on Gnome's terminal widget).
|
132
|
+
|
133
|
+
Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value.
|
134
|
+
|
135
|
+
You are encouraged to let the users of your library or application configure the level of Emoji support, so they get the best experience in their terminals (the same is true for ambiguous width).
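One way to expose such a setting is to map a user-supplied string to a value for the `emoji:` option. The sketch below is an assumption about how an application might do this, not something the gem provides: `emoji_option_from` and the `MYAPP_EMOJI_WIDTH` variable are made-up names, and only the symbols come from the option table above.

```ruby
# Hypothetical application helper: turns a user setting into an `emoji:` value.
def emoji_option_from(setting)
  levels = %i[basic rgi_fqe rgi_mqe rgi_uqe all] # sets listed in the option table
  value  = setting.to_s.strip.downcase

  case value
  when "", "auto", "true" then true   # let the library pick a recommended set
  when "off", "false"     then false  # no Emoji adjustments at all
  else
    level = value.to_sym
    unless levels.include?(level)
      raise ArgumentError, "unknown Emoji level: #{setting.inspect}"
    end
    level
  end
end

# MYAPP_EMOJI_WIDTH is a made-up variable name for this sketch
emoji = emoji_option_from(ENV["MYAPP_EMOJI_WIDTH"])
```

The resulting value could then be passed on as the `emoji:` keyword argument wherever the application measures strings.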
|
136
|
+
|
137
|
+
### Usage with String Extension
|
120
138
|
|
121
139
|
```ruby
|
122
140
|
require 'unicode/display_width/string_ext'
|
@@ -125,9 +143,9 @@ require 'unicode/display_width/string_ext'
|
|
125
143
|
'一'.display_width # => 2
|
126
144
|
```
|
127
145
|
|
128
|
-
###
|
146
|
+
### Usage with Config Object
|
129
147
|
|
130
|
-
|
148
|
+
You can use a config object that allows you to save your configuration for later reuse. This requires an extra line of code, but has the advantage that you'll need to define your string-width options only once:
|
131
149
|
|
132
150
|
```ruby
|
133
151
|
require 'unicode/display_width'
|
@@ -135,15 +153,15 @@ require 'unicode/display_width'
|
|
135
153
|
display_width = Unicode::DisplayWidth.new(
|
136
154
|
# ambiguous: 1,
|
137
155
|
overwrite: { "A".ord => 100 },
|
138
|
-
emoji:
|
156
|
+
emoji: :all,
|
139
157
|
)
|
140
158
|
|
141
159
|
display_width.of "⚀" # => 1
|
142
|
-
display_width.of "
|
160
|
+
display_width.of "🤠🤢" # => 2
|
143
161
|
display_width.of "A" # => 100
|
144
162
|
```
|
145
163
|
|
146
|
-
### Usage
|
164
|
+
### Usage from the Command-Line
|
147
165
|
|
148
166
|
Use this one-liner to print out display widths for strings from the command-line:
|
149
167
|
|
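The config-object usage in the README diff above follows a common pattern: store defaults once, then let per-call keyword arguments override them. The class below is a hypothetical stand-in written only to show that pattern; it is not `Unicode::DisplayWidth` and does not measure anything.

```ruby
# Hypothetical stand-in for a width config object (defaults plus overrides).
class WidthConfig
  def initialize(ambiguous: 1, overwrite: {}, emoji: true)
    @defaults = { ambiguous: ambiguous, overwrite: overwrite, emoji: emoji }
  end

  # Per-call keyword arguments take precedence over the stored defaults.
  # Hash#merge keeps an explicit `emoji: false`, which a `value || default`
  # fallback would silently replace with the default.
  def options_for(**kwargs)
    @defaults.merge(kwargs)
  end
end

config = WidthConfig.new(overwrite: { "A".ord => 100 }, emoji: :all)
config.options_for                # stored defaults
config.options_for(emoji: false)  # one-off override for a single call
```

Merging hashes rather than using `||` is a design choice of this sketch: it makes `false` a usable per-call value.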
Binary file
|
@@ -0,0 +1,41 @@
|
|
1
|
+
# require "rbconfig"
|
2
|
+
# RbConfig::CONFIG["host_os"] =~ /mswin|mingw/ # windows
|
3
|
+
|
4
|
+
module Unicode
|
5
|
+
class DisplayWidth
|
6
|
+
module EmojiSupport
|
7
|
+
# Tries to find out which terminal emulator is used to
|
8
|
+
# set emoji: config to best suiting value
|
9
|
+
#
|
10
|
+
# Please note: Many terminals do not set any ENV vars
|
11
|
+
def self.recommended
|
12
|
+
if ENV["CI"]
|
13
|
+
return :rgi_uqe
|
14
|
+
end
|
15
|
+
|
16
|
+
case ENV["TERM_PROGRAM"]
|
17
|
+
when "iTerm.app"
|
18
|
+
return :all
|
19
|
+
when "Apple_Terminal"
|
20
|
+
return :rgi_uqe
|
21
|
+
end
|
22
|
+
|
23
|
+
case ENV["TERM"]
|
24
|
+
when "contour"
|
25
|
+
return :rgi_uqe
|
26
|
+
when /kitty/
|
27
|
+
return :rgi_fqe
|
28
|
+
end
|
29
|
+
|
30
|
+
# As of last time checked: gnome-terminal, vscode, alacritty, konsole
|
31
|
+
:basic
|
32
|
+
end
|
33
|
+
|
34
|
+
# Maybe: Implement something like https://github.com/jquast/ucs-detect
|
35
|
+
# which uses the terminal cursor to check for best support level
|
36
|
+
# at runtime
|
37
|
+
# def self.detect!
|
38
|
+
# end
|
39
|
+
end
|
40
|
+
end
|
41
|
+
end
|
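The detection logic in the file above reads `ENV` directly, which makes it awkward to try out. A variant with an injectable environment is sketched below; `recommended_emoji_set` is a hypothetical name, and the terminal-to-set mapping follows the file above, using `:rgi_uqe` for CI.

```ruby
# Sketch of ENV-based detection with an injectable environment (hypothetical
# helper, not part of the gem). Any Hash-like object with string keys works.
def recommended_emoji_set(env = ENV)
  return :rgi_uqe if env["CI"]

  case env["TERM_PROGRAM"]
  when "iTerm.app"      then return :all
  when "Apple_Terminal" then return :rgi_uqe
  end

  case env["TERM"]
  when "contour" then return :rgi_uqe
  when /kitty/   then return :rgi_fqe
  end

  :basic # conservative fallback when no known terminal is recognised
end

recommended_emoji_set("TERM_PROGRAM" => "iTerm.app")
recommended_emoji_set("TERM" => "xterm-kitty")
recommended_emoji_set({})
```

Because many terminals set none of these variables, the fallback branch is the one most callers will hit.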
@@ -0,0 +1,14 @@
|
|
1
|
+
# Experimental
|
2
|
+
# Patches Reline's get_mbchar_width to use Unicode::DisplayWidth
|
3
|
+
|
4
|
+
require "reline"
|
5
|
+
require "reline/unicode"
|
6
|
+
|
7
|
+
require_relative "../display_width"
|
8
|
+
|
9
|
+
class Reline::Unicode
|
10
|
+
def self.get_mbchar_width(mbchar)
|
11
|
+
Unicode::DisplayWidth.of(mbchar, Reline.ambiguous_width)
|
12
|
+
end
|
13
|
+
end
|
14
|
+
|
@@ -1,9 +1,9 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
|
-
require_relative "../display_width"
|
3
|
+
require_relative "../display_width"
|
4
4
|
|
5
5
|
class String
|
6
|
-
def display_width(ambiguous =
|
7
|
-
Unicode::DisplayWidth.of(self, ambiguous, overwrite, options)
|
6
|
+
def display_width(ambiguous = nil, overwrite = nil, old_options = {}, **options)
|
7
|
+
Unicode::DisplayWidth.of(self, ambiguous, overwrite, old_options, **options)
|
8
8
|
end
|
9
9
|
end
|
@@ -1,122 +1,240 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
|
+
require "unicode/emoji"
|
4
|
+
|
3
5
|
require_relative "display_width/constants"
|
4
6
|
require_relative "display_width/index"
|
7
|
+
require_relative "display_width/emoji_support"
|
5
8
|
|
6
9
|
module Unicode
|
7
10
|
class DisplayWidth
|
8
11
|
INITIAL_DEPTH = 0x10000
|
9
12
|
ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
13
|
+
ASCII_NON_ZERO_STRING = "\0\x05\a\b\n\v\f\r\x0E\x0F"
|
14
|
+
ASCII_BACKSPACE = "\b"
|
15
|
+
AMBIGUOUS_MAP = {
|
16
|
+
1 => :WIDTH_ONE,
|
17
|
+
2 => :WIDTH_TWO,
|
18
|
+
}
|
19
|
+
FIRST_AMBIGUOUS = {
|
20
|
+
WIDTH_ONE: 768,
|
21
|
+
WIDTH_TWO: 161,
|
22
|
+
}
|
23
|
+
FIRST_4096 = {
|
24
|
+
WIDTH_ONE: decompress_index(INDEX[:WIDTH_ONE][0][0], 1),
|
25
|
+
WIDTH_TWO: decompress_index(INDEX[:WIDTH_TWO][0][0], 1),
|
26
|
+
}
|
27
|
+
EMOJI_SEQUENCES_REGEX_MAPPING = {
|
28
|
+
rgi_fqe: :REGEX,
|
29
|
+
rgi_mqe: :REGEX_INCLUDE_MQE,
|
30
|
+
rgi_uqe: :REGEX_INCLUDE_MQE_UQE,
|
31
|
+
all: :REGEX_WELL_FORMED,
|
32
|
+
}
|
33
|
+
EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/
|
34
|
+
REGEX_EMOJI_BASIC_OR_KEYCAP = Regexp.union(Unicode::Emoji::REGEX_BASIC, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
|
35
|
+
|
36
|
+
# Returns monospace display width of string
|
37
|
+
def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
|
38
|
+
unless old_options.empty?
|
39
|
+
warn "Unicode::DisplayWidth: Please migrate to keyword arguments - #{old_options.inspect}"
|
40
|
+
options.merge! old_options
|
41
|
+
end
|
42
|
+
|
43
|
+
options[:ambiguous] = ambiguous if ambiguous
|
44
|
+
options[:ambiguous] ||= 1
|
45
|
+
|
46
|
+
if options[:ambiguous] != 1 && options[:ambiguous] != 2
|
47
|
+
raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2"
|
48
|
+
end
|
49
|
+
|
50
|
+
if overwrite && !overwrite.empty?
|
51
|
+
warn "Unicode::DisplayWidth: Please migrate to keyword arguments - overwrite: #{overwrite.inspect}"
|
52
|
+
options[:overwrite] = overwrite
|
53
|
+
end
|
54
|
+
options[:overwrite] ||= {}
|
55
|
+
|
56
|
+
if options[:emoji] == nil || options[:emoji] == true
|
57
|
+
options[:emoji] = EmojiSupport.recommended
|
58
|
+
end
|
59
|
+
|
60
|
+
# # #
|
61
|
+
|
62
|
+
if !options[:overwrite].empty?
|
63
|
+
return width_frame(string, options) do |string, index_full, index_low, first_ambiguous|
|
64
|
+
width_all_features(string, index_full, index_low, first_ambiguous, options[:overwrite])
|
24
65
|
end
|
25
|
-
else
|
26
|
-
width_all_features(string, ambiguous, overwrite, options)
|
27
66
|
end
|
28
|
-
end
|
29
67
|
|
30
|
-
|
31
|
-
|
32
|
-
|
33
|
-
if codepoint > 15 && codepoint < 161 # very common
|
34
|
-
next 1
|
35
|
-
elsif codepoint < 0x1001
|
36
|
-
width = FIRST_4096[codepoint]
|
37
|
-
else
|
38
|
-
width = INDEX
|
39
|
-
depth = INITIAL_DEPTH
|
40
|
-
while (width = width[codepoint / depth]).instance_of? Array
|
41
|
-
codepoint %= depth
|
42
|
-
depth /= 16
|
43
|
-
end
|
68
|
+
if !string.ascii_only?
|
69
|
+
return width_frame(string, options) do |string, index_full, index_low, first_ambiguous|
|
70
|
+
width_no_overwrite(string, index_full, index_low, first_ambiguous)
|
44
71
|
end
|
72
|
+
end
|
45
73
|
|
46
|
-
|
47
|
-
|
74
|
+
width_ascii(string)
|
75
|
+
end
|
76
|
+
|
77
|
+
def self.width_ascii(string)
|
78
|
+
# Optimization for ASCII-only strings without certain control symbols
|
79
|
+
if string.match?(ASCII_NON_ZERO_REGEX)
|
80
|
+
res = string.delete(ASCII_NON_ZERO_STRING).size - string.count(ASCII_BACKSPACE)
|
81
|
+
return res < 0 ? 0 : res
|
82
|
+
end
|
83
|
+
|
84
|
+
# Pure ASCII
|
85
|
+
string.size
|
86
|
+
end
|
87
|
+
|
88
|
+
def self.width_frame(string, options)
|
89
|
+
# Retrieve Emoji width
|
90
|
+
if !options[:emoji]
|
91
|
+
res = 0
|
92
|
+
else
|
93
|
+
res, string = emoji_width(
|
94
|
+
string,
|
95
|
+
options[:emoji],
|
96
|
+
)
|
97
|
+
end
|
48
98
|
|
49
|
-
#
|
50
|
-
|
99
|
+
# Prepare indexes
|
100
|
+
ambiguous_index_name = AMBIGUOUS_MAP[options[:ambiguous]]
|
101
|
+
|
102
|
+
# Get general width
|
103
|
+
res += yield(string, INDEX[ambiguous_index_name], FIRST_4096[ambiguous_index_name], FIRST_AMBIGUOUS[ambiguous_index_name])
|
51
104
|
|
52
105
|
# Return result + prevent negative lengths
|
53
106
|
res < 0 ? 0 : res
|
54
107
|
end
|
55
108
|
|
56
|
-
|
57
|
-
|
58
|
-
|
59
|
-
|
60
|
-
|
109
|
+
def self.width_no_overwrite(string, index_full, index_low, first_ambiguous, _ = {})
|
110
|
+
res = 0
|
111
|
+
|
112
|
+
# Make sure we have UTF-8
|
113
|
+
string = string.encode(Encoding::UTF_8) unless string.encoding == Encoding::UTF_8
|
114
|
+
|
115
|
+
string.scan(/.{,80}/m){ |batch|
|
116
|
+
if batch.ascii_only?
|
117
|
+
res += batch.size
|
118
|
+
else
|
119
|
+
batch.each_codepoint{ |codepoint|
|
120
|
+
if codepoint > 15 && codepoint < first_ambiguous
|
121
|
+
res += 1
|
122
|
+
elsif codepoint < 0x1001
|
123
|
+
res += index_low[codepoint] || 1
|
124
|
+
else
|
125
|
+
d = INITIAL_DEPTH
|
126
|
+
w = index_full[codepoint / d]
|
127
|
+
while w.instance_of? Array
|
128
|
+
w = w[(codepoint %= d) / (d /= 16)]
|
129
|
+
end
|
130
|
+
|
131
|
+
res += w || 1
|
132
|
+
end
|
133
|
+
}
|
134
|
+
end
|
135
|
+
}
|
61
136
|
|
62
|
-
|
63
|
-
|
137
|
+
res
|
138
|
+
end
|
139
|
+
|
140
|
+
# Same as .width_no_overwrite - but with applying overwrites for each char
|
141
|
+
def self.width_all_features(string, index_full, index_low, first_ambiguous, overwrite)
|
142
|
+
res = 0
|
143
|
+
|
144
|
+
string.each_codepoint{ |codepoint|
|
145
|
+
if overwrite[codepoint]
|
146
|
+
res += overwrite[codepoint]
|
147
|
+
elsif codepoint > 15 && codepoint < first_ambiguous
|
148
|
+
res += 1
|
64
149
|
elsif codepoint < 0x1001
|
65
|
-
|
150
|
+
res += index_low[codepoint] || 1
|
66
151
|
else
|
67
|
-
|
68
|
-
|
69
|
-
while
|
70
|
-
codepoint %=
|
71
|
-
depth /= 16
|
152
|
+
d = INITIAL_DEPTH
|
153
|
+
w = index_full[codepoint / d]
|
154
|
+
while w.instance_of? Array
|
155
|
+
w = w[(codepoint %= d) / (d /= 16)]
|
72
156
|
end
|
73
|
-
end
|
74
157
|
|
75
|
-
|
158
|
+
res += w || 1
|
159
|
+
end
|
76
160
|
}
|
77
161
|
|
78
|
-
|
79
|
-
res -= emoji_extra_width_of(string, ambiguous, overwrite) if options[:emoji]
|
80
|
-
|
81
|
-
# Return result + prevent negative lengths
|
82
|
-
res < 0 ? 0 : res
|
162
|
+
res
|
83
163
|
end
|
84
164
|
|
85
165
|
|
86
|
-
def self.
|
87
|
-
|
166
|
+
def self.emoji_width(string, sequences = :rgi_fqe)
|
167
|
+
res = 0
|
168
|
+
|
169
|
+
if regex = EMOJI_SEQUENCES_REGEX_MAPPING[sequences]
|
170
|
+
emoji_sequence_regex = Unicode::Emoji.const_get(regex)
|
171
|
+
else # sequences == :basic
|
172
|
+
emoji_sequence_regex = nil
|
173
|
+
end
|
174
|
+
|
175
|
+
# Make sure we have UTF-8
|
176
|
+
string = string.encode(Encoding::UTF_8) unless string.encoding == Encoding::UTF_8
|
88
177
|
|
89
|
-
|
90
|
-
|
91
|
-
|
178
|
+
if emoji_sequence_regex
|
179
|
+
# For each string possibly an emoji
|
180
|
+
no_emoji_string = string.gsub(Unicode::Emoji::REGEX_POSSIBLE){ |emoji_candidate|
|
181
|
+
# Skip notorious false positives
|
182
|
+
if EMOJI_NOT_POSSIBLE.match?(emoji_candidate)
|
183
|
+
emoji_candidate
|
92
184
|
|
93
|
-
|
94
|
-
|
185
|
+
# Check if we have a combined Emoji with width 2
|
186
|
+
elsif emoji_candidate == emoji_candidate[emoji_sequence_regex]
|
187
|
+
res += 2
|
188
|
+
""
|
95
189
|
|
96
|
-
|
97
|
-
|
190
|
+
# We are dealing with a default text presentation emoji or a well-formed sequence not matching the above Emoji set
|
191
|
+
else
|
192
|
+
# Ensure all explicit VS16 sequences have width 2
|
193
|
+
emoji_candidate.gsub!(Unicode::Emoji::REGEX_BASIC){ |basic_emoji|
|
194
|
+
if basic_emoji.size == 2 # VS16 present
|
195
|
+
res += 2
|
196
|
+
""
|
197
|
+
else
|
198
|
+
basic_emoji
|
199
|
+
end
|
200
|
+
}
|
201
|
+
|
202
|
+
emoji_candidate
|
203
|
+
end
|
98
204
|
}
|
99
|
-
|
205
|
+
else
|
206
|
+
# Only consider basic emoji
|
100
207
|
|
101
|
-
|
208
|
+
# Ensure all explicit VS16 sequences have width 2
|
209
|
+
no_emoji_string = string.gsub(REGEX_EMOJI_BASIC_OR_KEYCAP){ |basic_emoji|
|
210
|
+
if basic_emoji.size >= 2 # VS16 present
|
211
|
+
res += 2
|
212
|
+
""
|
213
|
+
else
|
214
|
+
basic_emoji
|
215
|
+
end
|
216
|
+
}
|
217
|
+
end
|
218
|
+
|
219
|
+
[res, no_emoji_string]
|
102
220
|
end
|
103
221
|
|
104
|
-
def initialize(ambiguous: 1, overwrite: {}, emoji:
|
222
|
+
def initialize(ambiguous: 1, overwrite: {}, emoji: true)
|
105
223
|
@ambiguous = ambiguous
|
106
224
|
@overwrite = overwrite
|
107
225
|
@emoji = emoji
|
108
226
|
end
|
109
227
|
|
110
228
|
def get_config(**kwargs)
|
111
|
-
|
112
|
-
kwargs[:ambiguous] || @ambiguous,
|
113
|
-
kwargs[:overwrite] || @overwrite,
|
114
|
-
|
115
|
-
|
229
|
+
{
|
230
|
+
ambiguous: kwargs[:ambiguous] || @ambiguous,
|
231
|
+
overwrite: kwargs[:overwrite] || @overwrite,
|
232
|
+
emoji: kwargs[:emoji] || @emoji,
|
233
|
+
}
|
116
234
|
end
|
117
235
|
|
118
236
|
def of(string, **kwargs)
|
119
|
-
self.class.of(string,
|
237
|
+
self.class.of(string, **get_config(**kwargs))
|
120
238
|
end
|
121
239
|
end
|
122
240
|
end
|
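The `while w.instance_of? Array` loop in the file above walks a nested array, narrowing the codepoint range by a factor of 16 at each level. The sketch below rebuilds that walk on a tiny hand-made index so the steps can be followed; the toy data and the names `build_toy_index` and `lookup_width` are assumptions for illustration, while the real index is loaded from the gem's marshalled data file.

```ruby
# Toy index in a nested-array shape: 17 planes at the top level, then
# 16 children per level. Leaves are widths, nil means "no entry".
def build_toy_index
  block_4xxx = Array.new(16)
  block_4xxx[0xE] = 2        # U+4E00..U+4EFF marked as width 2 (toy data)

  plane0 = Array.new(16)
  plane0[0x4] = block_4xxx   # U+4000..U+4FFF needs a finer subdivision

  index = Array.new(17)
  index[0] = plane0
  index[2] = 2               # whole plane 2 collapsed into one leaf (toy data)
  index
end

def lookup_width(index, codepoint, depth = 0x10000)
  w = index[codepoint / depth]
  while w.instance_of?(Array)
    codepoint %= depth       # offset inside the current block
    depth /= 16              # child blocks are 16 times smaller
    w = w[codepoint / depth]
  end
  w || 1                     # missing entries default to width 1
end

index = build_toy_index
lookup_width(index, 0x4E00)   # found two levels down
lookup_width(index, 0x20000)  # answered by the plane-level leaf
lookup_width(index, 0x41)     # no entry, falls back to 1
```

Collapsing uniform ranges into a single leaf keeps the structure small, and the loop stops as soon as it reaches a non-array value.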
metadata
CHANGED
@@ -1,15 +1,29 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: unicode-display_width
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version:
|
4
|
+
version: 3.0.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Jan Lelis
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2024-
|
11
|
+
date: 2024-11-13 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
|
+
- !ruby/object:Gem::Dependency
|
14
|
+
name: unicode-emoji
|
15
|
+
requirement: !ruby/object:Gem::Requirement
|
16
|
+
requirements:
|
17
|
+
- - "~>"
|
18
|
+
- !ruby/object:Gem::Version
|
19
|
+
version: '4.0'
|
20
|
+
type: :runtime
|
21
|
+
prerelease: false
|
22
|
+
version_requirements: !ruby/object:Gem::Requirement
|
23
|
+
requirements:
|
24
|
+
- - "~>"
|
25
|
+
- !ruby/object:Gem::Version
|
26
|
+
version: '4.0'
|
13
27
|
- !ruby/object:Gem::Dependency
|
14
28
|
name: rspec
|
15
29
|
requirement: !ruby/object:Gem::Requirement
|
@@ -39,7 +53,8 @@ dependencies:
|
|
39
53
|
- !ruby/object:Gem::Version
|
40
54
|
version: '13.0'
|
41
55
|
description: "[Unicode 16.0.0] Determines the monospace display width of a string
|
42
|
-
using EastAsianWidth.txt, Unicode general category, and other
|
56
|
+
using EastAsianWidth.txt, Unicode general category, Emoji specification, and other
|
57
|
+
data."
|
43
58
|
email:
|
44
59
|
- hi@ruby.consulting
|
45
60
|
executables: []
|
@@ -55,8 +70,10 @@ files:
|
|
55
70
|
- data/display_width.marshal.gz
|
56
71
|
- lib/unicode/display_width.rb
|
57
72
|
- lib/unicode/display_width/constants.rb
|
73
|
+
- lib/unicode/display_width/emoji_support.rb
|
58
74
|
- lib/unicode/display_width/index.rb
|
59
75
|
- lib/unicode/display_width/no_string_ext.rb
|
76
|
+
- lib/unicode/display_width/reline_ext.rb
|
60
77
|
- lib/unicode/display_width/string_ext.rb
|
61
78
|
homepage: https://github.com/janlelis/unicode-display_width
|
62
79
|
licenses:
|
@@ -74,14 +91,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
|
|
74
91
|
requirements:
|
75
92
|
- - ">="
|
76
93
|
- !ruby/object:Gem::Version
|
77
|
-
version: 2.
|
94
|
+
version: 2.5.0
|
78
95
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
79
96
|
requirements:
|
80
97
|
- - ">="
|
81
98
|
- !ruby/object:Gem::Version
|
82
99
|
version: '0'
|
83
100
|
requirements: []
|
84
|
-
rubygems_version: 3.5.
|
101
|
+
rubygems_version: 3.5.21
|
85
102
|
signing_key:
|
86
103
|
specification_version: 4
|
87
104
|
summary: Determines the monospace display width of a string in Ruby.
|