unicode-display_width 2.1.0 → 2.4.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +26 -0
- data/MIT-LICENSE.txt +1 -1
- data/README.md +19 -7
- data/data/display_width.marshal.gz +0 -0
- data/lib/unicode/display_width/constants.rb +2 -2
- data/lib/unicode/display_width/index.rb +20 -0
- data/lib/unicode/display_width.rb +70 -12
- metadata +7 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e0bcd900f031999ffa43dd6ef091b07b45d425b1ab04e6559f8c8e2c54e08710
+  data.tar.gz: ec3daad5e92107072f8f590d5f2217fd1213d7b25d6491bb3b20ee103f7a2087
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: da2098a3b5d56518129b453aa88e113583403fefb37bb23eeb3f2ff3213426ea6a7076ab0a2f4c778f91a6cfbe3f98f59ea8dc6a4e37ff4bb6a1499536a5a4b9
+  data.tar.gz: 8559483daad47ca76757cf8701f09060bf66f99017a32d62b9c6f739794e1fcc15f2a4b9aae754252d7c8176571f18b590555ed3aa83748836e8e6a681bc7e10
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,31 @@
 # CHANGELOG
 
+## 2.4.2
+
+More performance improvements:
+
+- Optimize lookup of first 4096 codepoints
+- Avoid overwrite lookup if no overwrites are set
+
+## 2.4.1
+
+- Improve general performance!
+- Further improve performance for ASCII strings
+
+*You should really upgrade - it's much faster now!*
+
+## 2.4.0
+
+- Improve performance for ASCII-only strings, by @fatkodima
+- Require Ruby 2.4
+
+## 2.3.0
+
+- Unicode 15.0
+
+## 2.2.0
+
+- Add *Hangul Jamo Extended-B* block to zero-width chars, thanks @ninjalj #22
+
 ## 2.1.0
 
 - Unicode 14.0
data/MIT-LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -1,12 +1,20 @@
 ## Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)
 
-Determines the monospace display width of a string in Ruby. Implementation based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt) and other data, 100% in Ruby. It does not rely on the OS vendor (like [wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width.
+Determines the monospace display width of a string in Ruby. Useful for all kinds of terminal-based applications. Implementation based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt) and other data, 100% in Ruby. It does not rely on the OS vendor (like [wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width.
 
-Unicode version: **
+Unicode version: **15.0.0** (September 2022)
 
-Supported Rubies: **3.0**, **2.7**
+Supported Rubies: **3.1**, **3.0**, **2.7**
 
-Old Rubies which might still work: **2.6**, **2.5**, **2.4
+Old Rubies which might still work: **2.6**, **2.5**, **2.4**
+
+For even older Rubies, use version 2.3.0 of this gem: **2.3**, **2.2**, **2.1**, **2.0**, **1.9**
+
+## Version 2.4.2 — Performance Updates
+
+**If you use this gem, you should really upgrade to 2.4.2. It's often 100x faster, sometimes even 1000x and more!**
+
+This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the character width lookup code has been optimized, so even when full-width characters are involved, the gem is much faster now.
 
 ## Version 2.0 — Breaking Changes
 
@@ -39,12 +47,12 @@ Width | Characters | Comment
 -------|------------------------------|--------------------------------------------------
 X | (user defined) | Overwrites any other values
 -1 | `"\b"` | Backspace (total width never below 0)
-0 | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29)
+0 | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) which do not change horizontal width
 1 | `"\u{00AD}"` | SOFT HYPHEN
 2 | `"\u{2E3A}"` | TWO-EM DASH
 3 | `"\u{2E3B}"` | THREE-EM DASH
 0 | General Categories: Mn, Me, Cf (non-arabic) | Excludes ARABIC format characters
-0 | `"\u{1160}".."\u{11FF}"` | HANGUL JUNGSEONG
+0 | `"\u{1160}".."\u{11FF}"`, `"\u{D7B0}".."\u{D7FF}"` | HANGUL JUNGSEONG
 0 | `"\u{2060}".."\u{206F}"`, `"\u{FFF0}".."\u{FFF8}"`, `"\u{E0000}".."\u{E0FFF}"` | Ignorable ranges
 2 | East Asian Width: F, W | Full-width characters
 2 | `"\u{3400}".."\u{4DBF}"`, `"\u{4E00}".."\u{9FFF}"`, `"\u{F900}".."\u{FAFF}"`, `"\u{20000}".."\u{2FFFD}"`, `"\u{30000}".."\u{3FFFD}"` | Full-width ranges
@@ -89,6 +97,9 @@ You can overwrite how to handle specific code points by passing a hash (or even
 Unicode::DisplayWidth.of("a\tb", 1, "\t".ord => 10) # => tab counted as 10, so result is 12
 ```
 
+Please note that using overwrites disables some performance optimizations of this gem.
+
+
 #### Emoji Support
 
 Emoji width support is included, but it must be activated manually. It will adjust the string's size for modifier and zero-width joiner sequences. You also need to add the [unicode-emoji](https://github.com/janlelis/unicode-emoji) gem to your Gemfile:
@@ -148,12 +159,13 @@ Replace "一" with the actual string to measure
 - JavaScript: https://github.com/mycoboco/wcwidth.js
 - C: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
 - C for Julia: https://github.com/JuliaLang/utf8proc/issues/2
+- Golang: https://github.com/rivo/uniseg
 
 See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related micro libraries.
 
 ## Copyright & Info
 
-- Copyright (c) 2011, 2015-
+- Copyright (c) 2011, 2015-2023 Jan Lelis, https://janlelis.com, released under the MIT
 license
 - Early versions based on runpaint's unicode-data interface: Copyright (c) 2009 Run Paint Run Run
 - Unicode data: https://www.unicode.org/copyright.html#Exhibit1
|
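The ASCII shortcut that the README's 2.4.2 note describes can be tried in isolation. What follows is a minimal sketch and not the gem's API: `ascii_fast_width` and `ASCII_SPECIAL` are hypothetical names, and the character class is copied from the `ASCII_NON_ZERO_REGEX` constant that appears in the `display_width.rb` changes of this diff.

```ruby
# Control characters the README table lists with width 0, plus backspace (-1).
# Copied from the constant shown in the display_width.rb diff.
ASCII_SPECIAL = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/

# Hypothetical helper: width of an ASCII-only string without any table lookup.
def ascii_fast_width(string)
  raise ArgumentError, "ASCII-only strings expected" unless string.ascii_only?

  # No special control characters: every character is one column wide.
  return string.size unless string.match?(ASCII_SPECIAL)

  # Drop the special characters, then let each backspace take back one column.
  width = string.gsub(ASCII_SPECIAL, "").size - string.count("\b")
  width < 0 ? 0 : width
end

plain     = ascii_fast_width("hello")  # five printable characters
newline   = ascii_fast_width("a\nb")   # the newline adds no width
backspace = ascii_fast_width("ab\b")   # the backspace removes one column
clamped   = ascii_fast_width("\b\b")   # never drops below zero
```

Because the shortcut only needs `ascii_only?` and one regex match, plain ASCII input never touches the width index at all, which is consistent with the large speedups the README reports.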
data/data/display_width.marshal.gz
CHANGED
Binary file
data/lib/unicode/display_width/constants.rb
CHANGED
@@ -2,8 +2,8 @@
 
 module Unicode
   class DisplayWidth
-    VERSION = "2.
-    UNICODE_VERSION = "
+    VERSION = "2.4.2"
+    UNICODE_VERSION = "15.0.0"
     DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/")
     INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz"
   end
data/lib/unicode/display_width/index.rb
CHANGED
@@ -10,5 +10,25 @@ module Unicode
       serialized_data.force_encoding Encoding::BINARY
       INDEX = Marshal.load(serialized_data)
     end
+
+    def self.decompress_index(index, level)
+      index.flat_map{ |value|
+        if level > 0
+          if value.instance_of?(Array)
+            value[15] ||= nil
+            decompress_index(value, level - 1)
+          else
+            decompress_index([value] * 16, level - 1)
+          end
+        else
+          if value.instance_of?(Array)
+            value[15] ||= nil
+            value
+          else
+            [value] * 16
+          end
+        end
+      }
+    end
   end
 end
|
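To see what `decompress_index` produces, here is a stand-alone sketch that copies the method body from the diff and runs it on a made-up index. The toy data is an assumption for illustration only; the gem's real `INDEX` is loaded from `display_width.marshal.gz` and is not reproduced here.

```ruby
# Stand-alone copy of the helper from the diff, defined at top level here
# instead of inside Unicode::DisplayWidth.
def decompress_index(index, level)
  index.flat_map { |value|
    if level > 0
      if value.instance_of?(Array)
        value[15] ||= nil                       # pad the node to 16 slots
        decompress_index(value, level - 1)
      else
        decompress_index([value] * 16, level - 1) # a scalar covers the whole subtree
      end
    else
      if value.instance_of?(Array)
        value[15] ||= nil                       # pad the leaf row to 16 slots
        value
      else
        [value] * 16                            # a scalar covers 16 codepoints
      end
    end
  }
end

# Made-up input: one nested node and one scalar that stands for a whole range.
toy  = [[[0, nil], :A], 1]
flat = decompress_index(toy, 1)
```

With `level` set to 1, every top-level entry expands to 16 x 16 = 256 slots, so the two entries above yield a flat array of 512 values that can be indexed directly by codepoint offset. Note that the helper pads the nested arrays it is given in place.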
data/lib/unicode/display_width.rb
CHANGED
@@ -5,26 +5,84 @@ require_relative "display_width/index"
 
 module Unicode
   class DisplayWidth
-
+    INITIAL_DEPTH = 0x10000
+    ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/
+    FIRST_4096 = decompress_index(INDEX[0][0], 1)
 
     def self.of(string, ambiguous = 1, overwrite = {}, options = {})
-
-
-
-
-
-
-
-
-
-
-
+      if overwrite.empty?
+        # Optimization for ASCII-only strings without certain control symbols
+        if string.ascii_only?
+          if string.match?(ASCII_NON_ZERO_REGEX)
+            res = string.gsub(ASCII_NON_ZERO_REGEX, "").size - string.count("\b")
+            res < 0 ? 0 : res
+          else
+            string.size
+          end
+        else
+          width_no_overwrite(string, ambiguous, options)
+        end
+      else
+        width_all_features(string, ambiguous, overwrite, options)
+      end
+    end
+
+    def self.width_no_overwrite(string, ambiguous, options = {})
+      # Sum of all chars widths
+      res = string.codepoints.sum{ |codepoint|
+        if codepoint > 15 && codepoint < 161 # very common
+          next 1
+        elsif codepoint < 0x1001
+          width = FIRST_4096[codepoint]
+        else
+          width = INDEX
+          depth = INITIAL_DEPTH
+          while (width = width[codepoint / depth]).instance_of? Array
+            codepoint %= depth
+            depth /= 16
+          end
+        end
+
+        width == :A ? ambiguous : (width || 1)
+      }
+
+      # Subtract emoji error
+      res -= emoji_extra_width_of(string, ambiguous) if options[:emoji]
+
+      # Return result + prevent negative lengths
+      res < 0 ? 0 : res
+    end
+
+    # Same as .width_no_overwrite - but with applying overwrites for each char
+    def self.width_all_features(string, ambiguous, overwrite, options)
+      # Sum of all chars widths
+      res = string.codepoints.sum{ |codepoint|
+        next overwrite[codepoint] if overwrite[codepoint]
+
+        if codepoint > 15 && codepoint < 161 # very common
+          next 1
+        elsif codepoint < 0x1001
+          width = FIRST_4096[codepoint]
+        else
+          width = INDEX
+          depth = INITIAL_DEPTH
+          while (width = width[codepoint / depth]).instance_of? Array
+            codepoint %= depth
+            depth /= 16
+          end
+        end
+
+        width == :A ? ambiguous : (width || 1)
       }
 
+      # Subtract emoji error
       res -= emoji_extra_width_of(string, ambiguous, overwrite) if options[:emoji]
+
+      # Return result + prevent negative lengths
       res < 0 ? 0 : res
     end
 
+
     def self.emoji_extra_width_of(string, ambiguous = 1, overwrite = {}, _ = {})
       require "unicode/emoji"
 
|
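The `while` loop shared by `width_no_overwrite` and `width_all_features` walks a nested array, dividing the remaining codepoint range by 16 at each level until it reaches a non-array value. Below is a minimal sketch of that walk over a hand-built index. `lookup_width` and `TOY_INDEX` are hypothetical names, and the widths stored in the toy table are illustrative assumptions, not the gem's data.

```ruby
INITIAL_DEPTH = 0x10000 # same starting divisor as in the diff

# Hypothetical helper wrapping the loop from the diff.
def lookup_width(index, codepoint, ambiguous = 1)
  width = index
  depth = INITIAL_DEPTH
  # Descend while the current node is still an array.
  while (width = width[codepoint / depth]).instance_of?(Array)
    codepoint %= depth
    depth /= 16
  end
  # :A marks ambiguous width; a missing entry (nil) means the default of 1.
  width == :A ? ambiguous : (width || 1)
end

# Toy index (assumption): only a few entries are filled in.
plane0 = []
plane0[0] = []
plane0[0][0] = []
plane0[0][0][10] = [nil, :A]  # U+00A1 stored as ambiguous
plane0[0][3] = [0]            # U+0300..U+030F stored as zero width
plane0[4] = 2                 # U+4000..U+4FFF stored as one scalar
TOY_INDEX = [plane0, nil, 2]  # the third entry covers U+20000..U+2FFFF

wide      = lookup_width(TOY_INDEX, 0x4E00)
combining = lookup_width(TOY_INDEX, 0x0300)
narrow    = lookup_width(TOY_INDEX, 0x00A1)
as_wide   = lookup_width(TOY_INDEX, 0x00A1, 2)
```

A scalar stored high in the tree answers for its whole range after a single step, so sparse or uniform regions need very few array accesses, while densely varied regions descend to single-codepoint leaves.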
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: unicode-display_width
 version: !ruby/object:Gem::Version
-  version: 2.
+  version: 2.4.2
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2023-01-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
@@ -38,7 +38,7 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '13.0'
-description: "[Unicode
+description: "[Unicode 15.0.0] Determines the monospace display width of a string
   using EastAsianWidth.txt, Unicode general category, and other data."
 email:
 - hi@ruby.consulting
@@ -62,9 +62,10 @@ homepage: https://github.com/janlelis/unicode-display_width
 licenses:
 - MIT
 metadata:
-  changelog_uri: https://github.com/janlelis/unicode-display_width/blob/
+  changelog_uri: https://github.com/janlelis/unicode-display_width/blob/main/CHANGELOG.md
   source_code_uri: https://github.com/janlelis/unicode-display_width
   bug_tracker_uri: https://github.com/janlelis/unicode-display_width/issues
+  rubygems_mfa_required: 'true'
 post_install_message:
 rdoc_options: []
 require_paths:
@@ -73,14 +74,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version:
+      version: 2.4.0
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.4.1
 signing_key:
 specification_version: 4
 summary: Determines the monospace display width of a string in Ruby.