unicode-display_width 2.3.0 → 2.5.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +22 -0
- data/MIT-LICENSE.txt +1 -1
- data/README.md +17 -6
- data/data/display_width.marshal.gz +0 -0
- data/lib/unicode/display_width/constants.rb +2 -2
- data/lib/unicode/display_width/index.rb +20 -0
- data/lib/unicode/display_width.rb +70 -12
- metadata +6 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 27acdfc12bbc4c8455095846bcd520ae1a816d6927ba8fa607d93c0925127c70
+  data.tar.gz: 5c65efc8de3821260f0080f582d7ece1085fdb762f501b5c2c0306fd95dbf6f2
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 00026a3cf58828846a8f1429f818c900f0998c672b0ad637c402fec46cd63b4470751d6ac35168e7d092a33b966de608ebce84a4e5c2fb936610e9866c1ef7fb
+  data.tar.gz: 21f6105c2084a3b4f4f5ad0f6eb12840e0bf8adc408b073dc69f919acfa451bca8adb80e1c813e9a887728467420dcad4198faa30333cc87d63f994b2e396337
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,27 @@
 # CHANGELOG
 
+## 2.5.0
+
+- Unicode 15.1
+
+## 2.4.2
+
+More performance improvements:
+
+- Optimize lookup of first 4096 codepoints
+- Avoid overwrite lookup if no overwrites are set
+
+## 2.4.1
+
+- Improve general performance!
+- Further improve performance for ASCII strings
+
+*You should really upgrade - it's much faster now!*
+
+## 2.4.0
+- Improve performance for ASCII-only strings, by @fatkodima
+- Require Ruby 2.4
+
 ## 2.3.0
 
 - Unicode 15.0
data/MIT-LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -1,12 +1,20 @@
 ## Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)
 
-Determines the monospace display width of a string in Ruby. Implementation based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt) and other data, 100% in Ruby. It does not rely on the OS vendor (like [wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width.
+Determines the monospace display width of a string in Ruby. Useful for all kinds of terminal-based applications. Implementation based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt) and other data, 100% in Ruby. It does not rely on the OS vendor (like [wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width.
 
-Unicode version: **15.
+Unicode version: **15.1.0** (September 2023)
 
-Supported Rubies: **3.1**, **3.0**, **2.7**
+Supported Rubies: **3.2**, **3.1**, **3.0**, **2.7**
 
-Old Rubies which might still work: **2.6**, **2.5**, **2.4
+Old Rubies which might still work: **2.6**, **2.5**, **2.4**
+
+For even older Rubies, use version 2.3.0 of this gem: **2.3**, **2.2**, **2.1**, **2.0**, **1.9**
+
+## Version 2.4.2 — Performance Updates
+
+**If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!**
+
+This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the character width lookup code has been optimized, so even when full-width characters are involved, the gem is much faster now.
 
 ## Version 2.0 — Breaking Changes
 
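The ASCII shortcut mentioned in the README text of this hunk can be sketched as a standalone method. This is a minimal illustration rather than the gem's code path: the name `ascii_width` is invented for this example, and the character class is copied from the `ASCII_NON_ZERO_REGEX` constant that appears later in this diff. It assumes Ruby 2.4 or newer because of `String#match?`.

```ruby
# Hypothetical helper: width of an ASCII-only string without any table lookup.
def ascii_width(string)
  raise ArgumentError, "ASCII-only input expected" unless string.ascii_only?

  # Control characters that do not advance the cursor (backspace included).
  zero_width = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/

  # Common case: every character counts as one column.
  return string.size unless string.match?(zero_width)

  # Drop the zero-width characters, then let each backspace take one column back.
  width = string.gsub(zero_width, "").size - string.count("\b")
  width < 0 ? 0 : width
end

puts ascii_width("hello")   # 5
puts ascii_width("a\nb")    # 2
puts ascii_width("ab\b")    # 1
```

Strings that contain any non-ASCII character would still need the per-codepoint lookup shown in the `display_width.rb` hunk.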
@@ -39,7 +47,7 @@ Width | Characters | Comment
 -------|------------------------------|--------------------------------------------------
 X | (user defined) | Overwrites any other values
 -1 | `"\b"` | Backspace (total width never below 0)
-0 | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29)
+0 | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) which do not change horizontal width
 1 | `"\u{00AD}"` | SOFT HYPHEN
 2 | `"\u{2E3A}"` | TWO-EM DASH
 3 | `"\u{2E3B}"` | THREE-EM DASH
@@ -89,6 +97,9 @@ You can overwrite how to handle specific code points by passing a hash (or even
 Unicode::DisplayWidth.of("a\tb", 1, "\t".ord => 10) # => tab counted as 10, so result is 12
 ```
 
+Please note that using overwrites disables some performance optimizations of this gem.
+
+
 #### Emoji Support
 
 Emoji width support is included, but it must be activated manually. It will adjust the string's size for modifier and zero-width joiner sequences. You also need to add the [unicode-emoji](https://github.com/janlelis/unicode-emoji) gem to your Gemfile:
@@ -154,7 +165,7 @@ See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related
 
 ## Copyright & Info
 
-- Copyright (c) 2011, 2015-
+- Copyright (c) 2011, 2015-2023 Jan Lelis, https://janlelis.com, released under the MIT
 license
 - Early versions based on runpaint's unicode-data interface: Copyright (c) 2009 Run Paint Run Run
 - Unicode data: https://www.unicode.org/copyright.html#Exhibit1
data/data/display_width.marshal.gz
CHANGED
Binary file
data/lib/unicode/display_width/constants.rb
CHANGED
@@ -2,8 +2,8 @@
 
 module Unicode
   class DisplayWidth
-    VERSION = "2.
-    UNICODE_VERSION = "15.
+    VERSION = "2.5.0"
+    UNICODE_VERSION = "15.1.0"
     DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/")
     INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz"
   end
data/lib/unicode/display_width/index.rb
CHANGED
@@ -10,5 +10,25 @@ module Unicode
       serialized_data.force_encoding Encoding::BINARY
       INDEX = Marshal.load(serialized_data)
     end
+
+    def self.decompress_index(index, level)
+      index.flat_map{ |value|
+        if level > 0
+          if value.instance_of?(Array)
+            value[15] ||= nil
+            decompress_index(value, level - 1)
+          else
+            decompress_index([value] * 16, level - 1)
+          end
+        else
+          if value.instance_of?(Array)
+            value[15] ||= nil
+            value
+          else
+            [value] * 16
+          end
+        end
+      }
+    end
   end
 end
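To see what the new `decompress_index` does to the nested index, the following sketch applies the same idea to a tiny hand-made structure. The method name `expand_index` and the sample data are assumptions made for illustration. Unlike the method in the diff, this variant pads with a new array instead of writing `value[15] ||= nil` into its input, and it assumes no node has more than 16 children.

```ruby
# Hypothetical re-statement of the expansion: every node stands for 16 slots.
# A scalar is repeated 16 times, a short array is padded with nil up to 16.
def expand_index(index, level)
  index.flat_map do |value|
    children =
      if value.is_a?(Array)
        value + [nil] * (16 - value.size)
      else
        [value] * 16
      end

    # Descend until the requested number of extra levels has been flattened.
    level > 0 ? expand_index(children, level - 1) : children
  end
end

# Toy input: the first entry is a partly filled subtree, the second a single
# value (:A) that covers its whole range.
flat = expand_index([[1, [0, 2]], :A], 1)

puts flat.size        # 512 (two top-level entries, 256 slots each)
p flat[16, 3]         # [0, 2, nil]
p flat[256]           # :A
```

With 16 top-level entries and `level = 1` the result has 4096 slots, which matches the `FIRST_4096` constant built in the `display_width.rb` hunk.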
data/lib/unicode/display_width.rb
CHANGED
@@ -5,26 +5,84 @@ require_relative "display_width/index"
 
 module Unicode
   class DisplayWidth
-
+    INITIAL_DEPTH = 0x10000
+    ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/
+    FIRST_4096 = decompress_index(INDEX[0][0], 1)
 
     def self.of(string, ambiguous = 1, overwrite = {}, options = {})
-
-
-
-
-
-
-
-
-
-
-
+      if overwrite.empty?
+        # Optimization for ASCII-only strings without certain control symbols
+        if string.ascii_only?
+          if string.match?(ASCII_NON_ZERO_REGEX)
+            res = string.gsub(ASCII_NON_ZERO_REGEX, "").size - string.count("\b")
+            res < 0 ? 0 : res
+          else
+            string.size
+          end
+        else
+          width_no_overwrite(string, ambiguous, options)
+        end
+      else
+        width_all_features(string, ambiguous, overwrite, options)
+      end
+    end
+
+    def self.width_no_overwrite(string, ambiguous, options = {})
+      # Sum of all chars widths
+      res = string.codepoints.sum{ |codepoint|
+        if codepoint > 15 && codepoint < 161 # very common
+          next 1
+        elsif codepoint < 0x1001
+          width = FIRST_4096[codepoint]
+        else
+          width = INDEX
+          depth = INITIAL_DEPTH
+          while (width = width[codepoint / depth]).instance_of? Array
+            codepoint %= depth
+            depth /= 16
+          end
+        end
+
+        width == :A ? ambiguous : (width || 1)
+      }
+
+      # Subtract emoji error
+      res -= emoji_extra_width_of(string, ambiguous) if options[:emoji]
+
+      # Return result + prevent negative lengths
+      res < 0 ? 0 : res
+    end
+
+    # Same as .width_no_overwrite - but with applying overwrites for each char
+    def self.width_all_features(string, ambiguous, overwrite, options)
+      # Sum of all chars widths
+      res = string.codepoints.sum{ |codepoint|
+        next overwrite[codepoint] if overwrite[codepoint]
+
+        if codepoint > 15 && codepoint < 161 # very common
+          next 1
+        elsif codepoint < 0x1001
+          width = FIRST_4096[codepoint]
+        else
+          width = INDEX
+          depth = INITIAL_DEPTH
+          while (width = width[codepoint / depth]).instance_of? Array
+            codepoint %= depth
+            depth /= 16
+          end
+        end
+
+        width == :A ? ambiguous : (width || 1)
       }
 
+      # Subtract emoji error
       res -= emoji_extra_width_of(string, ambiguous, overwrite) if options[:emoji]
+
+      # Return result + prevent negative lengths
       res < 0 ? 0 : res
     end
 
+
     def self.emoji_extra_width_of(string, ambiguous = 1, overwrite = {}, _ = {})
       require "unicode/emoji"
 
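The `while` loop in the hunk above walks a nested index: the codepoint is first divided by `0x10000` to pick a plane, then one hexadecimal digit is consumed per level until a non-array value is reached. The sketch below reproduces that walk against a small made-up index. The method name `lookup_width` and every value in `toy_index` are assumptions for illustration only and do not reflect the gem's real data.

```ruby
# Hypothetical stand-alone version of the lookup loop from the diff.
def lookup_width(index, codepoint, ambiguous = 1, initial_depth = 0x10000)
  node  = index
  depth = initial_depth

  # Descend while the current slot is still a subtree.
  while (node = node[codepoint / depth]).instance_of?(Array)
    codepoint %= depth   # keep only the digits not yet consumed
    depth /= 16          # move one hex digit to the right
  end

  # :A marks an ambiguous-width range, nil means "no entry", i.e. width 1.
  node == :A ? ambiguous : (node || 1)
end

# Toy data (not the real index):
plane0 = []
plane0[0] = [nil, nil, nil, 0]  # U+0300..U+03FF stored as width 0
plane0[4] = 2                   # U+4000..U+4FFF stored as width 2
toy_index = [plane0, :A]        # all of plane 1 marked ambiguous

puts lookup_width(toy_index, 0x4E00)      # 2
puts lookup_width(toy_index, 0x0300)      # 0
puts lookup_width(toy_index, 0x0041)      # 1
puts lookup_width(toy_index, 0x1F600, 2)  # 2
```

Because a whole range with one width is stored as a single scalar, most lookups stop after one or two steps.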
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: unicode-display_width
 version: !ruby/object:Gem::Version
-  version: 2.
+  version: 2.5.0
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2023-10-01 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
@@ -38,7 +38,7 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '13.0'
-description: "[Unicode 15.
+description: "[Unicode 15.1.0] Determines the monospace display width of a string
   using EastAsianWidth.txt, Unicode general category, and other data."
 email:
 - hi@ruby.consulting
@@ -62,7 +62,7 @@ homepage: https://github.com/janlelis/unicode-display_width
 licenses:
 - MIT
 metadata:
-  changelog_uri: https://github.com/janlelis/unicode-display_width/blob/
+  changelog_uri: https://github.com/janlelis/unicode-display_width/blob/main/CHANGELOG.md
   source_code_uri: https://github.com/janlelis/unicode-display_width
   bug_tracker_uri: https://github.com/janlelis/unicode-display_width/issues
   rubygems_mfa_required: 'true'
@@ -74,14 +74,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version:
+      version: 2.4.0
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.4.4
 signing_key:
 specification_version: 4
 summary: Determines the monospace display width of a string in Ruby.