unicode-display_width 2.1.0 → 2.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 83f868860aad325832499e0994c9b727daac6bc8f2f2059a2c3762d947372c80
- data.tar.gz: f7e6c78087266834cd0edd4e4728b0742455309cc1f83dd6562efbfcd62dac92
+ metadata.gz: e0bcd900f031999ffa43dd6ef091b07b45d425b1ab04e6559f8c8e2c54e08710
+ data.tar.gz: ec3daad5e92107072f8f590d5f2217fd1213d7b25d6491bb3b20ee103f7a2087
  SHA512:
- metadata.gz: 1c765285a14f5b45fed8add2fa4d35e144465de2e1943d0d4e8aaab54e1f225fca88858408e24ac7ad110ec9b71fb14d4ea1165282e3f9aae733b34962cacb4d
- data.tar.gz: 585579fa0a392c9005a8ef43aaba4c6f69ca2282c5474f873f5287d39604cf042eaa1448edc818de5f4bacd94a1522a3f9dfbd3b7c9b2630aa30d696fe82c2d4
+ metadata.gz: da2098a3b5d56518129b453aa88e113583403fefb37bb23eeb3f2ff3213426ea6a7076ab0a2f4c778f91a6cfbe3f98f59ea8dc6a4e37ff4bb6a1499536a5a4b9
+ data.tar.gz: 8559483daad47ca76757cf8701f09060bf66f99017a32d62b9c6f739794e1fcc15f2a4b9aae754252d7c8176571f18b590555ed3aa83748836e8e6a681bc7e10
data/CHANGELOG.md CHANGED
@@ -1,5 +1,31 @@
  # CHANGELOG
 
+ ## 2.4.2
+
+ More performance improvements:
+
+ - Optimize lookup of first 4096 codepoints
+ - Avoid overwrite lookup if no overwrites are set
+
+ ## 2.4.1
+
+ - Improve general performance!
+ - Further improve performance for ASCII strings
+
+ *You should really upgrade - it's much faster now!*
+
+ ## 2.4.0
+ - Improve performance for ASCII-only strings, by @fatkodima
+ - Require Ruby 2.4
+
+ ## 2.3.0
+
+ - Unicode 15.0
+
+ ## 2.2.0
+
+ - Add *Hangul Jamo Extended-B* block to zero-width chars, thanks @ninjalj #22
+
  ## 2.1.0
 
  - Unicode 14.0
data/MIT-LICENSE.txt CHANGED
@@ -1,6 +1,6 @@
  The MIT LICENSE
 
- Copyright (c) 2011, 2015-2021 Jan Lelis
+ Copyright (c) 2011, 2015-2023 Jan Lelis
 
  Permission is hereby granted, free of charge, to any person obtaining
  a copy of this software and associated documentation files (the
data/README.md CHANGED
@@ -1,12 +1,20 @@
  ## Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)
 
- Determines the monospace display width of a string in Ruby. Implementation based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt) and other data, 100% in Ruby. It does not rely on the OS vendor (like [wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width.
+ Determines the monospace display width of a string in Ruby. Useful for all kinds of terminal-based applications. Implementation based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt) and other data, 100% in Ruby. It does not rely on the OS vendor (like [wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width.
 
- Unicode version: **14.0.0** (September 2021)
+ Unicode version: **15.0.0** (September 2022)
 
- Supported Rubies: **3.0**, **2.7**
+ Supported Rubies: **3.1**, **3.0**, **2.7**
 
- Old Rubies which might still work: **2.6**, **2.5**, **2.4**, **2.3**, **2.2**, **2.1**, **2.0**, **1.9**
+ Old Rubies which might still work: **2.6**, **2.5**, **2.4**
+
+ For even older Rubies, use version 2.3.0 of this gem: **2.3**, **2.2**, **2.1**, **2.0**, **1.9**
+
+ ## Version 2.4.2 — Performance Updates
+
+ **If you use this gem, you should really upgrade to 2.4.2. It's often 100x faster, sometimes even 1000x and more!**
+
+ This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the character width lookup code has been optimized, so even when full-width characters are involved, the gem is much faster now.
 
  ## Version 2.0 — Breaking Changes
 
@@ -39,12 +47,12 @@ Width | Characters | Comment
  -------|------------------------------|--------------------------------------------------
  X | (user defined) | Overwrites any other values
  -1 | `"\b"` | Backspace (total width never below 0)
- 0 | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) that do not change horizontal width
+ 0 | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) which do not change horizontal width
  1 | `"\u{00AD}"` | SOFT HYPHEN
  2 | `"\u{2E3A}"` | TWO-EM DASH
  3 | `"\u{2E3B}"` | THREE-EM DASH
  0 | General Categories: Mn, Me, Cf (non-arabic) | Excludes ARABIC format characters
- 0 | `"\u{1160}".."\u{11FF}"` | HANGUL JUNGSEONG
+ 0 | `"\u{1160}".."\u{11FF}"`, `"\u{D7B0}".."\u{D7FF}"` | HANGUL JUNGSEONG
  0 | `"\u{2060}".."\u{206F}"`, `"\u{FFF0}".."\u{FFF8}"`, `"\u{E0000}".."\u{E0FFF}"` | Ignorable ranges
  2 | East Asian Width: F, W | Full-width characters
  2 | `"\u{3400}".."\u{4DBF}"`, `"\u{4E00}".."\u{9FFF}"`, `"\u{F900}".."\u{FAFF}"`, `"\u{20000}".."\u{2FFFD}"`, `"\u{30000}".."\u{3FFFD}"` | Full-width ranges
@@ -89,6 +97,9 @@ You can overwrite how to handle specific code points by passing a hash (or even
  Unicode::DisplayWidth.of("a\tb", 1, "\t".ord => 10) # => tab counted as 10, so result is 12
  ```
 
+ Please note that using overwrites disables some performance optimizations of this gem.
+
+
  #### Emoji Support
 
  Emoji width support is included, but it must be activated manually. It will adjust the string's size for modifier and zero-width joiner sequences. You also need to add the [unicode-emoji](https://github.com/janlelis/unicode-emoji) gem to your Gemfile:
@@ -148,12 +159,13 @@ Replace "一" with the actual string to measure
  - JavaScript: https://github.com/mycoboco/wcwidth.js
  - C: https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
  - C for Julia: https://github.com/JuliaLang/utf8proc/issues/2
+ - Golang: https://github.com/rivo/uniseg
 
  See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related micro libraries.
 
  ## Copyright & Info
 
- - Copyright (c) 2011, 2015-2021 Jan Lelis, https://janlelis.com, released under the MIT
+ - Copyright (c) 2011, 2015-2023 Jan Lelis, https://janlelis.com, released under the MIT
  license
  - Early versions based on runpaint's unicode-data interface: Copyright (c) 2009 Run Paint Run Run
  - Unicode data: https://www.unicode.org/copyright.html#Exhibit1
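
The README's performance note credits much of the speedup to detecting ASCII-only input. As a rough standalone illustration of that idea, the sketch below mirrors the ASCII branch that this release adds to `Unicode::DisplayWidth.of`. The names `ZERO_WIDTH_ASCII` and `ascii_width` are hypothetical and exist only in this example; the character class is copied from `ASCII_NON_ZERO_REGEX` in the diff.

```ruby
# Control characters that the ASCII fast path strips before counting
# (same character class as ASCII_NON_ZERO_REGEX in this release).
ZERO_WIDTH_ASCII = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/

# Hypothetical helper: display width of an ASCII-only string.
def ascii_width(string)
  raise ArgumentError, "expected an ASCII-only string" unless string.ascii_only?

  # Common case: none of the listed control characters, so every
  # character occupies exactly one column.
  return string.size unless string.match?(ZERO_WIDTH_ASCII)

  # Drop the listed control characters, then let each backspace take
  # one further column away. The total never goes below zero.
  width = string.gsub(ZERO_WIDTH_ASCII, "").size - string.count("\b")
  width < 0 ? 0 : width
end

puts ascii_width("hello")  # plain text: one column per character
puts ascii_width("a\nb")   # the newline contributes no width
puts ascii_width("ab\b")   # the backspace removes one column
```

Because `String#ascii_only?` and `String#size` need no per-codepoint table lookup, this branch avoids the index entirely, which is presumably where the large speedups for plain text come from.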
Binary file
@@ -2,8 +2,8 @@
 
  module Unicode
    class DisplayWidth
-     VERSION = "2.1.0"
-     UNICODE_VERSION = "14.0.0"
+     VERSION = "2.4.2"
+     UNICODE_VERSION = "15.0.0"
      DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/")
      INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz"
    end
@@ -10,5 +10,25 @@ module Unicode
        serialized_data.force_encoding Encoding::BINARY
        INDEX = Marshal.load(serialized_data)
      end
+
+     def self.decompress_index(index, level)
+       index.flat_map{ |value|
+         if level > 0
+           if value.instance_of?(Array)
+             value[15] ||= nil
+             decompress_index(value, level - 1)
+           else
+             decompress_index([value] * 16, level - 1)
+           end
+         else
+           if value.instance_of?(Array)
+             value[15] ||= nil
+             value
+           else
+             [value] * 16
+           end
+         end
+       }
+     end
    end
  end
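
To see what `decompress_index` does to a nested index, here is a small standalone sketch. It restates the method from the hunk above inside a hypothetical `IndexSketch` module, so it runs without the gem, and applies it to a hand-made `tiny` array whose values are purely illustrative and not real Unicode data.

```ruby
# Hypothetical wrapper module so the sketch runs without the gem.
module IndexSketch
  # Flattens a nested 16-ary index into one flat array.
  # A scalar stands for "all entries below share this value";
  # short arrays are padded with nil up to 16 slots.
  def self.decompress_index(index, level)
    index.flat_map { |value|
      if level > 0
        if value.instance_of?(Array)
          value[15] ||= nil # pad to 16 slots (mutates the input array)
          decompress_index(value, level - 1)
        else
          decompress_index([value] * 16, level - 1)
        end
      elsif value.instance_of?(Array)
        value[15] ||= nil
        value
      else
        [value] * 16
      end
    }
  end
end

# Two top-level entries: a nested one and a scalar (illustrative values).
tiny = [[[0, 1], :A], 2]
flat = IndexSketch.decompress_index(tiny, 1)

puts flat.size          # each top-level entry expands to 16 * 16 slots
p flat[0, 3]            # values from the innermost array, then nil padding
p flat[16]              # the scalar :A repeated over a 16-slot row
p flat[256]             # the scalar 2 repeated over a whole 256-slot block
```

With a full 16-entry input and `level` set to 1, the same expansion yields 16 × 16 × 16 = 4096 slots, which matches the name of the `FIRST_4096` constant built from `INDEX[0][0]` in this release.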
@@ -5,26 +5,84 @@ require_relative "display_width/index"
 
  module Unicode
    class DisplayWidth
-     DEPTHS = [0x10000, 0x1000, 0x100, 0x10].freeze
+     INITIAL_DEPTH = 0x10000
+     ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/
+     FIRST_4096 = decompress_index(INDEX[0][0], 1)
 
      def self.of(string, ambiguous = 1, overwrite = {}, options = {})
-       res = string.codepoints.inject(0){ |total_width, codepoint|
-         index_or_value = INDEX
-         codepoint_depth_offset = codepoint
-         DEPTHS.each{ |depth|
-           index_or_value = index_or_value[codepoint_depth_offset / depth]
-           codepoint_depth_offset = codepoint_depth_offset % depth
-           break unless index_or_value.is_a? Array
-         }
-         width = index_or_value.is_a?(Array) ? index_or_value[codepoint_depth_offset] : index_or_value
-         width = ambiguous if width == :A
-         total_width + (overwrite[codepoint] || width || 1)
+       if overwrite.empty?
+         # Optimization for ASCII-only strings without certain control symbols
+         if string.ascii_only?
+           if string.match?(ASCII_NON_ZERO_REGEX)
+             res = string.gsub(ASCII_NON_ZERO_REGEX, "").size - string.count("\b")
+             res < 0 ? 0 : res
+           else
+             string.size
+           end
+         else
+           width_no_overwrite(string, ambiguous, options)
+         end
+       else
+         width_all_features(string, ambiguous, overwrite, options)
+       end
+     end
+
+     def self.width_no_overwrite(string, ambiguous, options = {})
+       # Sum of all chars widths
+       res = string.codepoints.sum{ |codepoint|
+         if codepoint > 15 && codepoint < 161 # very common
+           next 1
+         elsif codepoint < 0x1001
+           width = FIRST_4096[codepoint]
+         else
+           width = INDEX
+           depth = INITIAL_DEPTH
+           while (width = width[codepoint / depth]).instance_of? Array
+             codepoint %= depth
+             depth /= 16
+           end
+         end
+
+         width == :A ? ambiguous : (width || 1)
+       }
+
+       # Subtract emoji error
+       res -= emoji_extra_width_of(string, ambiguous) if options[:emoji]
+
+       # Return result + prevent negative lengths
+       res < 0 ? 0 : res
+     end
+
+     # Same as .width_no_overwrite - but with applying overwrites for each char
+     def self.width_all_features(string, ambiguous, overwrite, options)
+       # Sum of all chars widths
+       res = string.codepoints.sum{ |codepoint|
+         next overwrite[codepoint] if overwrite[codepoint]
+
+         if codepoint > 15 && codepoint < 161 # very common
+           next 1
+         elsif codepoint < 0x1001
+           width = FIRST_4096[codepoint]
+         else
+           width = INDEX
+           depth = INITIAL_DEPTH
+           while (width = width[codepoint / depth]).instance_of? Array
+             codepoint %= depth
+             depth /= 16
+           end
+         end
+
+         width == :A ? ambiguous : (width || 1)
        }
 
+       # Subtract emoji error
        res -= emoji_extra_width_of(string, ambiguous, overwrite) if options[:emoji]
+
+       # Return result + prevent negative lengths
        res < 0 ? 0 : res
      end
 
+
      def self.emoji_extra_width_of(string, ambiguous = 1, overwrite = {}, _ = {})
  require "unicode/emoji"
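
For codepoints outside the fast paths, the new `while` loop in `width_no_overwrite` walks `INDEX` as a nested array, dividing the depth by 16 at each level and stopping as soon as it reaches a non-array value. The sketch below replays that loop on a hand-built toy index. `TOY_INDEX`, `TOY_INITIAL_DEPTH` and `toy_lookup` are hypothetical names, and the widths stored in the toy index are illustrative assumptions rather than real Unicode data.

```ruby
TOY_INITIAL_DEPTH = 0x10000

# Build a sparse toy index; missing slots are nil and fall back to width 1.
row_a0 = []              # covers U+00A0..U+00AF
row_a0[0x1] = :A         # U+00A1 marked as ambiguous
block_00 = []            # covers U+0000..U+00FF
block_00[0xA] = row_a0
chunk_0 = []             # covers U+0000..U+0FFF
chunk_0[0x0] = block_00
chunk_0[0x3] = 0         # U+0300..U+03FF collapsed to a single width of 0
plane_0 = []             # covers U+0000..U+FFFF
plane_0[0x0] = chunk_0
plane_0[0x3] = 2         # U+3000..U+3FFF collapsed to a single width of 2
TOY_INDEX = [plane_0, 2] # plane 1 collapsed to a single width of 2

# Hypothetical helper replaying the lookup loop from the diff.
def toy_lookup(index, codepoint, ambiguous = 1)
  width = index
  depth = TOY_INITIAL_DEPTH
  # Descend while the current slot is still an Array; a scalar or nil ends the walk.
  while (width = width[codepoint / depth]).instance_of?(Array)
    codepoint %= depth
    depth /= 16
  end
  width == :A ? ambiguous : (width || 1)
end

p toy_lookup(TOY_INDEX, 0x41)      # no entry anywhere: default width
p toy_lookup(TOY_INDEX, 0xA1, 2)   # ambiguous entry takes the caller's value
p toy_lookup(TOY_INDEX, 0x301)     # resolved two levels down
p toy_lookup(TOY_INDEX, 0x1F600)   # resolved at the top level
```

Collapsing a uniform range into a single scalar means a lookup can end after one or two array accesses, while sparse regions only pay for the levels they actually need.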
 
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: unicode-display_width
  version: !ruby/object:Gem::Version
- version: 2.1.0
+ version: 2.4.2
  platform: ruby
  authors:
  - Jan Lelis
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2021-09-15 00:00:00.000000000 Z
+ date: 2023-01-04 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rspec
@@ -38,7 +38,7 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: '13.0'
- description: "[Unicode 14.0.0] Determines the monospace display width of a string
+ description: "[Unicode 15.0.0] Determines the monospace display width of a string
  using EastAsianWidth.txt, Unicode general category, and other data."
  email:
  - hi@ruby.consulting
@@ -62,9 +62,10 @@ homepage: https://github.com/janlelis/unicode-display_width
  licenses:
  - MIT
  metadata:
- changelog_uri: https://github.com/janlelis/unicode-display_width/blob/master/CHANGELOG.md
+ changelog_uri: https://github.com/janlelis/unicode-display_width/blob/main/CHANGELOG.md
  source_code_uri: https://github.com/janlelis/unicode-display_width
  bug_tracker_uri: https://github.com/janlelis/unicode-display_width/issues
+ rubygems_mfa_required: 'true'
  post_install_message:
  rdoc_options: []
  require_paths:
@@ -73,14 +74,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 1.9.3
+ version: 2.4.0
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.2.3
+ rubygems_version: 3.4.1
  signing_key:
  specification_version: 4
  summary: Determines the monospace display width of a string in Ruby.