unicode-display_width 2.3.0 → 2.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 1fce5b196a16187b338dee6aaba7f4bf499bfc12eee213085c29296851671bc4
-   data.tar.gz: 4554e159dbc6242ab9e8e285282a6bb4361d715997df21ef81b1d9c90665b4cd
+   metadata.gz: 27acdfc12bbc4c8455095846bcd520ae1a816d6927ba8fa607d93c0925127c70
+   data.tar.gz: 5c65efc8de3821260f0080f582d7ece1085fdb762f501b5c2c0306fd95dbf6f2
  SHA512:
-   metadata.gz: 57824a99da0e7db191264802c8cd01e47319b1387643f4d2ea269a5ab86cfdd2c3d188bcdddc71d56c3cecfd1538940bc68b7d870f424e39cf1d9849cf8123d1
-   data.tar.gz: 655e5048a23f2576e076511d267cf5df7e382c7531e9f1e954c05e3aea5fbbf3975c1af6b4b7fadc4c433283a96afeb02e5621ca925af4bda605d9a6e3ffafeb
+   metadata.gz: 00026a3cf58828846a8f1429f818c900f0998c672b0ad637c402fec46cd63b4470751d6ac35168e7d092a33b966de608ebce84a4e5c2fb936610e9866c1ef7fb
+   data.tar.gz: 21f6105c2084a3b4f4f5ad0f6eb12840e0bf8adc408b073dc69f919acfa451bca8adb80e1c813e9a887728467420dcad4198faa30333cc87d63f994b2e396337
data/CHANGELOG.md CHANGED
@@ -1,5 +1,27 @@
  # CHANGELOG

+ ## 2.5.0
+
+ - Unicode 15.1
+
+ ## 2.4.2
+
+ More performance improvements:
+
+ - Optimize lookup of first 4096 codepoints
+ - Avoid overwrite lookup if no overwrites are set
+
+ ## 2.4.1
+
+ - Improve general performance!
+ - Further improve performance for ASCII strings
+
+ *You should really upgrade - it's much faster now!*
+
+ ## 2.4.0
+ - Improve performance for ASCII-only strings, by @fatkodima
+ - Require Ruby 2.4
+
  ## 2.3.0

  - Unicode 15.0
data/MIT-LICENSE.txt CHANGED
@@ -1,6 +1,6 @@
  The MIT LICENSE

- Copyright (c) 2011, 2015-2022 Jan Lelis
+ Copyright (c) 2011, 2015-2023 Jan Lelis

  Permission is hereby granted, free of charge, to any person obtaining
  a copy of this software and associated documentation files (the
data/README.md CHANGED
@@ -1,12 +1,20 @@
  ## Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)

- Determines the monospace display width of a string in Ruby. Implementation based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt) and other data, 100% in Ruby. It does not rely on the OS vendor (like [wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width.
+ Determines the monospace display width of a string in Ruby. Useful for all kinds of terminal-based applications. Implementation based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt) and other data, 100% in Ruby. It does not rely on the OS vendor (like [wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width.

- Unicode version: **15.0.0** (September 2022)
+ Unicode version: **15.1.0** (September 2023)

- Supported Rubies: **3.1**, **3.0**, **2.7**
+ Supported Rubies: **3.2**, **3.1**, **3.0**, **2.7**

- Old Rubies which might still work: **2.6**, **2.5**, **2.4**, **2.3**, **2.2**, **2.1**, **2.0**, **1.9**
+ Old Rubies which might still work: **2.6**, **2.5**, **2.4**
+
+ For even older Rubies, use version 2.3.0 of this gem: **2.3**, **2.2**, **2.1**, **2.0**, **1.9**
+
+ ## Version 2.4.2 — Performance Updates
+
+ **If you use this gem, you should really upgrade to 2.4.2 or newer. It's often 100x faster, sometimes even 1000x and more!**
+
+ This is possible because the gem now detects if you use very basic (and common) characters, like ASCII characters. Furthermore, the charachter width lookup code has been optimized, so even when full-width characters are involved, the gem is much faster now.

  ## Version 2.0 — Breaking Changes

@@ -39,7 +47,7 @@ Width | Characters | Comment
  -------|------------------------------|--------------------------------------------------
  X | (user defined) | Overwrites any other values
  -1 | `"\b"` | Backspace (total width never below 0)
- 0 | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) that do not change horizontal width
+ 0 | `"\0"`, `"\x05"`, `"\a"`, `"\n"`, `"\v"`, `"\f"`, `"\r"`, `"\x0E"`, `"\x0F"` | [C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes#C0_.28ASCII_and_derivatives.29) which do not change horizontal width
  1 | `"\u{00AD}"` | SOFT HYPHEN
  2 | `"\u{2E3A}"` | TWO-EM DASH
  3 | `"\u{2E3B}"` | THREE-EM DASH
@@ -89,6 +97,9 @@ You can overwrite how to handle specific code points by passing a hash (or even
  Unicode::DisplayWidth.of("a\tb", 1, "\t".ord => 10)) # => tab counted as 10, so result is 12
  ```

+ Please note that using overwrites disables some perfomance optimizations of this gem.
+
+
  #### Emoji Support

  Emoji width support is included, but in must be activated manually. It will adjust the string's size for modifier and zero-width joiner sequences. You also need to add the [unicode-emoji](https://github.com/janlelis/unicode-emoji) gem to your Gemfile:
@@ -154,7 +165,7 @@ See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related

  ## Copyright & Info

- - Copyright (c) 2011, 2015-2022 Jan Lelis, https://janlelis.com, released under the MIT
+ - Copyright (c) 2011, 2015-2023 Jan Lelis, https://janlelis.com, released under the MIT
  license
  - Early versions based on runpaint's unicode-data interface: Copyright (c) 2009 Run Paint Run Run
  - Unicode data: https://www.unicode.org/copyright.html#Exhibit1
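
The README's new performance note says the gem detects ASCII-only input. A minimal sketch of that shortcut, modeled on the ASCII branch of `self.of` in the code hunk later in this diff, might look as follows. The names `ascii_fast_width` and `ZERO_WIDTH_ASCII` are hypothetical and not part of the gem's API; returning `nil` for non-ASCII input is an assumption made here so the sketch stays self-contained.

```ruby
# Hypothetical stand-alone sketch; not the gem's API.
# ASCII control characters that occupy no column. Backspace is in the
# set too, and is additionally subtracted below.
ZERO_WIDTH_ASCII = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/

def ascii_fast_width(string)
  # The shortcut only applies to ASCII-only strings (assumption: signal
  # "not applicable" with nil instead of falling back to a table lookup).
  return nil unless string.ascii_only?

  # No special control characters: every character is one column wide.
  return string.size unless string.match?(ZERO_WIDTH_ASCII)

  # Drop the zero-width controls, then let each backspace take back one column.
  width = string.gsub(ZERO_WIDTH_ASCII, "").size - string.count("\b")
  width < 0 ? 0 : width
end

puts ascii_fast_width("hello")     # 5
puts ascii_fast_width("ab\bc")     # 2
p ascii_fast_width("caf\u00E9")    # nil
```

The point of the shortcut is that `String#ascii_only?` and `String#size` avoid any per-codepoint table lookup for the most common kind of input.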
Binary file
@@ -2,8 +2,8 @@

  module Unicode
    class DisplayWidth
-     VERSION = "2.3.0"
-     UNICODE_VERSION = "15.0.0"
+     VERSION = "2.5.0"
+     UNICODE_VERSION = "15.1.0"
      DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/")
      INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz"
    end
@@ -10,5 +10,25 @@ module Unicode
        serialized_data.force_encoding Encoding::BINARY
        INDEX = Marshal.load(serialized_data)
      end
+
+     def self.decompress_index(index, level)
+       index.flat_map{ |value|
+         if level > 0
+           if value.instance_of?(Array)
+             value[15] ||= nil
+             decompress_index(value, level - 1)
+           else
+             decompress_index([value] * 16, level - 1)
+           end
+         else
+           if value.instance_of?(Array)
+             value[15] ||= nil
+             value
+           else
+             [value] * 16
+           end
+         end
+       }
+     end
    end
  end
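
To see what `decompress_index` does, the sketch below repeats the method body from the hunk above as a top-level method and runs it on a small nested array. The toy data is hypothetical; the real `INDEX` is loaded from the marshalled data file and is not reproduced here.

```ruby
# Method body taken from the hunk above, defined at top level so the
# sketch runs without the gem.
def decompress_index(index, level)
  index.flat_map{ |value|
    if level > 0
      if value.instance_of?(Array)
        value[15] ||= nil # pad a short array to 16 slots
        decompress_index(value, level - 1)
      else
        # A scalar stands for 16 identical sub-blocks
        decompress_index([value] * 16, level - 1)
      end
    else
      if value.instance_of?(Array)
        value[15] ||= nil
        value
      else
        [value] * 16
      end
    end
  }
end

# Hypothetical toy data: three top-level slots, each standing for 256 codepoints.
toy = [1, [2, [0, 1]], nil]
flat = decompress_index(toy, 1)

puts flat.size   # 768
p flat[0]        # 1
p flat[256, 3]   # [2, 2, 2]
p flat[272, 3]   # [0, 1, nil]
```

With `level = 1`, every top-level slot expands to 16 × 16 = 256 entries, so a 16-slot input yields a flat 4096-entry array that can be indexed directly by codepoint. Note that `value[15] ||= nil` pads nested arrays in place, so the input is mutated.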
@@ -5,26 +5,84 @@ require_relative "display_width/index"

  module Unicode
    class DisplayWidth
-     DEPTHS = [0x10000, 0x1000, 0x100, 0x10].freeze
+     INITIAL_DEPTH = 0x10000
+     ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/
+     FIRST_4096 = decompress_index(INDEX[0][0], 1)

      def self.of(string, ambiguous = 1, overwrite = {}, options = {})
-       res = string.codepoints.inject(0){ |total_width, codepoint|
-         index_or_value = INDEX
-         codepoint_depth_offset = codepoint
-         DEPTHS.each{ |depth|
-           index_or_value = index_or_value[codepoint_depth_offset / depth]
-           codepoint_depth_offset = codepoint_depth_offset % depth
-           break unless index_or_value.is_a? Array
-         }
-         width = index_or_value.is_a?(Array) ? index_or_value[codepoint_depth_offset] : index_or_value
-         width = ambiguous if width == :A
-         total_width + (overwrite[codepoint] || width || 1)
+       if overwrite.empty?
+         # Optimization for ASCII-only strings without certain control symbols
+         if string.ascii_only?
+           if string.match?(ASCII_NON_ZERO_REGEX)
+             res = string.gsub(ASCII_NON_ZERO_REGEX, "").size - string.count("\b")
+             res < 0 ? 0 : res
+           else
+             string.size
+           end
+         else
+           width_no_overwrite(string, ambiguous, options)
+         end
+       else
+         width_all_features(string, ambiguous, overwrite, options)
+       end
+     end
+
+     def self.width_no_overwrite(string, ambiguous, options = {})
+       # Sum of all chars widths
+       res = string.codepoints.sum{ |codepoint|
+         if codepoint > 15 && codepoint < 161 # very common
+           next 1
+         elsif codepoint < 0x1001
+           width = FIRST_4096[codepoint]
+         else
+           width = INDEX
+           depth = INITIAL_DEPTH
+           while (width = width[codepoint / depth]).instance_of? Array
+             codepoint %= depth
+             depth /= 16
+           end
+         end
+
+         width == :A ? ambiguous : (width || 1)
+       }
+
+       # Substract emoji error
+       res -= emoji_extra_width_of(string, ambiguous) if options[:emoji]
+
+       # Return result + prevent negative lengths
+       res < 0 ? 0 : res
+     end
+
+     # Same as .width_no_overwrite - but with applying overwrites for each char
+     def self.width_all_features(string, ambiguous, overwrite, options)
+       # Sum of all chars widths
+       res = string.codepoints.sum{ |codepoint|
+         next overwrite[codepoint] if overwrite[codepoint]
+
+         if codepoint > 15 && codepoint < 161 # very common
+           next 1
+         elsif codepoint < 0x1001
+           width = FIRST_4096[codepoint]
+         else
+           width = INDEX
+           depth = INITIAL_DEPTH
+           while (width = width[codepoint / depth]).instance_of? Array
+             codepoint %= depth
+             depth /= 16
+           end
+         end
+
+         width == :A ? ambiguous : (width || 1)
        }

+       # Substract emoji error
        res -= emoji_extra_width_of(string, ambiguous, overwrite) if options[:emoji]
+
+       # Return result + prevent negative lengths
        res < 0 ? 0 : res
      end

+
      def self.emoji_extra_width_of(string, ambiguous = 1, overwrite = {}, _ = {})
        require "unicode/emoji"
 
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: unicode-display_width
  version: !ruby/object:Gem::Version
-   version: 2.3.0
+   version: 2.5.0
  platform: ruby
  authors:
  - Jan Lelis
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-09-14 00:00:00.000000000 Z
+ date: 2023-10-01 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rspec
@@ -38,7 +38,7 @@ dependencies:
      - - "~>"
        - !ruby/object:Gem::Version
          version: '13.0'
- description: "[Unicode 15.0.0] Determines the monospace display width of a string
+ description: "[Unicode 15.1.0] Determines the monospace display width of a string
    using EastAsianWidth.txt, Unicode general category, and other data."
  email:
  - hi@ruby.consulting
@@ -62,7 +62,7 @@ homepage: https://github.com/janlelis/unicode-display_width
  licenses:
  - MIT
  metadata:
-   changelog_uri: https://github.com/janlelis/unicode-display_width/blob/master/CHANGELOG.md
+   changelog_uri: https://github.com/janlelis/unicode-display_width/blob/main/CHANGELOG.md
    source_code_uri: https://github.com/janlelis/unicode-display_width
    bug_tracker_uri: https://github.com/janlelis/unicode-display_width/issues
    rubygems_mfa_required: 'true'
@@ -74,14 +74,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
-       version: 1.9.3
+       version: 2.4.0
  required_rubygems_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.3.7
+ rubygems_version: 3.4.4
  signing_key:
  specification_version: 4
  summary: Determines the monospace display width of a string in Ruby.