unicode-display_width 3.1.2 → 3.1.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: a85ca57ca5e291c17993e526d222dda44b884286484b3831bb8173ce92aafb1a
- data.tar.gz: d1036dfc6464459de04a713e273d09dea767a3b9a9629d9e491052c2ffe97c23
+ metadata.gz: 9dadf5b8337ac74b8e5af2a6cd481708c39050506440ccf5d30c6cdc9eb5ade3
+ data.tar.gz: ae3bb12a0fabe7a53a1b533909f42a86f1fc7acecc33c9b57218ed3607f71d3c
  SHA512:
- metadata.gz: d669e8a2866b56a78bafb3fff6d2d6430fab6bb1ca2633aeaac68e0634ca14374ac0b325bc7159ef90afe0bdffd9c154700cae1fc3183b1d74281ff4b5024e1b
- data.tar.gz: 5f319484d27dad70b3851398e11cd3cb93b5c4f41a6c3a76c958d505d8357f9e303b661fd7a0339262d1458b82cb8619e6682ee2dbf8c583d33fbde4fd1a8680
+ metadata.gz: 414227480c3ae2ca0afcee225bb68b6506ac2f7dac630422db87c0a6d28d9d921bcc4645146ffa0a75ace5eb338dc71df2f051aeb0732824dbf1c7bb3117ba22
+ data.tar.gz: 1002b3752d47df6f3d416d378148af38ff7c92698a118aded36f55e3b6bbdc99a29552fdee792636d5edea1ce2104266b1dcf9198631a7669334a48fa425af29
data/CHANGELOG.md CHANGED
@@ -1,5 +1,18 @@
  # CHANGELOG

+ ## 3.1.4
+
+ - Fix that skin tone modifiers were ignored when used in a non-ZWJ sequence
+ context (= single emoji char + modifier) #29
+ - Add more docs and specs about modifier handling
+
+ ## 3.1.3
+
+ Better handling of non-UTF-8 strings, patch by @Earlopain:
+
+ - Data with *BINARY* encoding is interpreted as UTF-8, if possible
+ - Use `invalid: :replace` and `undef: :replace` options when converting to UTF-8
+
  ## 3.1.2

  - Performance improvements
@@ -28,6 +41,7 @@

  ## 3.0.1

+
  - Add WezTerm and foot as good Emoji terminals

  ## 3.0.0
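
The 3.1.4 entry concerns a single Emoji character followed directly by a skin tone modifier, with no zero-width joiner in between. The following is a minimal sketch in plain Ruby (the gem is not loaded) of why a ZWJ-only pattern misses such a pair. The constant names `ZWJ_SEQUENCE`, `MODIFIER_PAIR`, and `COMBINED` are illustrative only; the two patterns are copied from the old and new `REGEX_EMOJI_ALL_SEQUENCES` lines in this diff, with the keycap alternative left out.

```ruby
# Illustrative names, not part of the gem. Patterns are taken from the
# REGEX_EMOJI_ALL_SEQUENCES change in this diff (keycap part omitted).
ZWJ_SEQUENCE  = /.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/
MODIFIER_PAIR = /.[\u{1F3FB}-\u{1F3FF}]/
COMBINED      = Regexp.union(ZWJ_SEQUENCE, MODIFIER_PAIR)

waving_hand = "\u{1F44B}\u{1F3FD}"          # waving hand + medium skin tone, no ZWJ
farmer      = "\u{1F9D1}\u{200D}\u{1F33E}"  # person + ZWJ + sheaf of rice

# The ZWJ-only pattern requires at least one U+200D, so the pair is missed.
puts ZWJ_SEQUENCE.match?(waving_hand)            # false
# With the added alternative, the whole pair is matched as one unit.
puts COMBINED.match(waving_hand)[0] == waving_hand  # true
# ZWJ sequences are still matched as before.
puts COMBINED.match(farmer)[0] == farmer            # true
```

Because the added alternative starts with `.`, it also matches a modifier after a non-Emoji character such as `"A\u{1F3FD}"`, which is consistent with the README remark that terminals disagree on which base characters accept a modifier.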
data/README.md CHANGED
@@ -71,6 +71,11 @@ Unicode::DisplayWidth.of("·", 1) # => 1
  Unicode::DisplayWidth.of("·", 2) # => 2
  ```

+ ### Encoding Notes
+
+ - Data with *BINARY* encoding is interpreted as UTF-8, if possible
+ - Non-UTF-8 strings are converted to UTF-8 before measuring, using the [`{invalid: :replace, undef: :replace}` options](https://ruby-doc.org/3.3.5/encodings_rdoc.html#label-Encoding+Options)
+
  ### Custom Overwrites

  You can overwrite how to handle specific code points by passing a hash (or even a proc) as `overwrite:` parameter:
@@ -96,12 +101,16 @@ There are many Emoji which get constructed by combining other Emoji in a sequenc

  Another aspect where terminals disagree is whether Emoji characters which have a text presentation by default (width 1) should be turned into full-width (width 2) when combined with Variation Selector 16 (*U+FE0F*).

+ Finally, it varies if Skin Tone Modifiers can be applied to all characters or just to those with the "Emoji Base" property.
+
  Emoji Type | Width / Comment
  ------------|----------------
- Basic/Single Emoji character without Variation Selector | No special handling
- Basic/Single Emoji character with VS15 (Text) | No special handling
- Basic/Single Emoji character with VS16 (Emoji) | 2 or East Asian Width (see table below)
- Emoji Sequence | 2 if Emoji belongs to configured Emoji set (see table below)
+ Basic/Single Emoji character without Variation Selector | No special handling
+ Basic/Single Emoji character with VS15 (Text) | No special handling
+ Basic/Single Emoji character with VS16 (Emoji) | 2 or East Asian Width (see table below)
+ Single Emoji character with Skin Tone Modifier | 2
+ Skin Tone Modifier used in isolation or with invalid base | 2 if Emoji mode is configured to `:rgi` / `:rgi_at`
+ Emoji Sequence | 2 if Emoji belongs to configured Emoji set (see table below)

  #### Emoji Modes

@@ -126,7 +135,7 @@ The `emoji:` option can be used to configure which type of Emoji should be consi

  Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget).

- Please note that Emoji display and number of terminal columns used might differ a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminals' Unicode/Emoji capabilities.
+ Please note that Emoji display and number of terminal columns used might differ a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminals' Unicode/Emoji capabilities. You can visually check how your terminal renders different kinds of Emoji with the [terminal-emoji-width.rb script](https://github.com/janlelis/unicode-display_width/blob/main/misc/terminal-emoji-width.rb).

  **To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwhile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought…

@@ -2,7 +2,7 @@

  module Unicode
  class DisplayWidth
- VERSION = "3.1.2"
+ VERSION = "3.1.4"
  UNICODE_VERSION = "16.0.0"
  DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/")
  INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz"
@@ -42,12 +42,21 @@ module Unicode
  ),
  Unicode::Emoji::REGEX_EMOJI_KEYCAP
  )
- REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
+
+ # ebase = Unicode::Emoji::REGEX_PROP_MODIFIER_BASE.source
+ REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+|.[\u{1F3FB}-\u{1F3FF}]/, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
  REGEX_EMOJI_ALL_SEQUENCES_AND_VS16 = Regexp.union(REGEX_EMOJI_ALL_SEQUENCES, REGEX_EMOJI_VS16)

  # Returns monospace display width of string
  def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
- string = string.encode(Encoding::UTF_8) unless string.encoding == Encoding::UTF_8
+ # Binary strings don't make much sense when calculating display width.
+ # Assume it's valid UTF-8
+ if string.encoding == Encoding::BINARY && !string.force_encoding(Encoding::UTF_8).valid_encoding?
+ # Didn't work out, go back to binary
+ string.force_encoding(Encoding::BINARY)
+ end
+
+ string = string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace) unless string.encoding == Encoding::UTF_8
  options = normalize_options(string, ambiguous, overwrite, old_options, **options)

  width = 0
@@ -236,4 +245,3 @@ module Unicode
  end
  end
  end
-
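
The encoding change in `self.of` can be tried in isolation. The sketch below repeats the same two steps (reinterpret BINARY as UTF-8 when valid, then transcode with replacement) using only Ruby's standard `String` API. The helper name `to_measurable_utf8` is hypothetical and not part of the gem, and unlike the code in the diff it works on a copy of the argument.

```ruby
# Hypothetical helper, not part of unicode-display_width. It mirrors the
# normalization steps added in this diff, but operates on a duplicate so
# the caller's string is left untouched (an assumption of this sketch).
def to_measurable_utf8(input)
  string = input.dup

  # Step 1: BINARY data is tentatively relabeled as UTF-8. If the bytes
  # are not valid UTF-8, the label is switched back to BINARY.
  if string.encoding == Encoding::BINARY &&
     !string.force_encoding(Encoding::UTF_8).valid_encoding?
    string.force_encoding(Encoding::BINARY)
  end

  return string if string.encoding == Encoding::UTF_8

  # Step 2: everything else is transcoded; bytes that cannot be converted
  # become the replacement character instead of raising an error.
  string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

valid_binary   = "\u00E9".b                          # UTF-8 bytes labeled BINARY
invalid_binary = "\xFF".b                            # not valid UTF-8
latin1         = "\u00E9".encode(Encoding::ISO_8859_1)

p to_measurable_utf8(valid_binary)    # "é", relabeled without transcoding
p to_measurable_utf8(invalid_binary)  # "\uFFFD", the replacement character
p to_measurable_utf8(latin1)          # "é", transcoded from ISO-8859-1
```

Before this change, the plain `string.encode(Encoding::UTF_8)` call would raise `Encoding::UndefinedConversionError` for BINARY input containing bytes above 0x7F.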
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: unicode-display_width
  version: !ruby/object:Gem::Version
- version: 3.1.2
+ version: 3.1.4
  platform: ruby
  authors:
  - Jan Lelis
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2024-11-20 00:00:00.000000000 Z
+ date: 2025-01-13 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: unicode-emoji