RubyGems - unicode-display_width - Versions diffs - 3.1.2 → 3.1.4 - Mend

unicode-display_width 3.1.2 → 3.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

checksums.yaml +4 -4
data/CHANGELOG.md +14 -0
data/README.md +14 -5
data/lib/unicode/display_width/constants.rb +1 -1
data/lib/unicode/display_width.rb +11 -3
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a85ca57ca5e291c17993e526d222dda44b884286484b3831bb8173ce92aafb1a
-  data.tar.gz: d1036dfc6464459de04a713e273d09dea767a3b9a9629d9e491052c2ffe97c23
+  metadata.gz: 9dadf5b8337ac74b8e5af2a6cd481708c39050506440ccf5d30c6cdc9eb5ade3
+  data.tar.gz: ae3bb12a0fabe7a53a1b533909f42a86f1fc7acecc33c9b57218ed3607f71d3c
 SHA512:
-  metadata.gz: d669e8a2866b56a78bafb3fff6d2d6430fab6bb1ca2633aeaac68e0634ca14374ac0b325bc7159ef90afe0bdffd9c154700cae1fc3183b1d74281ff4b5024e1b
-  data.tar.gz: 5f319484d27dad70b3851398e11cd3cb93b5c4f41a6c3a76c958d505d8357f9e303b661fd7a0339262d1458b82cb8619e6682ee2dbf8c583d33fbde4fd1a8680
+  metadata.gz: 414227480c3ae2ca0afcee225bb68b6506ac2f7dac630422db87c0a6d28d9d921bcc4645146ffa0a75ace5eb338dc71df2f051aeb0732824dbf1c7bb3117ba22
+  data.tar.gz: 1002b3752d47df6f3d416d378148af38ff7c92698a118aded36f55e3b6bbdc99a29552fdee792636d5edea1ce2104266b1dcf9198631a7669334a48fa425af29

data/CHANGELOG.md CHANGED

@@ -1,5 +1,18 @@
 # CHANGELOG
+## 3.1.4
+- Fix that skin tone modifiers were ignored when used in a non-ZWJ sequence
+  context (= single emoji char + modifier) #29
+- Add more docs and specs about modifier handling
+## 3.1.3
+Better handling of non-UTF-8 strings, patch by @Earlopain:
+- Data with *BINARY* encoding is interpreted as UTF-8, if possible
+- Use `invalid: :replace` and `undef: :replace` options when converting to UTF-8
 ## 3.1.2
 - Performance improvements
@@ -28,6 +41,7 @@
 ## 3.0.1
 - Add WezTerm and foot as good Emoji terminals
 ## 3.0.0

data/README.md CHANGED

@@ -71,6 +71,11 @@ Unicode::DisplayWidth.of("·", 1) # => 1
 Unicode::DisplayWidth.of("·", 2) # => 2
 ```
+### Encoding Notes
+- Data with *BINARY* encoding is interpreted as UTF-8, if possible
+- Non-UTF-8 strings are converted to UTF-8 before measuring, using the [`{invalid: :replace, undef: :replace}` options](https://ruby-doc.org/3.3.5/encodings_rdoc.html#label-Encoding+Options)
 ### Custom Overwrites
 You can overwrite how to handle specific code points by passing a hash (or even a proc) as `overwrite:` parameter:
@@ -96,12 +101,16 @@ There are many Emoji which get constructed by combining other Emoji in a sequenc
 Another aspect where terminals disagree is whether Emoji characters which have a text presentation by default (width 1) should be turned into full-width (width 2) when combined with Variation Selector 16 (*U+FE0F*).
+Finally, it varies if Skin Tone Modifiers can be applied to all characters or just to those with the "Emoji Base" property.
 Emoji Type  | Width / Comment
 ------------|----------------
-Basic/Single Emoji character without Variation Selector | No special handling
-Basic/Single Emoji character with VS15 (Text)           | No special handling
-Basic/Single Emoji character with VS16 (Emoji)          | 2 or East Asian Width (see table below)
-Emoji Sequence                                          | 2 if Emoji belongs to configured Emoji set (see table below)
+Basic/Single Emoji character without Variation Selector   | No special handling
+Basic/Single Emoji character with VS15 (Text)             | No special handling
+Basic/Single Emoji character with VS16 (Emoji)            | 2 or East Asian Width (see table below)
+Single Emoji character with Skin Tone Modifier            | 2
+Skin Tone Modifier used in isolation or with invalid base | 2 if Emoji mode is configured to `:rgi` / `:rgi_at`
+Emoji Sequence                                            | 2 if Emoji belongs to configured Emoji set (see table below)
 #### Emoji Modes
@@ -126,7 +135,7 @@ The `emoji:` option can be used to configure which type of Emoji should be consi
 Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget).
-Please note that Emoji display and number of terminal columns used might differ a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminals' Unicode/Emoji capabilities.
+Please note that Emoji display and number of terminal columns used might differ a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminals' Unicode/Emoji capabilities. You can visually check how your terminal renders different kinds of Emoji with the [terminal-emoji-width.rb script](https://github.com/janlelis/unicode-display_width/blob/main/misc/terminal-emoji-width.rb).
 **To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwhile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought…
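
The README change above adds a table row for a single Emoji character followed by a skin tone modifier. As a rough sketch of why 3.1.2 could miss that case, the snippet below copies the old and new sequence patterns from the `data/lib/unicode/display_width.rb` change in this diff. It leaves out the `Unicode::Emoji::REGEX_EMOJI_KEYCAP` alternative so that it runs without the unicode-emoji gem. The constant names `OLD_SEQUENCES`, `NEW_SEQUENCES`, `WAVING_HAND`, `TECHNOLOGIST` and `LETTER_WITH_MODIFIER` are made up for this illustration and are not part of the gem.

```ruby
# Simplified copies of the 3.1.2 and 3.1.4 patterns (keycap alternative omitted).
OLD_SEQUENCES = /.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/
NEW_SEQUENCES = /.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+|.[\u{1F3FB}-\u{1F3FF}]/

# Waving hand + medium skin tone modifier: no ZWJ (U+200D) involved.
WAVING_HAND = "\u{1F44B}\u{1F3FD}"
# Person + ZWJ + laptop: a ZWJ sequence.
TECHNOLOGIST = "\u{1F9D1}\u{200D}\u{1F4BB}"
# A plain letter followed by a modifier.
LETTER_WITH_MODIFIER = "a\u{1F3FD}"

# The old pattern requires at least one ZWJ group, so base + modifier is not matched.
puts OLD_SEQUENCES.match?(WAVING_HAND)           # false
# The added alternative matches any single character followed by a modifier.
puts NEW_SEQUENCES.match?(WAVING_HAND)           # true
# ZWJ sequences are matched by both versions.
puts OLD_SEQUENCES.match?(TECHNOLOGIST)          # true
puts NEW_SEQUENCES.match?(TECHNOLOGIST)          # true
# The leading "." is not restricted to Emoji bases, so this matches as well.
puts NEW_SEQUENCES.match?(LETTER_WITH_MODIFIER)  # true
```

Because the base is matched with `.`, the pattern alone does not check the "Emoji Base" property. How the gem then assigns a width to such matches depends on the configured Emoji mode, as described in the table above.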

data/lib/unicode/display_width/constants.rb CHANGED

@@ -2,7 +2,7 @@
 module Unicode
   class DisplayWidth
-    VERSION = "3.1.2"
+    VERSION = "3.1.4"
     UNICODE_VERSION = "16.0.0"
     DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/")
     INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz"

data/lib/unicode/display_width.rb CHANGED

@@ -42,12 +42,21 @@ module Unicode
       ),
       Unicode::Emoji::REGEX_EMOJI_KEYCAP
     )
-    REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
+    # ebase = Unicode::Emoji::REGEX_PROP_MODIFIER_BASE.source
+    REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+|.[\u{1F3FB}-\u{1F3FF}]/, Unicode::Emoji::REGEX_EMOJI_KEYCAP)
     REGEX_EMOJI_ALL_SEQUENCES_AND_VS16 = Regexp.union(REGEX_EMOJI_ALL_SEQUENCES, REGEX_EMOJI_VS16)
     # Returns monospace display width of string
     def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
-      string = string.encode(Encoding::UTF_8) unless string.encoding == Encoding::UTF_8
+      # Binary strings don't make much sense when calculating display width.
+      # Assume it's valid UTF-8
+      if string.encoding == Encoding::BINARY && !string.force_encoding(Encoding::UTF_8).valid_encoding?
+        # Didn't work out, go back to binary
+        string.force_encoding(Encoding::BINARY)
+      end
+      string = string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace) unless string.encoding == Encoding::UTF_8
       options = normalize_options(string, ambiguous, overwrite, old_options, **options)
       width = 0
@@ -236,4 +245,3 @@ module Unicode
     end
   end
 end
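
The encoding change in `self.of` above can be tried in isolation. The following is a minimal sketch of the same idea using only Ruby core methods, not the gem's own code: `normalize_to_utf8` is a hypothetical helper name, and unlike the hunk above it works on a copy, so the caller's string keeps its original encoding.

```ruby
# Hypothetical helper for illustration only; not part of unicode-display_width.
def normalize_to_utf8(string)
  candidate = string.dup # work on a copy, force_encoding changes the receiver

  if candidate.encoding == Encoding::BINARY
    # Optimistically treat binary data as UTF-8 ...
    candidate.force_encoding(Encoding::UTF_8)
    # ... and fall back to binary if the bytes are not valid UTF-8.
    candidate.force_encoding(Encoding::BINARY) unless candidate.valid_encoding?
  end

  return candidate if candidate.encoding == Encoding::UTF_8

  # Anything still not UTF-8 is transcoded; bytes that cannot be converted
  # are replaced (U+FFFD when the target is a Unicode encoding).
  candidate.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

valid_binary   = "\xC3\xA4".b                              # UTF-8 bytes of "ä", tagged as binary
invalid_binary = "a\xFF".b                                 # 0xFF can never appear in UTF-8
latin1         = "\xE4".dup.force_encoding("ISO-8859-1")   # "ä" in Latin-1

p normalize_to_utf8(valid_binary)   # "ä"
p normalize_to_utf8(invalid_binary) # "a\uFFFD"
p normalize_to_utf8(latin1)         # "ä"
```

With a plain `encode(Encoding::UTF_8)` and no replace options, the second call would raise `Encoding::UndefinedConversionError`, which appears to be the failure mode the 3.1.3 changelog entry addresses.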

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: unicode-display_width
 version: !ruby/object:Gem::Version
-  version: 3.1.2
+  version: 3.1.4
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-11-20 00:00:00.000000000 Z
+date: 2025-01-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: unicode-emoji