RubyGems - unicode-name - Versions diffs - 1.13.0 → 1.13.2 - Mend

unicode-name 1.13.0 → 1.13.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

checksums.yaml +4 -4
data/.rake_tasks +3 -0
data/CHANGELOG.md +29 -17
data/Gemfile.lock +11 -11
data/README.md +9 -3
data/data/name.marshal.gz +0 -0
data/lib/unicode/name/constants.rb +1 -1
data/lib/unicode/name.rb +24 -6
data/spec/unicode_name_spec.rb +14 -2
metadata +4 -3

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a4698018ace1d5ea494dae78e89b9b034ef75bbb78441c960a3e9afed0a7632b
-  data.tar.gz: 26bec8d0de6272266d958e400b45efc8b979a6e0c5c9792c08824060b239d72d
+  metadata.gz: ca6d8f90ce7c5fa9c9da362be1d90b10260da01d8ac97e2412e0699fa69ca40a
+  data.tar.gz: a3a2a417c76906c32fe429ce51e16c543696687bb0078340aecb293a65595800
 SHA512:
-  metadata.gz: c016a26447bb75579756602a8bf3ea68a29d8fc46074be79c4583d559779f3251bfbdd6838def8913274c2bbf1f69d225acf6ed6dec258513bb2c091535e7d0c
-  data.tar.gz: f67d5e5edf3554ccc69d52709f1551f8ac066325234c428b604f9e050700883a2cb503d812123cd3b88774afca488b63167c331274bf34ab8e15bb00b43e143c
+  metadata.gz: 9ad0910912fcf5e226e00955c72cd3325796acc49137ce2e9c141fdcaa5518585fb38795de6587cd792d22b428110d08592a212220dc09a91f3e92016140a86a
+  data.tar.gz: 5b8de2a4c57c893d6e18ef4ce5b876a032f0cc0f3726504753a0dfbf9b7b4e4bf18d2b6b7aadf1a976231079d285872a6203e352d66fe26154883c31237d9aca
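A `.gem` file is a tar archive whose members include `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`; the `checksums.yaml` shown above is that last member, decompressed, so every change to the packaged files changes these digests. The following is a minimal sketch of checking such a file, assuming the three members have already been extracted into one directory (for example with `tar -xf unicode-name-1.13.2.gem`) and `checksums.yaml.gz` has been gunzipped. The method name `verify_gem_checksums` is hypothetical and not part of RubyGems.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: compares every digest listed in a checksums.yaml
# document against the files of the same name inside `dir`.
# Returns a Hash like { "SHA256 metadata.gz" => true, ... }.
def verify_gem_checksums(checksums_yaml, dir)
  sums = YAML.safe_load(checksums_yaml)
  algorithms = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }
  results = {}
  algorithms.each do |algo, digest_class|
    (sums[algo] || {}).each do |file, expected|
      actual = digest_class.file(File.join(dir, file)).hexdigest
      results["#{algo} #{file}"] = (actual == expected)
    end
  end
  results
end
```

Any `false` entry means the extracted archive does not match the digest that was published with it.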

data/.rake_tasks ADDED

@@ -0,0 +1,3 @@
+gem
+irb
+spec

data/CHANGELOG.md CHANGED

@@ -1,5 +1,18 @@
 ## CHANGELOG
+### 1.13.2
+- Optimize index size by removing ranges that have codepoints embedded
+- Optimize index size by substituting common words
+- Fix missing Tangut ideographs
+### 1.13.1
+Bugfix release:
+- Fix medial vowels not generated correctly for Hangul syllables #1
+- Unicode::Name.readable now also applies correction if one exists
 ### 1.13.0
 - Unicode 16.0
@@ -22,59 +35,58 @@
 ### 1.8.0
-* Unicode 12.1
+- Unicode 12.1
 ### 1.7.1
-* Push unicode-types dependency to 1.4.0
+- Push unicode-types dependency to 1.4.0
 ### 1.7.0
-* Unicode 12
+- Unicode 12
 ### 1.6.0
-* Unicode 11
-* Do not depend on rubygems (only use zlib stdlib for unzipping)
+- Unicode 11
+- Do not depend on rubygems (only use zlib stdlib for unzipping)
 ### 1.5.2
-* Explicitly load rubygems/util, fixes regression in 1.5.1
+- Explicitly load rubygems/util, fixes regression in 1.5.1
 ### 1.5.1
-* Use `Gem::Util` for `gunzip`, removes deprecation warning
+- Use `Gem::Util` for `gunzip`, removes deprecation warning
 ### 1.5.0
-* Unicode 10
+- Unicode 10
 ### 1.4.2
-* Fix that Unicode::Name.correct would not fail if codepoint has aliases but no correction
+- Fix that Unicode::Name.correct would not fail if codepoint has aliases but no correction
 ### 1.4.1
-* Be compatible with Ruby 2.4's surrogate literals
-* Bump unicode-types dependency
+- Be compatible with Ruby 2.4's surrogate literals
+- Bump unicode-types dependency
 ### 1.4.0
-* Support Hangul Syllables
+- Support Hangul syllables
 ### 1.3.0
-* Support Unicode 9.0
+- Support Unicode 9.0
 ### 1.2.0
-* Support CJK Ideographs
+- Support CJK Ideographs
 ### 1.1.0
-* Support codepoint labels
+- Support codepoint labels
 ### 1.0.0
-* Initial release
+- Initial release

data/Gemfile.lock CHANGED

@@ -1,25 +1,25 @@
 PATH
   remote: .
   specs:
-    unicode-name (1.13.0)
+    unicode-name (1.13.2)
       unicode-types (~> 1.10)
 GEM
   remote: https://rubygems.org/
   specs:
-    io-console (0.6.0)
-    irb (1.8.1)
-      rdoc
-      reline (>= 0.3.8)
-    minitest (5.20.0)
-    psych (5.1.0)
+    io-console (0.7.2)
+    irb (1.14.1)
+      rdoc (>= 4.0.0)
+      reline (>= 0.4.2)
+    minitest (5.25.1)
+    psych (5.1.2)
       stringio
-    rake (13.0.6)
-    rdoc (6.5.0)
+    rake (13.2.1)
+    rdoc (6.7.0)
       psych (>= 4.0.0)
-    reline (0.3.8)
+    reline (0.5.10)
       io-console (~> 0.5)
-    stringio (3.0.8)
+    stringio (3.1.1)
     unicode-types (1.10.0)
 PLATFORMS

data/README.md CHANGED

@@ -1,4 +1,4 @@
-# Unicode::Name [![[version]](https://badge.fury.io/rb/unicode-name.svg)](https://badge.fury.io/rb/unicode-name)  [![[ci]](https://github.com/janlelis/unicode-name/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-name/actions?query=workflow%3ATest)
+# Unicode::Name [![[version]](https://badge.fury.io/rb/unicode-name.svg)](https://badge.fury.io/rb/unicode-name) [![[ci]](https://github.com/janlelis/unicode-name/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-name/actions?query=workflow%3ATest)
 Return Unicode codepoint names, aliases, and labels.
@@ -6,7 +6,7 @@ Unicode version: **16.0.0** (September 2024)
 Supported Rubies: **3.3**, **3.2**, **3.1**, **3.0**
-Old Rubies that might still work: **2.7**, **2.6**, **2.5**, **2.4**, **2.3**, **2.X**
+Old Rubies that might still work: **2.X**
 ## Usage
@@ -42,10 +42,16 @@ Unicode::Name.readable("\0") # => "NULL"
 Unicode::Name.readable("\u{FFFFD}") # => "<private-use-FFFFD>"
 ```
-See [unicode-sequence_names](https://github.com/janlelis/unicode-sequence_name) for character names of more complex codepoint sequences.
+See [unicode-sequence_names](https://github.com/janlelis/unicode-sequence_name) for character names of more complex codepoint sequences. This is how you could use both libraries together to get the most relevant name of a character:
+```ruby
+name = Unicode::SequenceName.of(char) || Unicode::Name.readable(char)
+```
 See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related micro libraries.
+See [unicode-name.js](https://github.com/janlelis/unicode-name.js) for a JavaScript implementation of this gem.
 ## MIT License
 - Copyright (C) 2016-2024 Jan Lelis <https://janlelis.com>. Released under the MIT license.

data/data/name.marshal.gz CHANGED

Binary file

data/lib/unicode/name/constants.rb CHANGED

@@ -2,7 +2,7 @@
 module Unicode
   module Name
-    VERSION = "1.13.0"
+    VERSION = "1.13.2"
     UNICODE_VERSION = "16.0.0"
     DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/").freeze
     INDEX_FILENAME = (DATA_DIRECTORY + "/name.marshal.gz").freeze

data/lib/unicode/name.rb CHANGED

@@ -11,11 +11,18 @@ module Unicode
     def self.unicode_name(char)
       codepoint = char.unpack("U")[0]
       require_relative "name/index" unless defined? ::Unicode::Name::INDEX
       if res = INDEX[:NAMES][codepoint]
-        res
-      elsif INDEX[:CJK].any?{ |cjk_range| codepoint >= cjk_range[0] && codepoint <= cjk_range[1] }
-        "CJK UNIFIED IDEOGRAPH-%.4X" % codepoint
-      elsif codepoint >= HANGUL_START && codepoint <= HANGUL_END
+        return insert_words(res)
+      end
+      INDEX[:CP_RANGES].each{|prefix, range|
+        if range.any?{ |range| codepoint >= range[0] && codepoint <= range[1] }
+          return "%s%.4X" %[prefix, codepoint]
+        end
+      }
+      if codepoint >= HANGUL_START && codepoint <= HANGUL_END
         "HANGUL SYLLABLE %s" % hangul_decomposition(codepoint)
       else
         nil
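In this hunk the CJK-only range check is replaced by a loop over `INDEX[:CP_RANGES]`: for codepoints in those ranges, the name is built from a prefix plus the hexadecimal codepoint instead of being stored individually, which is what the 1.13.2 CHANGELOG entry about removing "ranges that have codepoints embedded" refers to. The standalone sketch below assumes `CP_RANGES` is a Hash from prefix to a list of `[first, last]` pairs, which is consistent with how the diff iterates it; the table contents here are a small, hand-written illustration and not the gem's index data.

```ruby
# Illustrative subset only: prefix => list of inclusive [first, last] ranges.
CP_RANGES = {
  "CJK UNIFIED IDEOGRAPH-" => [[0x4E00, 0x9FFF]],
  "TANGUT IDEOGRAPH-"      => [[0x17000, 0x187F7], [0x18D00, 0x18D08]],
}.freeze

# Returns the derived name for a codepoint in one of the ranges, else nil.
def range_name(codepoint)
  CP_RANGES.each do |prefix, ranges|
    if ranges.any? { |first, last| codepoint >= first && codepoint <= last }
      # %.4X: uppercase hex, zero-padded to at least four digits
      return format("%s%.4X", prefix, codepoint)
    end
  end
  nil
end
```

For example, `range_name(0x4E01)` yields the same string the spec expects for 丁. The sketch gives the outer and inner block variables different names; the diff reuses `range` for both, which works but is harder to read.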
@@ -63,7 +70,7 @@ module Unicode
     end
     def self.readable(char)
-      unicode_name(char) ||
+      correct(char) ||
       ( as = aliases(char) ) &&
       ( as[:control]      && as[:control][0]      ||
         as[:figment]      && as[:figment][0]      ||
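This hunk makes `readable` try `correct(char)` before falling back to aliases, matching the 1.13.1 CHANGELOG entry "Unicode::Name.readable now also applies correction if one exists". The sketch below isolates that precedence change with miniature, hypothetical lookup tables. It assumes `correct` returns the correction alias when one exists and otherwise the plain name; that fallback is inferred from the fact that `correct(char)` now stands where `unicode_name(char)` used to, and is not shown in this diff.

```ruby
# Hypothetical miniature data; the real gem reads names and aliases from its index.
NAMES = { 0x41 => "LATIN CAPITAL LETTER A", 0x1A2 => "LATIN CAPITAL LETTER OI" }.freeze
CORRECTIONS = { 0x1A2 => "LATIN CAPITAL LETTER GHA" }.freeze
CONTROL_ALIASES = { 0x00 => "NULL" }.freeze

# Assumed behaviour: correction alias if present, otherwise the plain name.
def correct(codepoint)
  CORRECTIONS[codepoint] || NAMES[codepoint]
end

# Precedence in 1.13.0: plain name first.
def readable_old(codepoint)
  NAMES[codepoint] || CONTROL_ALIASES[codepoint]
end

# Precedence in 1.13.1 and later: corrected name first.
def readable_new(codepoint)
  correct(codepoint) || CONTROL_ALIASES[codepoint]
end
```

With these tables, U+01A2 (Ƣ) reads as "LATIN CAPITAL LETTER OI" under the old order and "LATIN CAPITAL LETTER GHA" under the new one, which mirrors the assertion added to the `.readable` spec.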
@@ -78,10 +85,21 @@ module Unicode
     def self.hangul_decomposition(codepoint)
       base = codepoint - HANGUL_START
       final = base % HANGUL_FINAL_MAX
-      medial = (base - final) % HANGUL_MEDIAL_MAX
+      medial = (base % HANGUL_MEDIAL_MAX) / HANGUL_FINAL_MAX
       initial = base / HANGUL_MEDIAL_MAX
       "#{INDEX[:JAMO][:INITIAL][initial]}#{INDEX[:JAMO][:MEDIAL][medial]}#{INDEX[:JAMO][:FINAL][final]}"
     end
+    def self.insert_words(raw_name)
+      raw_name.chars.map{ |char|
+        codepoint = char.ord
+        if codepoint < INDEX[:REPLACE_BASE]
+          char
+        else
+          "#{INDEX[:COMMON_WORDS][codepoint - INDEX[:REPLACE_BASE]]} "
+        end
+      }.join.chomp
+    end
   end
 end
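The new `insert_words` method decodes names that were compressed for the 1.13.2 "substituting common words" optimization: every character at or above `INDEX[:REPLACE_BASE]` stands for an entry of `INDEX[:COMMON_WORDS]` followed by a space. Since Unicode character names use only ASCII letters, digits, spaces and hyphens, any codepoint above that range is free to act as a token. The sketch below pairs a hypothetical encoder with a decoder modeled on the diff; `REPLACE_BASE`, the word list and `compress_name` are assumptions, not the gem's actual values or API. One detail differs from the diff on purpose: `String#chomp` with no argument strips only a trailing newline or carriage return, not a space, so the sketch calls `chomp(" ")` to drop the space left by a name-final substituted word. Whether the gem's encoder ever substitutes a name-final word is not visible in this diff.

```ruby
# Hypothetical values; the real ones are stored in the gem's marshaled index.
REPLACE_BASE = 0xE000 # assumption: any codepoint above ASCII would work
COMMON_WORDS = %w[LATIN CAPITAL SMALL LETTER WITH].freeze

# Hypothetical encoder: each common word (plus its trailing space) becomes
# one character; other words keep their space, except at the very end.
def compress_name(name)
  name.split(" ").map { |word|
    index = COMMON_WORDS.index(word)
    index ? (REPLACE_BASE + index).chr(Encoding::UTF_8) : "#{word} "
  }.join.chomp(" ")
end

# Decoder modeled on insert_words from the diff.
def insert_words(raw_name)
  raw_name.chars.map { |char|
    codepoint = char.ord
    if codepoint < REPLACE_BASE
      char
    else
      "#{COMMON_WORDS[codepoint - REPLACE_BASE]} "
    end
  }.join.chomp(" ") # chomp(" "), not chomp: remove a trailing space
end
```

Round-tripping "LATIN CAPITAL LETTER A" stores four characters instead of 22 and decodes back to the original name.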

data/spec/unicode_name_spec.rb CHANGED

@@ -9,13 +9,24 @@ describe Unicode::Name do
       assert_equal "REPLACEMENT CHARACTER", Unicode::Name.of("�")
     end
-    it "works for CJK Ideographs" do
+    it "works for CJK unified ideographs" do
       assert_equal "CJK UNIFIED IDEOGRAPH-4E01", Unicode::Name.of("丁")
     end
-    it "works for Hangul Syllables" do
+    it "works for Hangul syllables" do
       assert_equal "HANGUL SYLLABLE HAN", Unicode::Name.of("한")
       assert_equal "HANGUL SYLLABLE GAG", Unicode::Name.of("각")
+      assert_equal "HANGUL SYLLABLE GAE", Unicode::Name.of("개")
+      assert_equal "HANGUL SYLLABLE GAENG", Unicode::Name.of("갱")
+      assert_equal "HANGUL SYLLABLE DWALB", Unicode::Name.of("돫")
+    end
+    it "works with some ranges that have the codepoint embedded" do
+      assert_equal "EGYPTIAN HIEROGLYPH-143F5", Unicode::Name.of("𔏵")
+      assert_equal "KHITAN SMALL SCRIPT CHARACTER-18C12", Unicode::Name.of("𘰒")
+      assert_equal "TANGUT IDEOGRAPH-18D00", Unicode::Name.of("𘴀")
+      assert_equal "NUSHU CHARACTER-1B171", Unicode::Name.of("𛅱")
+      assert_equal "CJK COMPATIBILITY IDEOGRAPH-2F9B1", Unicode::Name.of("𧃒")
     end
     it "will return nil for characters without name" do
@@ -89,6 +100,7 @@ describe Unicode::Name do
   describe ".readable" do
     it "will return best readable representation of a codepoint" do
       assert_equal "LATIN CAPITAL LETTER A", Unicode::Name.readable("A")
+      assert_equal "LATIN CAPITAL LETTER GHA", Unicode::Name.readable("Ƣ")
       assert_equal "NULL", Unicode::Name.readable("\0")
       assert_equal "<noncharacter-FFFFF>", Unicode::Name.readable("\u{FFFFF}")
       assert_equal "<reserved-10C50>", Unicode::Name.readable("\u{10C50}")
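The new Hangul assertions (GAE, GAENG, DWALB) cover the 1.13.1 fix to `hangul_decomposition`. A precomposed syllable's offset from U+AC00 equals `initial * 588 + medial * 28 + final`, so the medial index is `(base % 588) / 28`. The old expression `(base - final) % 588` evaluates to `medial * 28`, which only equals the medial index when the vowel is A (index 0); that is why the older HAN and GAG assertions already passed. The sketch below reproduces both formulas. It assumes `HANGUL_FINAL_MAX = 28` and `HANGUL_MEDIAL_MAX = 588`, the standard Unicode values consistent with the diff's arithmetic, and uses the standard jamo short names; the method names are hypothetical.

```ruby
# Standard Unicode Hangul syllable constants (values assumed for the gem's names).
HANGUL_START = 0xAC00
HANGUL_END = 0xD7A3
HANGUL_FINAL_MAX = 28        # 27 final consonants + "no final"
HANGUL_MEDIAL_MAX = 21 * 28  # 21 vowels * 28 finals = 588

# Jamo short names (19 initials, 21 medials, 28 finals).
INITIAL = %w[G GG N D DD R M B BB S SS] + [""] + %w[J JJ C K T P H]
MEDIAL = %w[A AE YA YAE EO E YEO YE O WA WAE OE YO U WEO WE WI YU EU YI I]
FINAL = [""] + %w[G GG GS N NJ NH D L LG LM LB LS LT LP LH M B BS S SS NG J C K T P H]

# Fixed decomposition, as in 1.13.1 and later.
def hangul_short_name(codepoint)
  raise ArgumentError, "not a Hangul syllable" unless (HANGUL_START..HANGUL_END).cover?(codepoint)
  base = codepoint - HANGUL_START
  final = base % HANGUL_FINAL_MAX
  medial = (base % HANGUL_MEDIAL_MAX) / HANGUL_FINAL_MAX
  initial = base / HANGUL_MEDIAL_MAX
  "#{INITIAL[initial]}#{MEDIAL[medial]}#{FINAL[final]}"
end

# The pre-1.13.1 medial formula, for comparison: yields medial * 28.
def old_medial_index(codepoint)
  base = codepoint - HANGUL_START
  (base - base % HANGUL_FINAL_MAX) % HANGUL_MEDIAL_MAX
end
```

For 개 (U+AC1C) the fixed formula gives medial index 1 ("AE"), while the old one gives 28, which is past the end of the 21-entry vowel table.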

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: unicode-name
 version: !ruby/object:Gem::Version
-  version: 1.13.0
+  version: 1.13.2
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-09-13 00:00:00.000000000 Z
+date: 2024-10-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: unicode-types
@@ -33,6 +33,7 @@ extensions: []
 extra_rdoc_files: []
 files:
 - ".gitignore"
+- ".rake_tasks"
 - CHANGELOG.md
 - CODE_OF_CONDUCT.md
 - Gemfile
@@ -67,7 +68,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.5.9
+rubygems_version: 3.5.21
 signing_key:
 specification_version: 4
 summary: Returns name/aliases/label of a Unicode codepoint