unicode-confusable 1.10.0 → 1.12.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: e001a3462dfe6d2671e8c9e0aa586a5955627379d2a3cadabc2e3e331628c90a
- data.tar.gz: c4d36984ad85c3d37965b0934c2f21ddc71afd8e1ea50366817cfe72ef6e7679
+ metadata.gz: decb589505c410acb8b064c0aa0b412f4178bac748a7b9eb0fb7a7d6d3b00da7
+ data.tar.gz: 5e60c81b398e29a1914cb356476b47c4c3b95312970d1c22064e7083c133c3b3
  SHA512:
- metadata.gz: 55bde5498357e1953f35644f4a044fce2dd4cbc35ebc094b74642de73b57fdf755a379edaeb2f1f6d0dffda822894e827ec51fd69aea11f996279165678da62b
- data.tar.gz: 4c40931571b97cfeb0d5e7ecb488617e9b91ef9f595960f51cf830463619e4c4a22581d80f88c62fd5951506352def6c03408b3e3068241bb3a0b29bca2e7a44
+ metadata.gz: fc871a5e4a4291c95b0218e61f9a7125f872f5e27b94e1f63b5f33f36e6188bf90f147fcc82e24ea4d3c1153ba2cc364284124049f15a25914d6a520da33428c
+ data.tar.gz: de60105cb97d2255cf9dd930a767e019a7e5e0d1e6deb4d2201bc4e726c203cdf20ebf29417ae3c19d552bb93d0a9a90802a181513136e07701aec05c89fb885
data/CHANGELOG.md CHANGED
@@ -1,5 +1,14 @@
  ## CHANGELOG
 
+ ### 1.12.0
+
+ - Remove default ignorable codepoints, which is now part of the skeleton algorithm
+ - Fix the confusable list for ";" (wrongly contained null bytes)
+
+ ### 1.11.0
+
+ - Unicode 16.0
+
  ### 1.10.0
 
  - Unicode 15.1
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- unicode-confusable (1.10.0)
+ unicode-confusable (1.12.0)
 
  GEM
  remote: https://rubygems.org/
@@ -30,4 +30,4 @@ DEPENDENCIES
  unicode-confusable!
 
  BUNDLED WITH
- 2.1.4
+ 2.5.17
data/MIT-LICENSE.txt CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2016-2023 Jan Lelis, https://janlelis.com
+ Copyright (c) 2016-2024 Jan Lelis, https://janlelis.com
 
  Permission is hereby granted, free of charge, to any person obtaining
  a copy of this software and associated documentation files (the
data/README.md CHANGED
@@ -1,14 +1,14 @@
  # Unicode::Confusable [![[version]](https://badge.fury.io/rb/unicode-confusable.svg)](https://badge.fury.io/rb/unicode-confusable) [![[ci]](https://github.com/janlelis/unicode-confusable/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-confusable/actions?query=workflow%3ATest)
 
- Compares two strings if they are visually confusable as described in [Unicode® Technical Standard #39](https://www.unicode.org/reports/tr39/#Confusable_Detection): Both strings get transformed into a skeleton format before comparing them. The skeleton is generated by normalizing the string ([NFD](http://unicode.org/reports/tr15/#Norm_Forms)), replacing [confusable characters](https://unicode.org/Public/security/12.1.0/confusables.txt), and normalizing the string again.
+ Compares two strings if they are visually confusable as described in [Unicode® Technical Standard #39](https://www.unicode.org/reports/tr39/#Confusable_Detection): Both strings get transformed into a skeleton format before comparing them. The skeleton is generated by normalizing the string ([NFD](http://unicode.org/reports/tr15/#Norm_Forms)), removing ignorable characters, replacing [confusable characters](https://unicode.org/Public/security/16.0.0/confusables.txt), and normalizing the string again.
 
- Unicode version: **15.1.0** (September 2023)
+ Unicode version: **16.0.0** (September 2024)
 
  \* The Unicode normalization [depends on your Ruby version](https://idiosyncratic-ruby.com/73-unicode-version-mapping.html)
 
- Supported Rubies: **3.2**, **3.1**, **3.0**
+ Please note: The TR39 standard now includes detection of confusables based on bidi formatting (i.e. right-to-left text). This is currently not supported by this gem.
 
- Old Rubies which might still work: **2.7**, **2.6**, **2.5**, **2.4**, **2.3**, **2.2**
+ Supported Rubies: **3.x** (might still work: **2.x**)
 
  ## Usage
 
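The README footnote notes that normalization depends on the Ruby version: the gem calls Ruby's bundled `UnicodeNormalize`, whose tables follow the interpreter's Unicode version, while the confusable data in 1.12.0 is Unicode 16.0.0. As a minimal sketch, assuming a modern CRuby where `RbConfig::CONFIG["UNICODE_VERSION"]` is set, you can compare the two at runtime. The helper `unicode_versions_match?` is hypothetical and not part of the gem.

```ruby
require "rbconfig"
require "rubygems"

# Unicode version of the running Ruby's built-in tables, which back
# String#unicode_normalize and the UnicodeNormalize module the gem calls.
RUBY_UNICODE_VERSION = RbConfig::CONFIG["UNICODE_VERSION"]

# Version of the confusable data bundled with unicode-confusable 1.12.0
# (Unicode::Confusable::UNICODE_VERSION in this diff).
GEM_UNICODE_VERSION = "16.0.0"

# Hypothetical helper: true when both sides use the same Unicode version.
def unicode_versions_match?(ruby_version, gem_version)
  Gem::Version.new(ruby_version) == Gem::Version.new(gem_version)
end

unless unicode_versions_match?(RUBY_UNICODE_VERSION, GEM_UNICODE_VERSION)
  warn "Ruby normalizes with Unicode #{RUBY_UNICODE_VERSION}, " \
       "confusable data is Unicode #{GEM_UNICODE_VERSION}"
end
```

A mismatch is not necessarily a bug, but characters assigned after the interpreter's Unicode version may be normalized differently than the confusable data expects.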
@@ -49,6 +49,10 @@ Unicode::Confusable.list("o")
  # => ["⒪", "ꜵ", "℅", "ᴔ", "ꭁ", "ꭂ", "ﷲ", "№", "ం", "ಂ", "ം", "ං", "०", "੦", "૦", "௦", "౦", "೦", "൦", "๐", "໐", "၀", "٥", "۵", "o", "ℴ", "𝐨", "𝑜", "𝒐", "𝓸", "𝔬", "𝕠", "𝖔", "𝗈", "𝗼", "𝘰", "𝙤", "𝚘", "ᴏ", "ᴑ", "ꬽ", "ο", "𝛐", "𝜊", "𝝄", "𝝾", "𝞸", "σ", "𝛔", "𝜎", "𝝈", "𝞂", "𝞼", "ⲟ", "о", "ჿ", "օ", "ס", "ه", "𞸤", "𞹤", "𞺄", "ﻫ", "ﻬ", "ﻪ", "ﻩ", "ھ", "ﮬ", "ﮭ", "ﮫ", "ﮪ", "ہ", "ﮨ", "ﮩ", "ﮧ", "ﮦ", "ە", "ഠ", "ဝ", "𐓪", "𑣈", "𑣗", "𐐬", "ۿ", "ø", "ꬾ", "ɵ", "ꝋ", "ө", "ѳ", "ꮎ", "ꮻ", "ꭴ", "ﳙ", "ơ", "œ", "ɶ", "∞", "ꝏ", "ꚙ", "ﳗ", "ﱑ", "ﳘ", "ﱒ", "ﶓ", "ﶔ", "ﱓ", "ﱔ", "ൟ", "တ", "ꭣ", "ﲠ", "ﳢ", "ﲥ", "ﳤ", "ﷻ", "ﴱ", "ﳨ", "ﴲ", "ﳪ", "ﷺ", "ﷷ", "ﳍ", "ﳖ", "ﳯ", "ﳞ", "ﳱ", "ﳦ", "ﲛ", "ﳠ", "ﯭ", "ﯬ"]
  ```
 
+ ## No Bidi-Confusable Check
+
+ Testing for bidirectional confusables is currently not supported.
+
  ## No Advanced Detection
 
  TR 39 also describes mechanisms for a more exact recognition of confusables, also within the same string:
@@ -57,11 +61,11 @@ TR 39 also describes mechanisms for a more exact recognition of confusables, als
  - Mixed-script confusable
  - Whole-script confusable
 
- This is currently **not** supported by this gem.
+ This is currently not supported by this gem.
 
  See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related micro libraries.
 
  ## MIT License
 
- - Copyright (C) 2016-2023 Jan Lelis <https://janlelis.com>. Released under the MIT license.
+ - Copyright (C) 2016-2024 Jan Lelis <https://janlelis.com>. Released under the MIT license.
  - Unicode data: https://www.unicode.org/copyright.html#Exhibit1
Binary file
@@ -2,8 +2,8 @@
 
  module Unicode
  module Confusable
- VERSION = "1.10.0"
- UNICODE_VERSION = "15.1.0"
+ VERSION = "1.12.0"
+ UNICODE_VERSION = "16.0.0"
  DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/").freeze
  INDEX_FILENAME = (DATA_DIRECTORY + "/confusable.marshal.gz").freeze
  end
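`INDEX_FILENAME` points at `data/confusable.marshal.gz`. The code that actually loads it lives in `lib/unicode/confusable/index.rb`, which this diff does not show; judging by the file name, a gzip-compressed `Marshal` dump is a reasonable assumption. The sketch below round-trips a hypothetical miniature index in the two-key shape the 1.12.0 code reads (`INDEX[:CONFUSABLE]` and `INDEX[:IGNORABLE]`). The file path and the helpers `write_index` / `read_index` are made up for illustration.

```ruby
require "zlib"
require "tmpdir"

# Hypothetical miniature index in the shape the 1.12.0 code reads:
#   :CONFUSABLE maps a codepoint to one replacement codepoint or an Array of them
#   :IGNORABLE lists single codepoints and [first, last] ranges
SAMPLE_INDEX = {
  CONFUSABLE: { 0x0031 => 0x006C, 0x2047 => [0x003F, 0x003F] },
  IGNORABLE: [0x00AD, [0x200B, 0x200F]],
}.freeze

# Serialize with Marshal, then gzip the bytes.
def write_index(path, index)
  Zlib::GzipWriter.open(path) { |gz| gz.write(Marshal.dump(index)) }
end

# Gunzip, then Marshal.load. Only load files you trust: Marshal can
# instantiate arbitrary objects.
def read_index(path)
  Zlib::GzipReader.open(path) { |gz| Marshal.load(gz.read) }
end

INDEX_PATH = File.join(Dir.mktmpdir, "confusable.marshal.gz")
write_index(INDEX_PATH, SAMPLE_INDEX)
LOADED_INDEX = read_index(INDEX_PATH)
```

Wrapping both tables in one Hash keeps a single data file, which matches how the diff switches every lookup from `INDEX[...]` to `INDEX[:CONFUSABLE][...]`.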
@@ -0,0 +1,9 @@
+ require_relative 'index' unless defined? ::Unicode::Confusable::INDEX
+
+ module Unicode
+ module Confusable
+ IGNORABLE = INDEX[:IGNORABLE].reduce([]){|acc, cur|
+ acc + [*(cur.is_a?(Array) ? Range.new(*cur) : cur)]
+ }.freeze
+ end
+ end
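The new `ignorable.rb` flattens `INDEX[:IGNORABLE]`, where each entry is either a single codepoint or a two-element `[first, last]` pair that becomes an inclusive `Range`. The sketch below reproduces that expansion on hypothetical sample data, once with the diff's `reduce` expression and once with `flat_map`, and builds a `Set` for lookups. Since `skeleton` calls `IGNORABLE.include?` for every codepoint and `Array#include?` is a linear scan, a `Set` may be faster; that is an assumption, not something this release measures.

```ruby
require "set"

# Hypothetical sample in the format ignorable.rb expands:
# single codepoints and inclusive [first, last] pairs.
SAMPLE_IGNORABLE = [0x00AD, [0x200B, 0x200F], 0xFEFF].freeze

# The diff's expression, kept for comparison.
def expand_codepoints_reduce(entries)
  entries.reduce([]) { |acc, cur| acc + [*(cur.is_a?(Array) ? Range.new(*cur) : cur)] }
end

# Same result, but flat_map avoids allocating a new Array per step (acc + [...]).
def expand_codepoints(entries)
  entries.flat_map { |entry| entry.is_a?(Array) ? Range.new(*entry).to_a : [entry] }
end

EXPANDED = expand_codepoints(SAMPLE_IGNORABLE).freeze
# A Set gives constant-time membership checks instead of a linear scan.
IGNORABLE_SET = Set.new(EXPANDED).freeze
```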
@@ -4,6 +4,8 @@ require 'unicode_normalize/normalize'
 
  module Unicode
  module Confusable
+ autoload :IGNORABLE, File.expand_path('confusable/ignorable', __dir__)
+
  def self.confusable?(string1, string2)
  skeleton(string1) == skeleton(string2)
  end
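`confusable.rb` now registers `IGNORABLE` with `Module#autoload`, so `ignorable.rb` (and the range expansion it performs) only runs the first time the constant is referenced, which happens inside `skeleton`. The self-contained sketch below demonstrates that lazy behavior with a throwaway module and file; `Demo` and `demo_ignorable` are hypothetical names.

```ruby
require "tmpdir"

module Demo; end

# Write a tiny file that defines the constant, mirroring ignorable.rb.
DEMO_DIR = Dir.mktmpdir
File.write(File.join(DEMO_DIR, "demo_ignorable.rb"), <<~RUBY)
  module Demo
    IGNORABLE = [0x200B].freeze
  end
RUBY

# Like the gem, register the path without the .rb extension;
# autoload resolves it through require.
Demo.autoload(:IGNORABLE, File.join(DEMO_DIR, "demo_ignorable"))

PENDING_BEFORE = Demo.autoload?(:IGNORABLE) # path string: not loaded yet
LOADED_VALUE = Demo::IGNORABLE              # first reference triggers the require
PENDING_AFTER = Demo.autoload?(:IGNORABLE)  # nil: file has been loaded
```

Callers that only use `list` never pay for building the ignorable array.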
@@ -12,8 +14,10 @@ module Unicode
  require_relative 'confusable/index' unless defined? ::Unicode::Confusable::INDEX
  UnicodeNormalize.normalize(
  UnicodeNormalize.normalize(string, :nfd).each_codepoint.map{ |codepoint|
- INDEX[codepoint] || codepoint
- }.flatten.pack("U*"), :nfd
+ unless IGNORABLE.include?(codepoint)
+ INDEX[:CONFUSABLE][codepoint] || codepoint
+ end
+ }.flatten.compact.pack("U*"), :nfd
  )
  end
 
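In 1.12.0 the skeleton is: NFD, drop ignorable codepoints, map each remaining codepoint through `INDEX[:CONFUSABLE]` (a value may be one codepoint or an Array of them), then NFD again. The diff drops ignorables by returning `nil` from `map` and calling `compact`. Below is a standalone sketch of the same pipeline that swaps in `String#unicode_normalize` for `UnicodeNormalize.normalize` and uses a hypothetical four-entry table instead of the real data, so its results only hold for these characters. `toy_skeleton` and `toy_confusable?` are illustrative names, not the gem's API.

```ruby
# Hypothetical miniature data; the gem ships its real tables in
# data/confusable.marshal.gz.
TOY_CONFUSABLES = {
  0x0031 => 0x006C,          # "1" -> "l"
  0x043E => 0x006F,          # Cyrillic "о" -> Latin "o"
  0x211C => 0x0052,          # "ℜ" -> "R"
  0x2047 => [0x003F, 0x003F] # "⁇" -> "??"
}.freeze
TOY_IGNORABLE = [0x200B, 0xFE0F].freeze # zero width space, variation selector-16

def toy_skeleton(string)
  string
    .unicode_normalize(:nfd)
    .each_codepoint
    .reject { |cp| TOY_IGNORABLE.include?(cp) }             # drop ignorables
    .flat_map { |cp| Array(TOY_CONFUSABLES.fetch(cp, cp)) } # map, flattening multi-codepoint values
    .pack("U*")
    .unicode_normalize(:nfd)
end

def toy_confusable?(string1, string2)
  toy_skeleton(string1) == toy_skeleton(string2)
end
```

Using `reject` before `flat_map` yields the same codepoint sequence as the diff's map-to-`nil` plus `compact`; it just states the filtering step explicitly.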
@@ -21,9 +25,9 @@ module Unicode
  require_relative 'confusable/index' unless defined? ::Unicode::Confusable::INDEX
  codepoint = char.codepoints.first or raise ArgumentError, "no data given to Unicode::Confusable.list"
  if partial_mapping_allowed
- INDEX.select{ |k,v| v == codepoint || v.is_a?(Array) && v.include?(codepoint) }.keys.map{ |codepoint| [codepoint].pack("U*") }
+ INDEX[:CONFUSABLE].select{ |k,v| v == codepoint || v.is_a?(Array) && v.include?(codepoint) }.keys.map{ |codepoint| [codepoint].pack("U*") }
  else
- INDEX.select{ |k,v| v == codepoint }.keys.map{ |codepoint| [codepoint].pack("U") }
+ INDEX[:CONFUSABLE].select{ |k,v| v == codepoint }.keys.map{ |codepoint| [codepoint].pack("U") }
  end
  end
  end
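`Unicode::Confusable.list` inverts the mapping: it scans every `INDEX[:CONFUSABLE]` entry and returns the keys whose value equals the given codepoint or, with `partial_mapping_allowed`, whose Array value contains it. The sketch below mirrors that logic against a hypothetical three-entry table; `toy_list` is an illustrative name, not the gem's API. Each call walks the whole table, so a caller doing many lookups might precompute a reverse Hash instead, which is a design choice outside this diff.

```ruby
# Hypothetical miniature table in the shape of INDEX[:CONFUSABLE].
TOY_MAP = {
  0x0031 => 0x006C,          # "1" -> "l"
  0x0049 => 0x006C,          # "I" -> "l"
  0x2047 => [0x003F, 0x003F] # "⁇" -> "??"
}.freeze

# Returns the characters whose mapping targets +char+. With
# partial_mapping_allowed, multi-codepoint mappings that merely
# contain the codepoint also count.
def toy_list(char, partial_mapping_allowed = false)
  codepoint = char.codepoints.first or raise ArgumentError, "no data given to toy_list"
  matches =
    if partial_mapping_allowed
      TOY_MAP.select { |_k, v| v == codepoint || (v.is_a?(Array) && v.include?(codepoint)) }
    else
      TOY_MAP.select { |_k, v| v == codepoint }
    end
  matches.keys.map { |cp| [cp].pack("U") }
end
```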
@@ -2,16 +2,28 @@ require_relative "../lib/unicode/confusable"
  require "minitest/autorun"
 
  describe Unicode::Confusable do
- it "will detect official confusables" do
- assert_equal true, Unicode::Confusable.confusable?("1", "l")
- assert_equal true, Unicode::Confusable.confusable?("ℜ𝘂ᖯʏ", "Ruby")
- assert_equal true, Unicode::Confusable.confusable?("Michael", "Michae1")
- assert_equal true, Unicode::Confusable.confusable?("", "??")
+ describe ".confusable?(string1, string2)" do
+ it "will detect official confusables" do
+ assert_equal true, Unicode::Confusable.confusable?("1", "l")
+ assert_equal true, Unicode::Confusable.confusable?("ℜ𝘂ᖯʏ", "Ruby")
+ assert_equal true, Unicode::Confusable.confusable?("Michael", "Michae1")
+ assert_equal true, Unicode::Confusable.confusable?("⁇", "??")
+ end
+
+ it "will return false for non-confusables" do
+ assert_equal false, Unicode::Confusable.confusable?("a", "b")
+ assert_equal false, Unicode::Confusable.confusable?("⁇", "?")
+ end
  end
 
- it "will return false for non-confusables" do
- assert_equal false, Unicode::Confusable.confusable?("a", "b")
- assert_equal false, Unicode::Confusable.confusable?("", "?")
+ describe ".skeleton(string)" do
+ it "returns internal skeleton representation" do
+ assert_equal "Ruby", Unicode::Confusable.skeleton("ℜ𝘂ᖯʏ")
+ end
+
+ it "will remove default ignorable codepoints" do
+ assert_equal "ab", Unicode::Confusable.skeleton("a\u{FE0F}b")
+ end
  end
 
  describe ".list(char)" do
metadata CHANGED
@@ -1,16 +1,16 @@
  --- !ruby/object:Gem::Specification
  name: unicode-confusable
  version: !ruby/object:Gem::Version
- version: 1.10.0
+ version: 1.12.0
  platform: ruby
  authors:
  - Jan Lelis
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-10-01 00:00:00.000000000 Z
+ date: 2024-10-30 00:00:00.000000000 Z
  dependencies: []
- description: "[Unicode 15.1.0] Compares two strings if they are visually confusable
+ description: "[Unicode 16.0.0] Compares two strings if they are visually confusable
  as described in Unicode® Technical Standard #39: Both strings get transformed into
  a skeleton format before comparing them. The skeleton is generated by normalizing
  the string, replacing confusable characters, and then normalizing the string again."
@@ -31,6 +31,7 @@ files:
  - data/confusable.marshal.gz
  - lib/unicode/confusable.rb
  - lib/unicode/confusable/constants.rb
+ - lib/unicode/confusable/ignorable.rb
  - lib/unicode/confusable/index.rb
  - lib/unicode/confusable/string_ext.rb
  - sig/unicode-confusable.rbs
@@ -41,7 +42,7 @@ licenses:
  - MIT
  metadata:
  rubygems_mfa_required: 'true'
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -56,8 +57,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.4.4
- signing_key:
+ rubygems_version: 3.5.21
+ signing_key:
  specification_version: 4
  summary: Detect characters that look visually similar.
  test_files: