unicode-confusable 1.11.0 → 1.12.0

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 11b387c2c116918a68b3d27168aa46d664b7805060030130dae1c8daa31c8582
4
- data.tar.gz: 6e52671b52b5f7f826929df05d815f195743a502deb7fc427f97ba900e7a1d73
3
+ metadata.gz: decb589505c410acb8b064c0aa0b412f4178bac748a7b9eb0fb7a7d6d3b00da7
4
+ data.tar.gz: 5e60c81b398e29a1914cb356476b47c4c3b95312970d1c22064e7083c133c3b3
5
5
  SHA512:
6
- metadata.gz: e31bae98039b1b83dde599b4f7442a8afc83cc21f1a7085da012af5acf1447faa659f2f9b27a0644b23d1975446c74e4358046c9884b7b3403c90ea0a762ace1
7
- data.tar.gz: dc86a08f2b475448292e7a669ac929577f5299c33d2292d5c6004e82ee19fda19db7dfecd1fd739cf383e8dabc1b0d767483fde89e7b88c9062d5ac58286102f
6
+ metadata.gz: fc871a5e4a4291c95b0218e61f9a7125f872f5e27b94e1f63b5f33f36e6188bf90f147fcc82e24ea4d3c1153ba2cc364284124049f15a25914d6a520da33428c
7
+ data.tar.gz: de60105cb97d2255cf9dd930a767e019a7e5e0d1e6deb4d2201bc4e726c203cdf20ebf29417ae3c19d552bb93d0a9a90802a181513136e07701aec05c89fb885
data/CHANGELOG.md CHANGED
@@ -1,5 +1,10 @@
1
1
  ## CHANGELOG
2
2
 
3
+ ### 1.12.0
4
+
5
+ - Remove default ignorable codepoints, which is now part of the skeleton algorithm
6
+ - Fix the confusable list for ";" (wrongly contained null bytes)
7
+
3
8
  ### 1.11.0
4
9
 
5
10
  - Unicode 16.0
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- unicode-confusable (1.11.0)
4
+ unicode-confusable (1.12.0)
5
5
 
6
6
  GEM
7
7
  remote: https://rubygems.org/
data/README.md CHANGED
@@ -1,14 +1,14 @@
1
1
  # Unicode::Confusable [![[version]](https://badge.fury.io/rb/unicode-confusable.svg)](https://badge.fury.io/rb/unicode-confusable) [![[ci]](https://github.com/janlelis/unicode-confusable/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-confusable/actions?query=workflow%3ATest)
2
2
 
3
- Compares two strings if they are visually confusable as described in [Unicode® Technical Standard #39](https://www.unicode.org/reports/tr39/#Confusable_Detection): Both strings get transformed into a skeleton format before comparing them. The skeleton is generated by normalizing the string ([NFD](http://unicode.org/reports/tr15/#Norm_Forms)), replacing [confusable characters](https://unicode.org/Public/security/12.1.0/confusables.txt), and normalizing the string again.
3
+ Compares two strings if they are visually confusable as described in [Unicode® Technical Standard #39](https://www.unicode.org/reports/tr39/#Confusable_Detection): Both strings get transformed into a skeleton format before comparing them. The skeleton is generated by normalizing the string ([NFD](http://unicode.org/reports/tr15/#Norm_Forms)), removing ignorable characters, replacing [confusable characters](https://unicode.org/Public/security/16.0.0/confusables.txt), and normalizing the string again.
4
4
 
5
5
  Unicode version: **16.0.0** (September 2024)
6
6
 
7
7
  \* The Unicode normalization [depends on your Ruby version](https://idiosyncratic-ruby.com/73-unicode-version-mapping.html)
8
8
 
9
- Supported Rubies: **3.3**, **3.2**, **3.1**, **3.0**
9
+ Please note: The TR39 standard now includes detection of confusables based on bidi formatting (i.e. right-to-left text). This is currently not supported by this gen.
10
10
 
11
- Old Rubies which might still work: **2.7**, **2.6**, **2.5**, **2.4**, **2.3**, **2.2**
11
+ Supported Rubies: **3.x** (might stil work: **2.x**)
12
12
 
13
13
  ## Usage
14
14
 
@@ -49,6 +49,10 @@ Unicode::Confusable.list("o")
49
49
  # => ["⒪", "ꜵ", "℅", "ᴔ", "ꭁ", "ꭂ", "ﷲ", "№", "ం", "ಂ", "ം", "ං", "०", "੦", "૦", "௦", "౦", "೦", "൦", "๐", "໐", "၀", "٥", "۵", "o", "ℴ", "𝐨", "𝑜", "𝒐", "𝓸", "𝔬", "𝕠", "𝖔", "𝗈", "𝗼", "𝘰", "𝙤", "𝚘", "ᴏ", "ᴑ", "ꬽ", "ο", "𝛐", "𝜊", "𝝄", "𝝾", "𝞸", "σ", "𝛔", "𝜎", "𝝈", "𝞂", "𝞼", "ⲟ", "о", "ჿ", "օ", "ס", "ه", "𞸤", "𞹤", "𞺄", "ﻫ", "ﻬ", "ﻪ", "ﻩ", "ھ", "ﮬ", "ﮭ", "ﮫ", "ﮪ", "ہ", "ﮨ", "ﮩ", "ﮧ", "ﮦ", "ە", "ഠ", "ဝ", "𐓪", "𑣈", "𑣗", "𐐬", "ۿ", "ø", "ꬾ", "ɵ", "ꝋ", "ө", "ѳ", "ꮎ", "ꮻ", "ꭴ", "ﳙ", "ơ", "œ", "ɶ", "∞", "ꝏ", "ꚙ", "ﳗ", "ﱑ", "ﳘ", "ﱒ", "ﶓ", "ﶔ", "ﱓ", "ﱔ", "ൟ", "တ", "ꭣ", "ﲠ", "ﳢ", "ﲥ", "ﳤ", "ﷻ", "ﴱ", "ﳨ", "ﴲ", "ﳪ", "ﷺ", "ﷷ", "ﳍ", "ﳖ", "ﳯ", "ﳞ", "ﳱ", "ﳦ", "ﲛ", "ﳠ", "ﯭ", "ﯬ"]
50
50
  ```
51
51
 
52
+ ## No Bidi-Confusable Check
53
+
54
+ Testing for bidirectional confusables is currently not supported.
55
+
52
56
  ## No Advanced Detection
53
57
 
54
58
  TR 39 also describes mechanisms for a more exact recognition of confusables, also within the same string:
@@ -57,7 +61,7 @@ TR 39 also describes mechanisms for a more exact recognition of confusables, als
57
61
  - Mixed-script confusable
58
62
  - Whole-script confusable
59
63
 
60
- This is currently **not** supported by this gem.
64
+ This is currently not supported by this gem.
61
65
 
62
66
  See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related micro libraries.
63
67
 
Binary file
@@ -2,7 +2,7 @@
2
2
 
3
3
  module Unicode
4
4
  module Confusable
5
- VERSION = "1.11.0"
5
+ VERSION = "1.12.0"
6
6
  UNICODE_VERSION = "16.0.0"
7
7
  DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/").freeze
8
8
  INDEX_FILENAME = (DATA_DIRECTORY + "/confusable.marshal.gz").freeze
@@ -0,0 +1,9 @@
1
+ require_relative 'index' unless defined? ::Unicode::Confusable::INDEX
2
+
3
+ module Unicode
4
+ module Confusable
5
+ IGNORABLE = INDEX[:IGNORABLE].reduce([]){|acc, cur|
6
+ acc + [*(cur.is_a?(Array) ? Range.new(*cur) : cur)]
7
+ }.freeze
8
+ end
9
+ end
@@ -4,6 +4,8 @@ require 'unicode_normalize/normalize'
4
4
 
5
5
  module Unicode
6
6
  module Confusable
7
+ autoload :IGNORABLE, File.expand_path('confusable/ignorable', __dir__)
8
+
7
9
  def self.confusable?(string1, string2)
8
10
  skeleton(string1) == skeleton(string2)
9
11
  end
@@ -12,8 +14,10 @@ module Unicode
12
14
  require_relative 'confusable/index' unless defined? ::Unicode::Confusable::INDEX
13
15
  UnicodeNormalize.normalize(
14
16
  UnicodeNormalize.normalize(string, :nfd).each_codepoint.map{ |codepoint|
15
- INDEX[codepoint] || codepoint
16
- }.flatten.pack("U*"), :nfd
17
+ unless IGNORABLE.include?(codepoint)
18
+ INDEX[:CONFUSABLE][codepoint] || codepoint
19
+ end
20
+ }.flatten.compact.pack("U*"), :nfd
17
21
  )
18
22
  end
19
23
 
@@ -21,9 +25,9 @@ module Unicode
21
25
  require_relative 'confusable/index' unless defined? ::Unicode::Confusable::INDEX
22
26
  codepoint = char.codepoints.first or raise ArgumentError, "no data given to Unicode::Confusable.list"
23
27
  if partial_mapping_allowed
24
- INDEX.select{ |k,v| v == codepoint || v.is_a?(Array) && v.include?(codepoint) }.keys.map{ |codepoint| [codepoint].pack("U*") }
28
+ INDEX[:CONFUSABLE].select{ |k,v| v == codepoint || v.is_a?(Array) && v.include?(codepoint) }.keys.map{ |codepoint| [codepoint].pack("U*") }
25
29
  else
26
- INDEX.select{ |k,v| v == codepoint }.keys.map{ |codepoint| [codepoint].pack("U") }
30
+ INDEX[:CONFUSABLE].select{ |k,v| v == codepoint }.keys.map{ |codepoint| [codepoint].pack("U") }
27
31
  end
28
32
  end
29
33
  end
@@ -2,16 +2,28 @@ require_relative "../lib/unicode/confusable"
2
2
  require "minitest/autorun"
3
3
 
4
4
  describe Unicode::Confusable do
5
- it "will detect official confusables" do
6
- assert_equal true, Unicode::Confusable.confusable?("1", "l")
7
- assert_equal true, Unicode::Confusable.confusable?("ℜ𝘂ᖯʏ", "Ruby")
8
- assert_equal true, Unicode::Confusable.confusable?("Michael", "Michae1")
9
- assert_equal true, Unicode::Confusable.confusable?("", "??")
5
+ describe ".confusable?(string1, string2)" do
6
+ it "will detect official confusables" do
7
+ assert_equal true, Unicode::Confusable.confusable?("1", "l")
8
+ assert_equal true, Unicode::Confusable.confusable?("ℜ𝘂ᖯʏ", "Ruby")
9
+ assert_equal true, Unicode::Confusable.confusable?("Michael", "Michae1")
10
+ assert_equal true, Unicode::Confusable.confusable?("⁇", "??")
11
+ end
12
+
13
+ it "will return false for non-confusables" do
14
+ assert_equal false, Unicode::Confusable.confusable?("a", "b")
15
+ assert_equal false, Unicode::Confusable.confusable?("⁇", "?")
16
+ end
10
17
  end
11
18
 
12
- it "will return false for non-confusables" do
13
- assert_equal false, Unicode::Confusable.confusable?("a", "b")
14
- assert_equal false, Unicode::Confusable.confusable?("", "?")
19
+ describe ".skeleton(string)" do
20
+ it "returns internal skeleton representation" do
21
+ assert_equal "Ruby", Unicode::Confusable.skeleton("ℜ𝘂ᖯʏ")
22
+ end
23
+
24
+ it "will remove default ignorable codepoints" do
25
+ assert_equal "ab", Unicode::Confusable.skeleton("a\u{FE0F}b")
26
+ end
15
27
  end
16
28
 
17
29
  describe ".list(char)" do
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: unicode-confusable
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.11.0
4
+ version: 1.12.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jan Lelis
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2024-09-13 00:00:00.000000000 Z
11
+ date: 2024-10-30 00:00:00.000000000 Z
12
12
  dependencies: []
13
13
  description: "[Unicode 16.0.0] Compares two strings if they are visually confusable
14
14
  as described in Unicode® Technical Standard #39: Both strings get transformed into
@@ -31,6 +31,7 @@ files:
31
31
  - data/confusable.marshal.gz
32
32
  - lib/unicode/confusable.rb
33
33
  - lib/unicode/confusable/constants.rb
34
+ - lib/unicode/confusable/ignorable.rb
34
35
  - lib/unicode/confusable/index.rb
35
36
  - lib/unicode/confusable/string_ext.rb
36
37
  - sig/unicode-confusable.rbs
@@ -56,7 +57,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
56
57
  - !ruby/object:Gem::Version
57
58
  version: '0'
58
59
  requirements: []
59
- rubygems_version: 3.5.9
60
+ rubygems_version: 3.5.21
60
61
  signing_key:
61
62
  specification_version: 4
62
63
  summary: Detect characters that look visually similar.