unicode-confusable 1.11.0 → 1.13.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -0
- data/Gemfile.lock +1 -1
- data/MIT-LICENSE.txt +1 -1
- data/README.md +12 -14
- data/data/confusable.marshal.gz +0 -0
- data/lib/unicode/confusable/constants.rb +2 -2
- data/lib/unicode/confusable/ignorable.rb +9 -0
- data/lib/unicode/confusable.rb +8 -4
- data/spec/unicode_confusable_spec.rb +22 -10
- metadata +5 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b0fb80c9f71641f4246d0b7fe969d6495c09678168204cf88dbf660a08e83e96
+  data.tar.gz: 71b09a54d8a909ab9a4718b0d3a561572cf7325e8f91febafadcfdc4d68b6b42
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6b2d5fd1fd5fa69645693e266c4180edc4c6b042289f28f69a424f116d2f3ca4dd17560d824bd164942079c56891c016497b0eac9b5d3bd513e27b9ec03595cb
+  data.tar.gz: f4614d86cb2a393e8e83ddf83f3fd4d65d35ddfacc53dc192439eef1d79c692b0e025853010f6878adce8e8c7d17bbb86f1355f1de6cd939ad09ad9bdb18a3b3
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,14 @@
 ## CHANGELOG
 
+### 1.13.0
+
+- Unicode 17.0
+
+### 1.12.0
+
+- Remove default ignorable codepoints, which is now part of the skeleton algorithm
+- Fix the confusable list for ";" (wrongly contained null bytes)
+
 ### 1.11.0
 
 - Unicode 16.0
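The 1.12.0 entry moves removal of default ignorable code points into the skeleton computation itself. The following is a minimal sketch of that ordering (normalize to NFD, drop ignorables, map confusables, normalize again), not the gem's implementation. `TOY_CONFUSABLE`, `TOY_IGNORABLE`, and `toy_skeleton` are hypothetical names, the tables are tiny hand-picked samples rather than the gem's index, and Ruby's built-in `String#unicode_normalize` stands in for the gem's internal normalization call.

```ruby
# Hypothetical stand-ins for the gem's data tables (illustrative sample only).
TOY_CONFUSABLE = {
  0x0031 => 0x006C,           # "1" maps to "l"
  0x2047 => [0x003F, 0x003F]  # "⁇" maps to the two-character sequence "??"
}.freeze

# Two default ignorable code points: soft hyphen and variation selector-16.
TOY_IGNORABLE = [0x00AD, 0xFE0F].freeze

# Sketch of the skeleton steps: NFD, drop ignorables, map confusables, NFD again.
def toy_skeleton(string)
  mapped = string.unicode_normalize(:nfd).each_codepoint.map { |codepoint|
    next if TOY_IGNORABLE.include?(codepoint) # becomes nil, removed by compact
    TOY_CONFUSABLE[codepoint] || codepoint    # fall back to the code point itself
  }
  # flatten expands multi-code-point mappings before packing back into a string
  mapped.flatten.compact.pack("U*").unicode_normalize(:nfd)
end

def toy_confusable?(string1, string2)
  toy_skeleton(string1) == toy_skeleton(string2)
end
```

With these sample tables, `toy_skeleton("a\u{FE0F}b")` gives `"ab"` and `toy_confusable?("Michael", "Michae1")` is true, which mirrors what the updated spec asserts for the real methods.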
data/Gemfile.lock
CHANGED
data/MIT-LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -1,14 +1,14 @@
 # Unicode::Confusable [![[version]](https://badge.fury.io/rb/unicode-confusable.svg)](https://badge.fury.io/rb/unicode-confusable) [![[ci]](https://github.com/janlelis/unicode-confusable/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-confusable/actions?query=workflow%3ATest)
 
-Compares two strings if they are visually confusable as described in [Unicode® Technical Standard #39](https://www.unicode.org/reports/tr39/#Confusable_Detection): Both strings get transformed into a skeleton format before comparing them. The skeleton is generated by normalizing the string ([NFD](http://unicode.org/reports/tr15/#Norm_Forms)), replacing [confusable characters](https://unicode.org/Public/security/
+Compares two strings if they are visually confusable as described in [Unicode® Technical Standard #39](https://www.unicode.org/reports/tr39/#Confusable_Detection): Both strings get transformed into a skeleton format before comparing them. The skeleton is generated by normalizing the string ([NFD](http://unicode.org/reports/tr15/#Norm_Forms)), removing ignorable characters, replacing [confusable characters](https://unicode.org/Public/security/16.0.0/confusables.txt), and normalizing the string again.
 
-Unicode version: **
+Unicode version: **17.0.0** (September 2025)
 
 \* The Unicode normalization [depends on your Ruby version](https://idiosyncratic-ruby.com/73-unicode-version-mapping.html)
 
-
+Please note: The TR39 standard now includes detection of confusables based on bidi formatting (i.e. right-to-left text). This is currently not supported by this gen.
 
-
+Supported Rubies: **3.x** (might stil work: **2.x**)
 
 ## Usage
 
@@ -35,33 +35,31 @@ Unicode::Confusable.skeleton "ℜ𝘂ᖯʏ" # => "Ruby"
 
 ### List
 
-List all
+List all characters that map to the confusable exemplar given:
 
 ```ruby
 Unicode::Confusable.list("o", false)
-# => ["ం", "ಂ", "ം", "ං", "०", "੦", "૦", "
+# => ["ం", "ಂ", "ം", "ං", "०", "০", "੦", "૦", "୦", "௦", "౦", "൦", "๐", "໐", "၀", "០", "𑓐", "٥", "۵", "o", "ℴ", "𝐨", "𝑜", "𝒐", "𝓸", "𝔬", "𝕠", "𝖔", "𝗈", "𝗼", "𝘰", "𝙤", "𝚘", "ᴏ", "ᴑ", "ꬽ", "ο", "𝛐", "𝜊", "𝝄", "𝝾", "𝞸", "σ", "𝛔", "𝜎", "𝝈", "𝞂", "𝞼", "ⲟ", "ϭ", "о", "ჿ", "օ", "ס", "ه", "𞸤", "𞹤", "𞺄", "ﻫ", "ﻬ", "ﻪ", "ﻩ", "ھ", "ﮬ", "ﮭ", "ﮫ", "ﮪ", "ہ", "ﮨ", "ﮩ", "ﮧ", "ﮦ", "ە", "ഠ", "ဝ", "𐓪", "𑣈", "𑣗", "𐐬"]
 ```
 
 If you omit the second parameter, it will also show confusables, where the given character is just a part of:
 
 ```ruby
 Unicode::Confusable.list("o")
-# => ["⒪", "ꜵ", "℅", "ᴔ", "ꭁ", "ꭂ", "ﷲ", "№", "ం", "ಂ", "ം", "ං", "०", "੦", "૦", "
+# => ["⒪", "ꜵ", "℅", "ᴔ", "ꭁ", "ꭂ", "ﷲ", "№", "ం", "ಂ", "ം", "ං", "०", "০", "੦", "૦", "୦", "௦", "౦", "൦", "๐", "໐", "၀", "០", "𑓐", "٥", "۵", "o", "ℴ", "𝐨", "𝑜", "𝒐", "𝓸", "𝔬", "𝕠", "𝖔", "𝗈", "𝗼", "𝘰", "𝙤", "𝚘", "ᴏ", "ᴑ", "ꬽ", "ο", "𝛐", "𝜊", "𝝄", "𝝾", "𝞸", "σ", "𝛔", "𝜎", "𝝈", "𝞂", "𝞼", "ⲟ", "ϭ", "о", "ჿ", "օ", "ס", "ه", "𞸤", "𞹤", "𞺄", "ﻫ", "ﻬ", "ﻪ", "ﻩ", "ھ", "ﮬ", "ﮭ", "ﮫ", "ﮪ", "ہ", "ﮨ", "ﮩ", "ﮧ", "ﮦ", "ە", "ഠ", "ဝ", "𐓪", "𑣈", "𑣗", "𐐬", "ۿ", "ø", "ꬾ", "ɵ", "ꝋ", "ⲑ", "ө", "ѳ", "ꮎ", "ꮻ", "ꭴ", "ﳙ", "ơ", "œ", "ɶ", "∞", "ꝏ", "ꚙ", "ﳗ", "ﱑ", "ﳘ", "ﱒ", "ﶓ", "ﶔ", "ﱓ", "ﱔ", "ൟ", "თ", "တ", "ꭣ", "ﲠ", "ﳢ", "ﲥ", "ﳤ", "ﷻ", "ﴱ", "ﳨ", "ﴲ", "ﳪ", "ﷺ", "ﷷ", "ﳍ", "ﳖ", "ﳯ", "ﳞ", "ﳱ", "ﳦ", "ﲛ", "ﳠ", "ﯭ", "ﯬ"]
 ```
 
-## No
+## No Bidi-Confusable Check
 
-
+Testing for bidirectional confusables is currently not supported.
 
-
-- Mixed-script confusable
-- Whole-script confusable
+## Single-script / Mixed-script / Whole-script
 
-This is currently
+TR 39 also describes mechanisms for further categorization of confusables. This is currently not part of this gem, however the [unicode-scripts gem](https://github.com/janlelis/unicode-scripts) does include mixed-script detection, which you can use for this purpose.
 
 See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related micro libraries.
 
 ## MIT License
 
-- Copyright (C) 2016-
+- Copyright (C) 2016-2025 Jan Lelis <https://janlelis.com>. Released under the MIT license.
 - Unicode data: https://www.unicode.org/copyright.html#Exhibit1
data/data/confusable.marshal.gz
CHANGED
Binary file
|
data/lib/unicode/confusable/constants.rb
CHANGED
@@ -2,8 +2,8 @@
 
 module Unicode
   module Confusable
-    VERSION = "1.
-    UNICODE_VERSION = "
+    VERSION = "1.13.0"
+    UNICODE_VERSION = "17.0.0"
     DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/").freeze
     INDEX_FILENAME = (DATA_DIRECTORY + "/confusable.marshal.gz").freeze
   end
data/lib/unicode/confusable.rb
CHANGED
@@ -4,6 +4,8 @@ require 'unicode_normalize/normalize'
 
 module Unicode
   module Confusable
+    autoload :IGNORABLE, File.expand_path('confusable/ignorable', __dir__)
+
     def self.confusable?(string1, string2)
       skeleton(string1) == skeleton(string2)
     end
@@ -12,8 +14,10 @@ module Unicode
       require_relative 'confusable/index' unless defined? ::Unicode::Confusable::INDEX
       UnicodeNormalize.normalize(
         UnicodeNormalize.normalize(string, :nfd).each_codepoint.map{ |codepoint|
-
-
+          unless IGNORABLE.include?(codepoint)
+            INDEX[:CONFUSABLE][codepoint] || codepoint
+          end
+        }.flatten.compact.pack("U*"), :nfd
       )
     end
 
@@ -21,9 +25,9 @@ module Unicode
       require_relative 'confusable/index' unless defined? ::Unicode::Confusable::INDEX
       codepoint = char.codepoints.first or raise ArgumentError, "no data given to Unicode::Confusable.list"
       if partial_mapping_allowed
-        INDEX.select{ |k,v| v == codepoint || v.is_a?(Array) && v.include?(codepoint) }.keys.map{ |codepoint| [codepoint].pack("U*") }
+        INDEX[:CONFUSABLE].select{ |k,v| v == codepoint || v.is_a?(Array) && v.include?(codepoint) }.keys.map{ |codepoint| [codepoint].pack("U*") }
       else
-        INDEX.select{ |k,v| v == codepoint }.keys.map{ |codepoint| [codepoint].pack("U") }
+        INDEX[:CONFUSABLE].select{ |k,v| v == codepoint }.keys.map{ |codepoint| [codepoint].pack("U") }
       end
     end
   end
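The `list` hunk only changes which table is searched (`INDEX[:CONFUSABLE]` instead of `INDEX`). The selection logic stays the same: direct matches only, or also multi-code-point targets that contain the given character. Below is a small sketch of those two modes, assuming a hypothetical four-entry table. `TOY_INDEX` and `toy_list` are illustrative names, and the entries are a hand-picked sample, not the gem's data.

```ruby
# Hypothetical sample table: source code point => target code point(s).
TOY_INDEX = {
  0x03BF => 0x006F,                    # Greek small omicron maps to "o"
  0x043E => 0x006F,                    # Cyrillic small o maps to "o"
  0x2105 => [0x0063, 0x002F, 0x006F],  # "℅" maps to the sequence "c/o"
  0x0031 => 0x006C                     # "1" maps to "l"
}.freeze

# Returns the source characters whose mapping is (or, optionally, contains)
# the first code point of char.
def toy_list(char, partial_mapping_allowed = true)
  codepoint = char.codepoints.first or raise ArgumentError, "no data given to toy_list"
  matches = TOY_INDEX.select { |_source, target|
    target == codepoint ||
      (partial_mapping_allowed && target.is_a?(Array) && target.include?(codepoint))
  }
  matches.keys.map { |source| [source].pack("U") }
end
```

With this sample, `toy_list("o", false)` returns only the two omicron-like letters, while `toy_list("o")` additionally returns "℅" because its target sequence contains "o".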
data/spec/unicode_confusable_spec.rb
CHANGED
@@ -2,27 +2,39 @@ require_relative "../lib/unicode/confusable"
 require "minitest/autorun"
 
 describe Unicode::Confusable do
-
-
-
-
-
+  describe ".confusable?(string1, string2)" do
+    it "will detect official confusables" do
+      assert_equal true, Unicode::Confusable.confusable?("1", "l")
+      assert_equal true, Unicode::Confusable.confusable?("ℜ𝘂ᖯʏ", "Ruby")
+      assert_equal true, Unicode::Confusable.confusable?("Michael", "Michae1")
+      assert_equal true, Unicode::Confusable.confusable?("⁇", "??")
+    end
+
+    it "will return false for non-confusables" do
+      assert_equal false, Unicode::Confusable.confusable?("a", "b")
+      assert_equal false, Unicode::Confusable.confusable?("⁇", "?")
+    end
   end
 
-
-
-
+  describe ".skeleton(string)" do
+    it "returns internal skeleton representation" do
+      assert_equal "Ruby", Unicode::Confusable.skeleton("ℜ𝘂ᖯʏ")
+    end
+
+    it "will remove default ignorable codepoints" do
+      assert_equal "ab", Unicode::Confusable.skeleton("a\u{FE0F}b")
+    end
   end
 
   describe ".list(char)" do
     it "will return list of confusables for a character, also confusables where given character is part of" do
-      assert_equal ["⒪", "ꜵ", "℅", "ᴔ", "ꭁ", "ꭂ", "ﷲ", "№", "ం", "ಂ", "ം", "ං", "०", "੦", "૦", "
+      assert_equal ["⒪", "ꜵ", "℅", "ᴔ", "ꭁ", "ꭂ", "ﷲ", "№", "ం", "ಂ", "ം", "ං", "०", "০", "੦", "૦", "୦", "௦", "౦", "൦", "๐", "໐", "၀", "០", "𑓐", "٥", "۵", "o", "ℴ", "𝐨", "𝑜", "𝒐", "𝓸", "𝔬", "𝕠", "𝖔", "𝗈", "𝗼", "𝘰", "𝙤", "𝚘", "ᴏ", "ᴑ", "ꬽ", "ο", "𝛐", "𝜊", "𝝄", "𝝾", "𝞸", "σ", "𝛔", "𝜎", "𝝈", "𝞂", "𝞼", "ⲟ", "ϭ", "о", "ჿ", "օ", "ס", "ه", "𞸤", "𞹤", "𞺄", "ﻫ", "ﻬ", "ﻪ", "ﻩ", "ھ", "ﮬ", "ﮭ", "ﮫ", "ﮪ", "ہ", "ﮨ", "ﮩ", "ﮧ", "ﮦ", "ە", "ഠ", "ဝ", "𐓪", "𑣈", "𑣗", "𐐬", "ۿ", "ø", "ꬾ", "ɵ", "ꝋ", "ⲑ", "ө", "ѳ", "ꮎ", "ꮻ", "ꭴ", "ﳙ", "ơ", "œ", "ɶ", "∞", "ꝏ", "ꚙ", "ﳗ", "ﱑ", "ﳘ", "ﱒ", "ﶓ", "ﶔ", "ﱓ", "ﱔ", "ൟ", "თ", "တ", "ꭣ", "ﲠ", "ﳢ", "ﲥ", "ﳤ", "ﷻ", "ﴱ", "ﳨ", "ﴲ", "ﳪ", "ﷺ", "ﷷ", "ﳍ", "ﳖ", "ﳯ", "ﳞ", "ﳱ", "ﳦ", "ﲛ", "ﳠ", "ﯭ", "ﯬ"], Unicode::Confusable.list("o")
     end
   end
 
   describe ".list(char, false)" do
     it "will return list of confusables for a character, only direct confusables" do
-      assert_equal ["ం", "ಂ", "ം", "ං", "०", "੦", "૦", "
+      assert_equal ["ం", "ಂ", "ം", "ං", "०", "০", "੦", "૦", "୦", "௦", "౦", "൦", "๐", "໐", "၀", "០", "𑓐", "٥", "۵", "o", "ℴ", "𝐨", "𝑜", "𝒐", "𝓸", "𝔬", "𝕠", "𝖔", "𝗈", "𝗼", "𝘰", "𝙤", "𝚘", "ᴏ", "ᴑ", "ꬽ", "ο", "𝛐", "𝜊", "𝝄", "𝝾", "𝞸", "σ", "𝛔", "𝜎", "𝝈", "𝞂", "𝞼", "ⲟ", "ϭ", "о", "ჿ", "օ", "ס", "ه", "𞸤", "𞹤", "𞺄", "ﻫ", "ﻬ", "ﻪ", "ﻩ", "ھ", "ﮬ", "ﮭ", "ﮫ", "ﮪ", "ہ", "ﮨ", "ﮩ", "ﮧ", "ﮦ", "ە", "ഠ", "ဝ", "𐓪", "𑣈", "𑣗", "𐐬"], Unicode::Confusable.list("o", false)
     end
   end
 end
metadata
CHANGED
@@ -1,16 +1,16 @@
 --- !ruby/object:Gem::Specification
 name: unicode-confusable
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.13.0
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2025-09-09 00:00:00.000000000 Z
 dependencies: []
-description: "[Unicode
+description: "[Unicode 17.0.0] Compares two strings if they are visually confusable
   as described in Unicode® Technical Standard #39: Both strings get transformed into
   a skeleton format before comparing them. The skeleton is generated by normalizing
   the string, replacing confusable characters, and then normalizing the string again."
@@ -31,6 +31,7 @@ files:
 - data/confusable.marshal.gz
 - lib/unicode/confusable.rb
 - lib/unicode/confusable/constants.rb
+- lib/unicode/confusable/ignorable.rb
 - lib/unicode/confusable/index.rb
 - lib/unicode/confusable/string_ext.rb
 - sig/unicode-confusable.rbs
@@ -56,7 +57,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.5.
+rubygems_version: 3.5.21
 signing_key:
 specification_version: 4
 summary: Detect characters that look visually similar.