unicode-confusable 1.0.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: d8c8e349451ea68559ba19ff01ac84268c6f0ddb
+   data.tar.gz: 209ca999b1767568c2e5fe80c7ac455605770339
+ SHA512:
+   metadata.gz: 6790f06f877a5d792a7307b4bd422708d9eb98da4e581637fd8cfd4dae254ba6988768785b210514d09ffb4d62bc905a45ad41eb8cd67a0fcf9a3f3c2d6de6b9
+   data.tar.gz: be3dc3e9bf16d151f4beb8e6ae39989fe10252c71e545ce5b485654a9a21b2b3da617bf0dc898804eb4b873ea4a5808a4d4feda058d826d334601aad95e45839
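The checksums.yaml hunk above records SHA1 and SHA512 digests for `metadata.gz` and `data.tar.gz`. As a rough sketch of how such a digest could be checked locally, the helper below compares a file's SHA512 hex digest against an expected value. The helper name `sha512_matches?` and any file path passed to it are assumptions for illustration, not part of the gem.

```ruby
require "digest"
require "digest/sha2"

# Hypothetical helper (not part of the gem): returns true when the
# SHA512 hex digest of the file at `path` equals `expected_hex`.
def sha512_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex.downcase
end
```

It would be called with the path of an unpacked `data.tar.gz` and the corresponding SHA512 value from checksums.yaml.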
data/.gitignore ADDED
@@ -0,0 +1,2 @@
+ Gemfile.lock
+ /pkg
data/.travis.yml ADDED
@@ -0,0 +1,20 @@
+ sudo: false
+ language: ruby
+
+ script: bundle exec ruby spec/unicode_confusable_spec.rb
+
+ rvm:
+   - 2.3.0
+   - 2.2
+   - ruby-head
+   - rbx-2
+   - jruby-head
+   - jruby-9.0.5.0
+
+ cache:
+   - bundler
+
+ matrix:
+   allow_failures:
+     - rvm: jruby-head
+     - rvm: rbx-2
data/CHANGELOG.md ADDED
@@ -0,0 +1,6 @@
+ ## CHANGELOG
+
+ ### 1.0.0
+
+ * Initial release
+
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,74 @@
+ # Contributor Covenant Code of Conduct
+
+ ## Our Pledge
+
+ In the interest of fostering an open and welcoming environment, we as
+ contributors and maintainers pledge to making participation in our project and
+ our community a harassment-free experience for everyone, regardless of age, body
+ size, disability, ethnicity, gender identity and expression, level of experience,
+ nationality, personal appearance, race, religion, or sexual identity and
+ orientation.
+
+ ## Our Standards
+
+ Examples of behavior that contributes to creating a positive environment
+ include:
+
+ * Using welcoming and inclusive language
+ * Being respectful of differing viewpoints and experiences
+ * Gracefully accepting constructive criticism
+ * Focusing on what is best for the community
+ * Showing empathy towards other community members
+
+ Examples of unacceptable behavior by participants include:
+
+ * The use of sexualized language or imagery and unwelcome sexual attention or
+   advances
+ * Trolling, insulting/derogatory comments, and personal or political attacks
+ * Public or private harassment
+ * Publishing others' private information, such as a physical or electronic
+   address, without explicit permission
+ * Other conduct which could reasonably be considered inappropriate in a
+   professional setting
+
+ ## Our Responsibilities
+
+ Project maintainers are responsible for clarifying the standards of acceptable
+ behavior and are expected to take appropriate and fair corrective action in
+ response to any instances of unacceptable behavior.
+
+ Project maintainers have the right and responsibility to remove, edit, or
+ reject comments, commits, code, wiki edits, issues, and other contributions
+ that are not aligned to this Code of Conduct, or to ban temporarily or
+ permanently any contributor for other behaviors that they deem inappropriate,
+ threatening, offensive, or harmful.
+
+ ## Scope
+
+ This Code of Conduct applies both within project spaces and in public spaces
+ when an individual is representing the project or its community. Examples of
+ representing a project or community include using an official project e-mail
+ address, posting via an official social media account, or acting as an appointed
+ representative at an online or offline event. Representation of a project may be
+ further defined and clarified by project maintainers.
+
+ ## Enforcement
+
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
+ reported by contacting the project team at opensource@janlelis.com. All
+ complaints will be reviewed and investigated and will result in a response that
+ is deemed necessary and appropriate to the circumstances. The project team is
+ obligated to maintain confidentiality with regard to the reporter of an incident.
+ Further details of specific enforcement policies may be posted separately.
+
+ Project maintainers who do not follow or enforce the Code of Conduct in good
+ faith may face temporary or permanent repercussions as determined by other
+ members of the project's leadership.
+
+ ## Attribution
+
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
+ available at [http://contributor-covenant.org/version/1/4][version]
+
+ [homepage]: http://contributor-covenant.org
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,5 @@
+ source 'https://rubygems.org'
+
+ gemspec
+
+ gem 'minitest'
data/MIT-LICENSE.txt ADDED
@@ -0,0 +1,20 @@
+ Copyright (c) 2016 Jan Lelis, mail@janlelis.de
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,40 @@
+ # Unicode::Confusable [![[version]](https://badge.fury.io/rb/unicode-confusable.svg)](http://badge.fury.io/rb/unicode-confusable) [![[travis]](https://travis-ci.org/janlelis/unicode-confusable.png)](https://travis-ci.org/janlelis/unicode-confusable)
+
+ Checks whether two strings are visually confusable, as described in [Unicode® Technical Standard #39](http://www.unicode.org/reports/tr39/#Confusable_Detection): both strings are transformed into a skeleton before being compared. The skeleton is generated by normalizing the string, replacing [confusable characters](ftp://ftp.unicode.org/Public/security/8.0.0/confusables.txt), and normalizing the string again. Please note: The skeleton is an intermediate representation and is not meant for any use other than testing confusability.
+
+ Unicode version: **8.0.0**
+
+ Supported Rubies: **2.3**, **2.2**
+
+ ## `Gemfile`
+
+ ```ruby
+ gem "unicode-confusable"
+ ```
+
+ ## Usage
+
+ ```ruby
+ require "unicode/confusable"
+
+ Unicode::Confusable.confusable? "a", "b" # => false
+ Unicode::Confusable.confusable? "ℜ𝘂ᖯʏ", "Ruby" # => true
+ Unicode::Confusable.confusable? "Michael", "Michae1" # => true
+ Unicode::Confusable.confusable? "⁇", "?" # => false
+ Unicode::Confusable.confusable? "⁇", "??" # => true
+ ```
+
+ ## No Advanced Detection
+
+ TR 39 also describes mechanisms for more precise detection of confusables, including within a single string:
+
+ - Single-script confusable
+ - Mixed-script confusable
+ - Whole-script confusable
+
+ This is (currently) **not** supported by this gem.
+
+ ## MIT License
+
+ - Copyright (C) 2016 Jan Lelis <http://janlelis.com>. Released under the MIT license.
+ - Unicode data: http://www.unicode.org/copyright.html#Exhibit1
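The skeleton comparison described in the README above (normalize, replace confusable characters, normalize again) can be sketched in a few lines of Ruby. This is a minimal illustration rather than the gem's implementation: `SAMPLE_INDEX`, `sample_skeleton`, and `sample_confusable?` are hypothetical names, and the table holds only three hand-picked mappings instead of the full confusables data.

```ruby
# Tiny hand-picked sample table (an assumption for illustration);
# the real gem loads a complete index from its bundled data file.
SAMPLE_INDEX = {
  0x0031 => 0x006C,           # DIGIT ONE -> LATIN SMALL LETTER L
  0x0430 => 0x0061,           # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
  0x2047 => [0x003F, 0x003F], # DOUBLE QUESTION MARK -> two QUESTION MARKs
}.freeze

# NFD-normalize, replace each mapped codepoint, then NFD-normalize again.
def sample_skeleton(string)
  string.unicode_normalize(:nfd)
        .each_codepoint
        .flat_map { |codepoint| SAMPLE_INDEX.fetch(codepoint, codepoint) }
        .pack("U*")
        .unicode_normalize(:nfd)
end

# Two strings count as confusable when their skeletons are identical.
def sample_confusable?(string1, string2)
  sample_skeleton(string1) == sample_skeleton(string2)
end
```

Because a single codepoint may map to several codepoints, the replacement step flattens the result before packing it back into a string.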
data/Rakefile ADDED
@@ -0,0 +1,37 @@
+ # # #
+ # Get gemspec info
+
+ gemspec_file = Dir['*.gemspec'].first
+ gemspec = eval File.read(gemspec_file), binding, gemspec_file
+ info = "#{gemspec.name} | #{gemspec.version} | " \
+        "#{gemspec.runtime_dependencies.size} dependencies | " \
+        "#{gemspec.files.size} files"
+
+ # # #
+ # Gem build and install task
+
+ desc info
+ task :gem do
+   puts info + "\n\n"
+   print " "; sh "gem build #{gemspec_file}"
+   FileUtils.mkdir_p 'pkg'
+   FileUtils.mv "#{gemspec.name}-#{gemspec.version}.gem", 'pkg'
+   puts; sh %{gem install --no-document pkg/#{gemspec.name}-#{gemspec.version}.gem}
+ end
+
+ # # #
+ # Start an IRB session with the gem loaded
+
+ desc "#{gemspec.name} | IRB"
+ task :irb do
+   sh "irb -I ./lib -r #{gemspec.name.gsub '-','/'}"
+ end
+
+ # # #
+ # Run Specs
+
+ desc "#{gemspec.name} | Spec"
+ task :spec do
+   sh "for file in spec/*.rb; do ruby $file; done"
+ end
+ task default: :spec
data/data/confusable.marshal.gz ADDED
Binary file
data/lib/unicode/confusable.rb ADDED
@@ -0,0 +1,22 @@
+ require_relative "confusable/constants"
+ require_relative "confusable/index"
+
+ require 'unicode_normalize/normalize'
+
+ module Unicode
+   module Confusable
+     def self.confusable?(string1, string2)
+       skeleton(string1) == skeleton(string2)
+     end
+
+     def self.skeleton(string)
+       require_relative 'confusable/index' unless defined? ::Unicode::Confusable::INDEX
+       UnicodeNormalize.normalize(
+         UnicodeNormalize.normalize(string, :nfd).each_codepoint.map{ |codepoint|
+           INDEX[codepoint] || codepoint
+         }.flatten.pack("U*"), :nfd
+       )
+     end
+   end
+ end
+
data/lib/unicode/confusable/constants.rb ADDED
@@ -0,0 +1,8 @@
+ module Unicode
+   module Confusable
+     VERSION = "1.0.0".freeze
+     DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + '/../../../data/').freeze
+     INDEX_FILENAME = (DATA_DIRECTORY + '/confusable.marshal.gz').freeze
+   end
+ end
+
data/lib/unicode/confusable/index.rb ADDED
@@ -0,0 +1,7 @@
+ require_relative 'constants'
+
+ module Unicode
+   module Confusable
+     INDEX = Marshal.load(Gem.gunzip(File.binread(INDEX_FILENAME)))
+   end
+ end
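The index file above is a gzip-compressed `Marshal` dump that is read back with `Marshal.load(Gem.gunzip(...))`. The round trip can be sketched as follows, using `Zlib` directly in place of `Gem.gunzip`. The helper names `dump_index` and `load_index` and the sample hash are assumptions for illustration, and the gem's actual build script is not shown in this diff.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: serialize an index hash with Marshal and gzip it.
def dump_index(index)
  io = StringIO.new(String.new) # binary, unfrozen buffer
  gz = Zlib::GzipWriter.new(io)
  gz.write(Marshal.dump(index))
  gz.finish                     # flush gzip data without closing the buffer
  io.string
end

# Hypothetical helper: gunzip the bytes and restore the hash.
# Marshal.load should only ever be given trusted data.
def load_index(bytes)
  Marshal.load(Zlib::GzipReader.new(StringIO.new(bytes)).read)
end
```

Keys are codepoints, and values are either a single replacement codepoint or an array of them, which is why the skeleton code flattens the mapped result.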
data/lib/unicode/confusable/string_ext.rb ADDED
@@ -0,0 +1,8 @@
+ require_relative "../confusable"
+
+ class String
+   # Optional core extension for your convenience
+   def confusable?(other)
+     Unicode::Confusable.confusable?(self, other)
+   end
+ end
data/spec/unicode_confusable_spec.rb ADDED
@@ -0,0 +1,17 @@
+ require_relative "../lib/unicode/confusable"
+ require "minitest/autorun"
+
+ describe Unicode::Confusable do
+   it "will detect official confusables" do
+     assert_equal true, Unicode::Confusable.confusable?("1", "l")
+     assert_equal true, Unicode::Confusable.confusable?("ℜ𝘂ᖯʏ", "Ruby")
+     assert_equal true, Unicode::Confusable.confusable?("Michael", "Michae1")
+     assert_equal true, Unicode::Confusable.confusable?("⁇", "??")
+   end
+
+   it "will return false for non-confusables" do
+     assert_equal false, Unicode::Confusable.confusable?("a", "b")
+     assert_equal false, Unicode::Confusable.confusable?("⁇", "?")
+   end
+ end
+
data/unicode-confusable.gemspec ADDED
@@ -0,0 +1,21 @@
+ # -*- encoding: utf-8 -*-
+
+ require File.dirname(__FILE__) + "/lib/unicode/confusable/constants"
+
+ Gem::Specification.new do |gem|
+   gem.name = "unicode-confusable"
+   gem.version = Unicode::Confusable::VERSION
+   gem.summary = "Detect characters that look visually similar."
+   gem.description = "Compares two strings if they are visually confusable as described in Unicode® Technical Standard #39: Both strings get transformed into a skeleton format before comparing them. The skeleton is generated by normalizing the string, replacing confusable characters, and normalizing the string again."
+   gem.authors = ["Jan Lelis"]
+   gem.email = ["mail@janlelis.de"]
+   gem.homepage = "https://github.com/janlelis/unicode-confusable"
+   gem.license = "MIT"
+
+   gem.files = Dir["{**/}{.*,*}"].select{ |path| File.file?(path) && path !~ /^pkg/ }
+   gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
+   gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
+   gem.require_paths = ["lib"]
+
+   gem.required_ruby_version = "~> 2.2"
+ end
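The gemspec above declares `required_ruby_version = "~> 2.2"`. As a small sketch of what that pessimistic constraint admits, the snippet below evaluates it with RubyGems' own `Gem::Requirement`. The helper name `ruby_supported?` is hypothetical and exists only for this illustration.

```ruby
require "rubygems"

# Hypothetical helper: does `version` satisfy the given constraint?
def ruby_supported?(version, constraint = "~> 2.2")
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

%w[2.1.9 2.2.0 2.3.0 3.0.0].each do |version|
  puts "#{version}: #{ruby_supported?(version)}"
end
```

`~> 2.2` is equivalent to `>= 2.2` together with `< 3.0`, so this release cannot be installed on Ruby 3.x.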
metadata ADDED
@@ -0,0 +1,64 @@
+ --- !ruby/object:Gem::Specification
+ name: unicode-confusable
+ version: !ruby/object:Gem::Version
+   version: 1.0.0
+ platform: ruby
+ authors:
+ - Jan Lelis
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2016-03-12 00:00:00.000000000 Z
+ dependencies: []
+ description: 'Compares two strings if they are visually confusable as described in
+   Unicode® Technical Standard #39: Both strings get transformed into a skeleton format
+   before comparing them. The skeleton is generated by normalizing the string, replacing
+   confusable characters, and normalizing the string again.'
+ email:
+ - mail@janlelis.de
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - ".gitignore"
+ - ".travis.yml"
+ - CHANGELOG.md
+ - CODE_OF_CONDUCT.md
+ - Gemfile
+ - MIT-LICENSE.txt
+ - README.md
+ - Rakefile
+ - data/confusable.marshal.gz
+ - lib/unicode/confusable.rb
+ - lib/unicode/confusable/constants.rb
+ - lib/unicode/confusable/index.rb
+ - lib/unicode/confusable/string_ext.rb
+ - spec/unicode_confusable_spec.rb
+ - unicode-confusable.gemspec
+ homepage: https://github.com/janlelis/unicode-confusable
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - "~>"
+     - !ruby/object:Gem::Version
+       version: '2.2'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.5.1
+ signing_key:
+ specification_version: 4
+ summary: Detect characters that look visually similar.
+ test_files:
+ - spec/unicode_confusable_spec.rb
+ has_rdoc: