uniscribe 1.0.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 9f51890eee3fe1e008ed635c68a3598d9e9f5467
4
+ data.tar.gz: d25eb46898d1de180c085960fe858cbbea57dd2f
5
+ SHA512:
6
+ metadata.gz: 766dedd2e18ef926e5a962882250c232db1aae66e6008cb179fc7bfaa88a913b61157d8114527f6bb9eff2a0111137658b0311015bf5679fbefb5a9af6210b0b
7
+ data.tar.gz: bfe050bd760fd40afbbc2dbf6f664d3c2de4cbfff2ffc0cd9ab7d3912f7e3349b57a45de44949ba4e07b06704e08f7796e2bf3675410a5e64583a9240cdaa13c
@@ -0,0 +1,2 @@
+ Gemfile.lock
+ /pkg
@@ -0,0 +1,23 @@
+ sudo: false
+ language: ruby
+
+ rvm:
+   - ruby-head
+   - 2.4.1
+   - 2.3.3
+   - 2.2
+   - 2.1
+   - 2.0
+   - jruby-head
+   - jruby-9.1.8.0
+
+ cache:
+   - bundler
+
+ matrix:
+   allow_failures:
+     - rvm: jruby-head
+     - rvm: jruby-9.1.8.0
+     - rvm: ruby-head
+     - rvm: 2.0
+   # fast_finish: true
@@ -0,0 +1,6 @@
+ ## CHANGELOG
+
+ ### 0.1.0
+
+ * Initial release
+
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at opensource@janlelis.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
+ source 'https://rubygems.org'
+
+ gemspec
+
+ gem 'minitest'
+ gem 'rake'
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2017 Jan Lelis, mail@janlelis.de
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,113 @@
1
+ # uniscribe | Describe the Unicode [![[version]](https://badge.fury.io/rb/uniscribe.svg)](http://badge.fury.io/rb/uniscribe) [![[travis]](https://travis-ci.org/janlelis/uniscribe.svg)](https://travis-ci.org/janlelis/uniscribe)
2
+
3
+ Describes Unicode characters with their name and shows compositions.
4
+
5
+ - Helps you understand how glyphs and codepoints are structured within the data
6
+ - Gives you the names of glyphs and codepoints, which can be used for further research
7
+ - Highlights invalid/special/blank codepoints
8
+
9
+ Uses color coding similar to that of its lower-level companion tool [unibits](https://github.com/janlelis/unibits).
10
+
11
+ ## Setup
12
+
13
+ Make sure you have Ruby installed and installing gems works properly. Then do:
14
+
15
+ ```
16
+ $ gem install uniscribe
17
+ ```
18
+
19
+ ## Usage
20
+
21
+ Pass the string you want to debug to uniscribe:
22
+
23
+ ### From CLI
24
+
25
+ ```
26
+ $ uniscribe "test strı̈ng"
27
+ ```
28
+
29
+ ### From Ruby
30
+
31
+ ```ruby
32
+ require "uniscribe/kernel_method"
33
+ uniscribe "test strı̈ng"
34
+ ```
35
+
36
+ ### Output
37
+
38
+ ```
39
+
40
+ 0074 ├─ t ├─ LATIN SMALL LETTER T
41
+ 0065 ├─ e ├─ LATIN SMALL LETTER E
42
+ 0073 ├─ s ├─ LATIN SMALL LETTER S
43
+ 0074 ├─ t ├─ LATIN SMALL LETTER T
44
+ 0020 ├─ ] [ ├─ SPACE
45
+ 0073 ├─ s ├─ LATIN SMALL LETTER S
46
+ 0074 ├─ t ├─ LATIN SMALL LETTER T
47
+ 0072 ├─ r ├─ LATIN SMALL LETTER R
48
+ ---- ├┬ ı̈ ├┬ Composition
49
+ 0131 │├─ ı │├─ LATIN SMALL LETTER DOTLESS I
50
+ 0308 │└─ ◌̈ │└─ COMBINING DIAERESIS
51
+ 006E ├─ n ├─ LATIN SMALL LETTER N
52
+ 0067 ├─ g ├─ LATIN SMALL LETTER G
53
+
54
+ ```
55
+
56
+ ## Examples
57
+
58
+ ### Tamil
59
+
60
+ `>> uniscribe "நகரத்தில்"`
61
+
62
+ ![Screenshot Tamil](/screenshots/tamil.png?raw=true "Tamil")
63
+
64
+ ### Thai
65
+
66
+ `>> uniscribe "ม้าลายหกตัว"`
67
+
68
+ ![Screenshot Thai](/screenshots/thai.png?raw=true "Thai")
69
+
70
+ ### Emoji Sequences
71
+
72
+ `>> uniscribe "3️⃣🤸‍♀"`
73
+
74
+ ![Screenshot Emoji](/screenshots/emoji.png?raw=true "Emoji")
75
+
76
+ ### Lots of Combining Marks
77
+
78
+ `>> uniscribe "̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍"`
79
+
80
+ ![Screenshot Marks](/screenshots/marks.png?raw=true "Marks")
81
+
82
+ ### Some Strange Unicode Characters
83
+
84
+ `>> uniscribe "\0A\u{E01D7}\x7F\r\n\u{D0000}\u{81}\u{FFF9}B\u{FFFB}🏴\u{E0061}\u{E007F}\u{10FFFF}"`
85
+
86
+ ![Screenshot Strange](/screenshots/strange.png?raw=true "Strange")
87
+
88
+ ### Some Blanks
89
+
90
+ `>> uniscribe "­ᅠ 𝅸"`
91
+
92
+ ![Screenshot Blanks](/screenshots/blanks.png?raw=true "Blanks")
93
+
94
+ ## Notes
95
+
96
+ The proper detection of compositions / graphemes / combined characters depends on your Ruby version, because each Ruby release supports a specific Unicode version:
97
+
98
+ Ruby | Unicode Version
99
+ -----|----------------
100
+ 2.4 | 9.0.0
101
+ 2.3 | 8.0.0
102
+ 2.2 | 7.0.0
103
+ 2.1 | 6.1.0
104
+
105
+ Also see
106
+
107
+ - [unibits](https://github.com/janlelis/unibits) - visualizes Unicode encodings
108
+ - [symbolify](https://github.com/janlelis/symbolify) - used for safely printing individual codepoints
109
+ - [characteristics](https://github.com/janlelis/characteristics) - used for detecting blanks and similar
110
+ - [unicopy](https://github.com/janlelis/unicopy) - copy codepoints to clipboard
111
+ - [Unicode® Standard Annex #29: Unicode Text Segmentation](http://unicode.org/reports/tr29/)
112
+
113
+ Copyright (C) 2017 Jan Lelis <http://janlelis.com>. Released under the MIT license.
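The composition detection mentioned in the README notes comes from Ruby's `\X` regex matcher, which `Uniscribe.of` uses to split the input into glyphs. Below is a minimal, stdlib-only sketch of that splitting step. The helper name `cluster_codepoints` is hypothetical and not part of uniscribe:

```ruby
# Hypothetical helper, not part of uniscribe: split a string into
# grapheme clusters and list each cluster's codepoints as uppercase hex.
def cluster_codepoints(string)
  string.encode("UTF-8").scan(/\X/).map do |cluster|
    cluster.codepoints.map { |cp| cp.to_s(16).rjust(4, "0").upcase }
  end
end

# "r", then DOTLESS I followed by COMBINING DIAERESIS, then "n"
p cluster_codepoints("r\u0131\u0308n")
# => [["0072"], ["0131", "0308"], ["006E"]]
```

A cluster with more than one codepoint corresponds to what the tool prints as a "Composition".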
@@ -0,0 +1,38 @@
+ # # #
+ # Get gemspec info
+
+ gemspec_file = Dir['*.gemspec'].first
+ gemspec = eval File.read(gemspec_file), binding, gemspec_file
+ info = "#{gemspec.name} | #{gemspec.version} | " \
+        "#{gemspec.runtime_dependencies.size} dependencies | " \
+        "#{gemspec.files.size} files"
+
+ # # #
+ # Gem build and install task
+
+ desc info
+ task :gem do
+   puts info + "\n\n"
+   print " "; sh "gem build #{gemspec_file}"
+   FileUtils.mkdir_p 'pkg'
+   FileUtils.mv "#{gemspec.name}-#{gemspec.version}.gem", 'pkg'
+   puts; sh %{gem install --no-document pkg/#{gemspec.name}-#{gemspec.version}.gem}
+ end
+
+ # # #
+ # Start an IRB session with the gem loaded
+
+ desc "#{gemspec.name} | IRB"
+ task :irb do
+   sh "irb -I ./lib -r #{gemspec.name.gsub '-','/'}/kernel_method"
+ end
+
+ # # #
+ # Run specs
+
+ desc "#{gemspec.name} | Spec"
+ task :spec do
+   sh "for file in spec/*_spec.rb; do ruby $file; done"
+ end
+ task default: :spec
+
@@ -0,0 +1,75 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+
+ require "rationalist"
+ require "uniscribe"
+
+ argv = Rationalist.parse(
+   ARGV,
+   string: '_',
+   alias: {
+     e: 'encoding',
+     v: 'version',
+   },
+   boolean: [
+     'help',
+     'version',
+     'wide-ambiguous',
+   ]
+ )
+
+ if argv[:version]
+   puts "uniscribe #{Uniscribe::VERSION} by #{Paint["J-_-L", :bold]} <https://github.com/janlelis/uniscribe>"
+   puts "Unicode version is #{Uniscribe::UNICODE_VERSION} (glyph detection #{Uniscribe::UNICODE_VERSION_GLYPH_DETECTION || "[not supported]"})"
+   exit(0)
+ end
+
+ if argv[:help]
+   puts <<-HELP
+
+   #{Paint["DESCRIPTION", :underline]}
+
+   Describes a string of Unicode characters with their name and shows compositions.
+
+   #{Paint["USAGE", :underline]}
+
+   #{Paint["uniscribe", :bold]} [options] data
+
+   --encoding <encoding> | -e | which (Unicode) encoding to use for given data
+   --help                |    | this help page
+   --version             | -v | displays version of uniscribe
+   --wide-ambiguous      |    | treat characters of ambiguous width as wide
+
+   #{Paint["COLOR CODING", :underline]}
+
+   #{Paint["blank", Uniscribe::COLORS[:blank]]}
+   #{Paint["control", Uniscribe::COLORS[:control]]}
+   #{Paint["format", Uniscribe::COLORS[:format]]}
+   #{Paint["mark", Uniscribe::COLORS[:mark]]}
+   #{Paint["unassigned", Uniscribe::COLORS[:unassigned]]}
+   #{Paint["unassigned and ignorable", Uniscribe::COLORS[:ignorable]]}
+
+   random color for other characters and compositions
+
+   #{Paint["MORE INFO", :underline]}
+
+   https://github.com/janlelis/uniscribe
+
+   HELP
+   exit(0)
+ end
+
+ # Read the data from the first positional argument, or from STDIN when piped
+ if argv[:_] && argv[:_][0]
+   data = argv[:_][0]
+ elsif !$stdin.tty?
+   data = $stdin.read
+ else
+   data = nil
+ end
+
+ begin
+   # Forward the parsed options so that --encoding and --wide-ambiguous take effect
+   Uniscribe.of(data, encoding: argv[:encoding], wide_ambiguous: argv[:"wide-ambiguous"])
+ rescue ArgumentError
+   $stderr.puts Paint[$!.message, :red]
+   exit(1)
+ end
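The library accepts only a limited set of encodings for the `encoding:` keyword of `Uniscribe.of`, which is what the `--encoding` option is meant to select. Here is a minimal sketch of how such a list can be derived from Ruby's known encoding names, reusing the patterns from `lib/uniscribe.rb`. The constant name `SUPPORTED` and the helper `supported_encoding?` are hypothetical:

```ruby
# Sketch: build a list of accepted encoding names from Ruby's own list,
# using the same patterns as Uniscribe::SUPPORTED_ENCODINGS.
SUPPORTED = Encoding.name_list.grep(
  Regexp.union(
    /^UTF-8$/,
    /^UTF8-/,
    /^UTF-...E$/,     # UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE
    /^US-ASCII$/,
    /^ISO-8859-1$/,
  )
).sort.freeze

# Hypothetical helper: is this encoding name on the list?
def supported_encoding?(name)
  SUPPORTED.include?(name)
end

p supported_encoding?("UTF-16LE") # => true
p supported_encoding?("UTF-16")   # => false (no explicit endianness)
```

Plain `UTF-16` and `UTF-32` are left out on purpose: without a byte order the library cannot interpret the bytes, so it asks for the `LE` or `BE` variant instead.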
@@ -0,0 +1,206 @@
+ # frozen_string_literal: true
+
+ require_relative "uniscribe/version"
+
+ require "unicode/name"
+ require "unicode/sequence_name"
+ require "symbolify"
+ require "characteristics"
+ require "paint"
+ require "unicode/display_width"
+ require "unicode/emoji"
+
+ module Uniscribe
+   SUPPORTED_ENCODINGS = Encoding.name_list.grep(
+     Regexp.union(
+       /^UTF-8$/,
+       /^UTF8-/,
+       /^UTF-...E$/,
+       /^US-ASCII$/,
+       /^ISO-8859-1$/,
+     )
+   ).sort.freeze
+
+   COLORS = {
+     control: "#0000FF",
+     blank: "#33AADD",
+     format: "#FF00FF",
+     mark: "#228822",
+     unassigned: "#FF5500",
+     ignorable: "#FFAA00",
+   }
+
+   # Entry point: validates the string, splits it into glyphs, and prints them
+   def self.of(string, encoding: nil, wide_ambiguous: false)
+     string = convert_to_encoding_or_raise(string, encoding)
+     glyphs = string.encode("UTF-8").scan(/\X/)
+
+     visualize(glyphs, wide_ambiguous: wide_ambiguous)
+   end
+
+   def self.convert_to_encoding_or_raise(string, encoding)
+     raise ArgumentError, "no data given to uniscribe" if !string || string.empty?
+
+     string.force_encoding(encoding) if encoding
+
+     case string.encoding.name
+     when *SUPPORTED_ENCODINGS
+       unless string.valid_encoding?
+         raise ArgumentError, "uniscribe can only describe strings with a valid encoding"
+       end
+
+       string
+     when 'UTF-16', 'UTF-32'
+       raise ArgumentError, "uniscribe only supports #{string.encoding.name} with specified endianness, please use #{string.encoding.name}LE or #{string.encoding.name}BE"
+     else
+       raise ArgumentError, "uniscribe can only describe Unicode strings (or US-ASCII or ISO-8859-1)"
+     end
+   end
+
+   def self.visualize(glyphs, wide_ambiguous: false)
+     puts
+     ( glyphs[0..-2] || [] ).each{ |glyph|
+       cps = glyph.codepoints
+       if cps.size > 1
+         puts_composition(cps, wide_ambiguous)
+       else
+         puts_codepoint(cps[0], false, false, wide_ambiguous)
+       end
+     }
+
+     # The last glyph is handled separately so it can be flagged as last
+     cps = glyphs[-1].codepoints
+     if cps.size > 1
+       puts_composition(cps, wide_ambiguous)
+     else
+       puts_codepoint(cps[0], false, true, wide_ambiguous)
+     end
+     puts
+   end
+
+   def self.puts_composition(cps, wide_ambiguous = false)
+     char = cps.pack("U*")
+     if sequence_name = Unicode::SequenceName.of(char)
+       name = "Composition: #{sequence_name}"
+     else
+       name = "Composition"
+     end
+     char_color = random_color
+     cp_hex = "----"
+     symbolified_char = symbolify_composition(char)
+     padding = determine_padding(symbolified_char, false, wide_ambiguous)
+
+     puts " %s ├┬ %s%s├┬ %s" % [
+       Paint[cp_hex, char_color],
+       Paint[symbolified_char, char_color],
+       padding,
+       Paint[name, char_color],
+     ]
+     ( cps[0..-2] || [] ).each{ |cp|
+       puts_codepoint(cp, true, false, wide_ambiguous)
+     }
+     puts_codepoint(cps[-1], true, true, wide_ambiguous)
+   end
+
+   def self.puts_codepoint(cp, composed = false, last = false, wide_ambiguous = false)
+     char = [cp].pack("U*")
+     char_info = UnicodeCharacteristics.new(char)
+     char_color = determine_codepoint_color(char_info)
+     cp_hex = cp.to_s(16).rjust(4, "0").rjust(6).upcase
+     symbolified_char = Symbolify.unicode(char, char_info)
+     if composed && !last
+       branch = "│├─"
+     elsif composed && last
+       branch = "│└─"
+     else
+       branch = "├─"
+     end
+     name = determine_codepoint_name(char)
+     padding = determine_padding(symbolified_char, composed, wide_ambiguous)
+
+     puts " %s %s %s%s%s %s" % [
+       Paint[cp_hex, char_color],
+       branch,
+       Paint[symbolified_char, char_color],
+       padding,
+       branch,
+       Paint[name, char_color],
+     ]
+   end
+
+   def self.determine_codepoint_color(char_info)
+     if !char_info.assigned?
+       if char_info.ignorable?
+         COLORS[:ignorable]
+       else
+         COLORS[:unassigned]
+       end
+     elsif char_info.blank?
+       COLORS[:blank]
+     elsif char_info.control?
+       COLORS[:control]
+     elsif char_info.format?
+       COLORS[:format]
+     elsif char_info.unicode? && char_info.category[0] == "M"
+       COLORS[:mark]
+     else
+       random_color
+     end
+   end
+
+   def self.random_color
+     "%.2x%.2x%.2x" % [rand(90) + 60, rand(90) + 60, rand(90) + 60]
+   end
+
+   # Prefers the corrected name, then falls back to the label plus an alias
+   def self.determine_codepoint_name(char)
+     name = Unicode::Name.correct(char)
+     return name if name
+
+     name = Unicode::Name.label(char)
+     as = Unicode::Name.aliases(char)
+     return name if !as
+
+     alias_ = ( as[:control] && as[:control][0] ||
+                as[:figment] && as[:figment][0] ||
+                as[:alternate] && as[:alternate][0] ||
+                as[:abbreviation] && as[:abbreviation][0] )
+     return name if !alias_
+
+     name + " " + alias_
+   end
+
+   def self.determine_padding(char, composed, wide_ambiguous)
+     required_width = Unicode::DisplayWidth.of(char, wide_ambiguous ? 2 : 1, {}, emoji: true)
+     required_width += 1 if composed
+     required_width = 0 if required_width < 0
+
+     case required_width
+     when 0...5
+       "\t\t"
+     when 5...10
+       "\t"
+     else
+       ""
+     end
+   end
+
+   def self.symbolify_composition(char)
+     char_infos = char.chars.map{ |c| UnicodeCharacteristics.new(c) }
+
+     case
+     when char_infos.any?{ |c| !c.assigned? }
+       "n/a"
+     when char_infos.all?{ |c| c.separator? }
+       "⏎"
+     when char_infos.all?{ |c| c.category == "Mn" || c.category == "Me" }
+       if char_infos.any?{ |c| c.category == "Mn" }
+         "◌" + char
+       else
+         " " + char
+       end
+     when char_infos.all?{ |c| c.blank? }
+       "]" + char + "["
+     else
+       char
+     end
+   end
+ end
+
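Two small formatting steps inside `puts_codepoint` are easy to try in isolation: the codepoint column and the tree branch. The sketch below copies both expressions into standalone helpers. The helper names `format_codepoint` and `branch_for` are hypothetical and do not exist in the gem:

```ruby
# Hypothetical helpers sketching two formatting steps of puts_codepoint.

# Hex value, zero-padded to at least 4 digits, right-aligned in 6 columns.
def format_codepoint(cp)
  cp.to_s(16).rjust(4, "0").rjust(6).upcase
end

# Branch symbol: members of a composition are nested one level deeper,
# and the last member closes the sub-tree.
def branch_for(composed, last)
  if composed
    last ? "│└─" : "│├─"
  else
    "├─"
  end
end

p format_codepoint(0x74)     # => "  0074"
p format_codepoint(0x1F6A1)  # => " 1F6A1"
p format_codepoint(0x10FFFF) # => "10FFFF"
p branch_for(true, true)     # => "│└─"
```

The 6-column field is wide enough for the largest codepoint, U+10FFFF, so every row lines up.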
@@ -0,0 +1,11 @@
+ # frozen_string_literal: true
+
+ require_relative '../uniscribe'
+
+ module Kernel
+   private
+
+   def uniscribe(string, **kwargs)
+     Uniscribe.of(string, **kwargs)
+   end
+ end
@@ -0,0 +1,16 @@
+ # frozen_string_literal: true
+
+ module Uniscribe
+   VERSION = "1.0.0".freeze
+   UNICODE_VERSION = "9.0.0".freeze
+
+   RUBY_UNICODE_VERSIONS = {
+     2.4 => "9.0.0".freeze,
+     2.3 => "8.0.0".freeze,
+     2.2 => "7.0.0".freeze,
+     2.1 => "6.1.0".freeze,
+   }.freeze
+
+   UNICODE_VERSION_GLYPH_DETECTION = RUBY_ENGINE == "ruby" && RUBY_UNICODE_VERSIONS[RUBY_VERSION.to_f]
+ end
+
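`UNICODE_VERSION_GLYPH_DETECTION` looks up the running Ruby's Unicode version through a Float key (`RUBY_VERSION.to_f`). The sketch below shows the same lookup with string keys. It is an alternative for illustration, not the gem's code, and the names `UNICODE_BY_RUBY` and `glyph_detection_version` are hypothetical:

```ruby
# Sketch only: the same table as RUBY_UNICODE_VERSIONS, keyed by
# "major.minor" strings instead of Floats.
UNICODE_BY_RUBY = {
  "2.4" => "9.0.0",
  "2.3" => "8.0.0",
  "2.2" => "7.0.0",
  "2.1" => "6.1.0",
}.freeze

# Hypothetical helper: returns the Unicode version used for glyph detection,
# or nil when the engine is not MRI or the Ruby version is not in the table.
def glyph_detection_version(engine, ruby_version)
  return nil unless engine == "ruby"
  UNICODE_BY_RUBY[ruby_version[/\A\d+\.\d+/]]
end

p glyph_detection_version("ruby", "2.4.1")  # => "9.0.0"
p glyph_detection_version("jruby", "2.3.3") # => nil
p glyph_detection_version(RUBY_ENGINE, RUBY_VERSION)
```

String keys avoid one pitfall of the Float approach: `"2.10.0".to_f` evaluates to `2.1`, so a hypothetical Ruby 2.10 would be mistaken for 2.1.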
@@ -0,0 +1,187 @@
1
+ require_relative "../lib/uniscribe/kernel_method"
2
+ require "minitest/autorun"
3
+
4
+ describe Uniscribe do
5
+ def check(string_to_test, match_regex)
6
+ uniscribe(string_to_test)
7
+ assert_output(match_regex){ uniscribe(string_to_test) }
8
+ end
9
+
10
+ describe "displays codepoints" do
11
+ it "LATIN CAPITAL LETTER" do
12
+ check "AB", /0041.*0042/m
13
+ end
14
+
15
+ it "AERIAL TRAMWAY" do
16
+ check "🚡", /1F6A1/
17
+ end
18
+ end
19
+
20
+ describe "displays glyph itself" do
21
+ it "LATIN CAPITAL LETTER" do
22
+ check "AB", /A.*B/m
23
+ end
24
+
25
+ it "AERIAL TRAMWAY" do
26
+ check "🚡", /🚡/
27
+ end
28
+ end
29
+
30
+ describe "displays names" do
31
+ it "LATIN CAPITAL LETTER" do
32
+ check "AB", /LATIN CAPITAL LETTER A.*LATIN CAPITAL LETTER B/m
33
+ end
34
+
35
+ it "AERIAL TRAMWAY" do
36
+ check "🚡", /AERIAL TRAMWAY/
37
+ end
38
+ end
39
+
40
+ describe "supported encodings" do
41
+ it "works with UTF-16" do
42
+ check "🚡".encode("UTF-16LE"), /AERIAL TRAMWAY/
43
+ end
44
+
45
+ it "works with UTF-32" do
46
+ check "🚡".encode("UTF-32BE"), /AERIAL TRAMWAY/
47
+ end
48
+
49
+ it "works with US-ASCII" do
50
+ check "AB".force_encoding("US-ASCII"), /LATIN CAPITAL LETTER A.*LATIN CAPITAL LETTER B/m
51
+ end
52
+
53
+ it "works with ISO-8859-1" do
54
+ check "AB\x81".force_encoding("ISO-8859-1"), /LATIN CAPITAL LETTER A.*LATIN CAPITAL LETTER B.*<control-0081> HIGH OCTET PRESET/m
55
+ end
56
+ end
57
+
58
+ describe "example compositions" do
59
+ describe "combining marks" do
60
+ it "DIAERESIS" do
61
+ check "g̈", /Composition.*LATIN SMALL LETTER G.*DIAERESIS/m
62
+ end
63
+
64
+ it "RING BELOW" do
65
+ check "n̥", /Composition.*LATIN SMALL LETTER N.*COMBINING RING BELOW/m
66
+ end
67
+
68
+ it "ARABIC FATHA" do
69
+ check "دَ", /Composition.*ARABIC LETTER DAL.*ARABIC FATHA/m
70
+ end
71
+
72
+ it "ACUTE ACCENT" do
73
+ check "ά", /Composition.*GREEK SMALL LETTER ALPHA.*COMBINING ACUTE ACCENT/m
74
+ end
75
+
76
+ it "HEBREW POINT HIRIQ" do
77
+ check "חִ", /Composition.*HEBREW LETTER HET.*HEBREW POINT HIRIQ/m
78
+ end
79
+
80
+ it "THAI CHARACTER SARA U" do
81
+ check "จุ", /Composition.*THAI CHARACTER CHO CHAN.*THAI CHARACTER SARA U/m
82
+ end
83
+ end
84
+
85
+ describe "misc scripts" do
86
+ if RUBY_VERSION >= "2.4.0"
87
+ it "HANGUL" do
88
+ check "ᅘᆇᇈ", /Composition.*HANGUL CHOSEONG SSANGHIEUH.*HANGUL JUNGSEONG YO-O.*HANGUL JONGSEONG NIEUN-PANSIOS/m
89
+ end
90
+
91
+ it "HANGUL 2" do
92
+ check "각", /Composition.*HANGUL CHOSEONG KIYEOK.*HANGUL JUNGSEONG A.*HANGUL JONGSEONG KIYEOK/m
93
+ end
94
+
95
+ it "HANGUL 3" do
96
+ check "ᄇᄉᄐ", /Composition.*HANGUL CHOSEONG PIEUP.*HANGUL CHOSEONG SIOS.*HANGUL CHOSEONG THIEUTH/m
97
+ end
98
+
99
+ it "TAMIL" do
100
+ check "நி", /Composition.*TAMIL SYLLABLE NI.*TAMIL LETTER NA.*TAMIL VOWEL SIGN I/m
101
+ end
102
+
103
+ it "DEVANAGARI" do
104
+ check "षि", /Composition.*DEVANAGARI LETTER SSA.*DEVANAGARI VOWEL SIGN I/m
105
+ end
106
+ end
107
+ end
108
+
109
+ describe "zwj and zwnj" do
110
+ if RUBY_VERSION >= "2.4.0"
111
+ it "ZWJ" do
112
+ check "क्‍", /Composition.*DEVANAGARI LETTER KA.*DEVANAGARI SIGN VIRAMA.*ZERO WIDTH JOINER/m
113
+ end
114
+
115
+ it "ZWNJ" do
116
+ check "t‌", /Composition.*LATIN SMALL LETTER T.*ZERO WIDTH NON-JOINER/m
117
+ end
118
+ end
119
+ end
120
+
121
+ describe "misc variations" do
122
+ it "TEXT STYLE" do
123
+ check "‼︎", /Composition.*(text style).*DOUBLE EXCLAMATION MARK.*VARIATION SELECTOR-15/m
124
+ end
125
+
126
+ it "EMOJI STYLE" do
127
+ check "‼️", /Composition.*(emoji style).*DOUBLE EXCLAMATION MARK.*VARIATION SELECTOR-16/m
128
+ end
129
+
130
+ it "DOTTED FORM" do
131
+ check "င︀", /Composition.*(dotted form).*MYANMAR LETTER NGA.*VARIATION SELECTOR-1/m
132
+ end
133
+
134
+ it "MONGOLIAN SECOND FORM" do
135
+ check "ᠠ᠋", /Composition.*(second form).*MONGOLIAN LETTER A.*MONGOLIAN FREE VARIATION SELECTOR ONE/m
136
+ end
137
+
138
+ it "CJK COMPATIBILITY IDEOGRAPH-2F81F" do
139
+ check "㓟︀", /Composition.*CJK COMPATIBILITY IDEOGRAPH-2F81F.*CJK UNIFIED IDEOGRAPH-34DF.*VARIATION SELECTOR-1/m
140
+ end
141
+
142
+ it "CID+6238" do
143
+ check "胥󠄀", /Composition.*CID\+6238.*CJK UNIFIED IDEOGRAPH-80E5.*VARIATION SELECTOR-17/m
144
+ end
145
+ end
146
+
147
+ describe "misc other" do
148
+ it "KEYCAP" do
149
+ check "5⃣", /Composition.*DIGIT FIVE.*COMBINING ENCLOSING KEYCAP/m
150
+ end
151
+
152
+ if RUBY_VERSION >= "2.4.0"
153
+ it "␍ + ␊" do
154
+ check "\r\n", /Composition.*<control-000D> CARRIAGE RETURN.*<control-000A> LINE FEED/m
155
+ end
156
+
157
+ it "REGIONAL" do
158
+ check "🇺🇳", /Composition.*UNITED NATIONS.*REGIONAL INDICATOR SYMBOL LETTER U.*REGIONAL INDICATOR SYMBOL LETTER N/m
159
+ end
160
+
161
+ it "TAG SEQUENCE" do
162
+ check "🏴󠁧󠁢󠁳󠁣󠁴󠁿", /Composition.*SCOTLAND.*WAVING BLACK FLAG.*TAG LATIN SMALL LETTER G.*TAG LATIN SMALL LETTER B.*TAG LATIN SMALL LETTER S.*TAG LATIN SMALL LETTER C.*TAG LATIN SMALL LETTER T.*CANCEL TAG/m
163
+ end
164
+
165
+ it "EMOJI MODIFIER" do
166
+ check "🙅🏿", /Composition.*PERSON GESTURING NO: DARK SKIN TONE.*FACE WITH NO GOOD GESTURE.*EMOJI MODIFIER FITZPATRICK TYPE-6/m
167
+ end
168
+
169
+ it "EMOJI ZWJ SEQUENCE" do
170
+ check "👩‍👩‍👦‍👦", /Composition.*FAMILY.*WOMAN.*ZERO WIDTH JOINER.*WOMAN.*ZERO WIDTH JOINER.*BOY.*ZERO WIDTH JOINER.*BOY/m
171
+ end
172
+ end
173
+ end
174
+ end
175
+
176
+ describe "unusual codepoints" do
177
+ if RUBY_VERSION >= "2.4.0"
178
+ it "safely prints and highlights unusual codepoints" do
179
+ check "\0A\u{E01D7}\x7F\r\n\u{D0000}\u{81}\u{FFF9}B\u{FFFB}🏴\u{E0061}\u{E007F}\u{10FFFF}", /<control-0000> NULL.*Composition.*LATIN CAPITAL LETTER A.*VARIATION SELECTOR-232.*<control-007F> DELETE.*Composition.*<control-000D> CARRIAGE RETURN.*<control-000A> LINE FEED.*<reserved-D0000>.*<control-0081> HIGH OCTET PRESET.*INTERLINEAR ANNOTATION ANCHOR.*LATIN CAPITAL LETTER B.*INTERLINEAR ANNOTATION TERMINATOR.*Composition.*WAVING BLACK FLAG.*TAG LATIN SMALL LETTER A.*CANCEL TAG.*<noncharacter-10FFFF>/m
180
+ end
181
+ end
182
+
183
+ it "safely prints and highlights various blanks" do
184
+ check "­ᅠ 𝅸", /SOFT HYPHEN.*HANGUL JUNGSEONG FILLER.*EM QUAD.*INHIBIT ARABIC FORM SHAPING.*ZERO WIDTH NO-BREAK SPACE.*MUSICAL SYMBOL END SLUR/m
185
+ end
186
+ end
187
+ end
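The `check` helper in the spec relies on minitest's `assert_output` to compare what `uniscribe` prints against a regex. The sketch below shows the underlying idea using only the standard library: temporarily swap `$stdout` for a `StringIO`. It is not minitest's implementation, and the helper name `capture_stdout` is hypothetical:

```ruby
require "stringio"

# Hypothetical helper: run a block and return everything it wrote to $stdout.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  # Always restore the real stream, even if the block raises
  $stdout = original
end

output = capture_stdout { puts "0041 ├─ A ├─ LATIN CAPITAL LETTER A" }
p output.match?(/0041.*LATIN CAPITAL LETTER A/m) # => true
```

Because `uniscribe` prints its result instead of returning it, output capture of this kind is the only way to assert on it.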
@@ -0,0 +1,29 @@
+ # -*- encoding: utf-8 -*-
+
+ require File.dirname(__FILE__) + "/lib/uniscribe/version"
+
+ Gem::Specification.new do |gem|
+   gem.name = "uniscribe"
+   gem.version = Uniscribe::VERSION
+   gem.summary = "Describes Unicode characters."
+   gem.description = "Describes Unicode characters with their name and shows compositions."
+   gem.authors = ["Jan Lelis"]
+   gem.email = ["mail@janlelis.de"]
+   gem.homepage = "https://github.com/janlelis/uniscribe"
+   gem.license = "MIT"
+
+   gem.files = Dir["{**/}{.*,*}"].select{ |path| File.file?(path) && path !~ /^(pkg|screenshots)/}
+   gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
+   gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
+   gem.require_paths = ["lib"]
+
+   gem.required_ruby_version = "~> 2.1"
+   gem.add_dependency "unicode-name", "~> 1.4", ">= 1.4.2"
+   gem.add_dependency "unicode-sequence_name", "~> 1.0"
+   gem.add_dependency "unicode-display_width", "~> 1.2", ">= 1.2.1"
+   gem.add_dependency "unicode-emoji", ">= 0.9", "< 2.0"
+   gem.add_dependency "symbolify", "~> 1.2"
+   gem.add_dependency "characteristics", ">= 0.7", "< 2.0"
+   gem.add_dependency "paint", ">= 0.9", "< 3.0"
+   gem.add_dependency "rationalist", "~> 2.0"
+ end
metadata ADDED
@@ -0,0 +1,203 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: uniscribe
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Jan Lelis
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2017-04-17 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: unicode-name
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.4'
20
+ - - ">="
21
+ - !ruby/object:Gem::Version
22
+ version: 1.4.2
23
+ type: :runtime
24
+ prerelease: false
25
+ version_requirements: !ruby/object:Gem::Requirement
26
+ requirements:
27
+ - - "~>"
28
+ - !ruby/object:Gem::Version
29
+ version: '1.4'
30
+ - - ">="
31
+ - !ruby/object:Gem::Version
32
+ version: 1.4.2
33
+ - !ruby/object:Gem::Dependency
34
+ name: unicode-sequence_name
35
+ requirement: !ruby/object:Gem::Requirement
36
+ requirements:
37
+ - - "~>"
38
+ - !ruby/object:Gem::Version
39
+ version: '1.0'
40
+ type: :runtime
41
+ prerelease: false
42
+ version_requirements: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - "~>"
45
+ - !ruby/object:Gem::Version
46
+ version: '1.0'
47
+ - !ruby/object:Gem::Dependency
48
+ name: unicode-display_width
49
+ requirement: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - "~>"
52
+ - !ruby/object:Gem::Version
53
+ version: '1.2'
54
+ - - ">="
55
+ - !ruby/object:Gem::Version
56
+ version: 1.2.1
57
+ type: :runtime
58
+ prerelease: false
59
+ version_requirements: !ruby/object:Gem::Requirement
60
+ requirements:
61
+ - - "~>"
62
+ - !ruby/object:Gem::Version
63
+ version: '1.2'
64
+ - - ">="
65
+ - !ruby/object:Gem::Version
66
+ version: 1.2.1
67
+ - !ruby/object:Gem::Dependency
68
+ name: unicode-emoji
69
+ requirement: !ruby/object:Gem::Requirement
70
+ requirements:
71
+ - - ">="
72
+ - !ruby/object:Gem::Version
73
+ version: '0.9'
74
+ - - "<"
75
+ - !ruby/object:Gem::Version
76
+ version: '2.0'
77
+ type: :runtime
78
+ prerelease: false
79
+ version_requirements: !ruby/object:Gem::Requirement
80
+ requirements:
81
+ - - ">="
82
+ - !ruby/object:Gem::Version
83
+ version: '0.9'
84
+ - - "<"
85
+ - !ruby/object:Gem::Version
86
+ version: '2.0'
87
+ - !ruby/object:Gem::Dependency
88
+ name: symbolify
89
+ requirement: !ruby/object:Gem::Requirement
90
+ requirements:
91
+ - - "~>"
92
+ - !ruby/object:Gem::Version
93
+ version: '1.2'
94
+ type: :runtime
95
+ prerelease: false
96
+ version_requirements: !ruby/object:Gem::Requirement
97
+ requirements:
98
+ - - "~>"
99
+ - !ruby/object:Gem::Version
100
+ version: '1.2'
101
+ - !ruby/object:Gem::Dependency
102
+ name: characteristics
103
+ requirement: !ruby/object:Gem::Requirement
104
+ requirements:
105
+ - - ">="
106
+ - !ruby/object:Gem::Version
107
+ version: '0.7'
108
+ - - "<"
109
+ - !ruby/object:Gem::Version
110
+ version: '2.0'
111
+ type: :runtime
112
+ prerelease: false
113
+ version_requirements: !ruby/object:Gem::Requirement
114
+ requirements:
115
+ - - ">="
116
+ - !ruby/object:Gem::Version
117
+ version: '0.7'
118
+ - - "<"
119
+ - !ruby/object:Gem::Version
120
+ version: '2.0'
121
+ - !ruby/object:Gem::Dependency
122
+ name: paint
123
+ requirement: !ruby/object:Gem::Requirement
124
+ requirements:
125
+ - - ">="
126
+ - !ruby/object:Gem::Version
127
+ version: '0.9'
128
+ - - "<"
129
+ - !ruby/object:Gem::Version
130
+ version: '3.0'
131
+ type: :runtime
132
+ prerelease: false
133
+ version_requirements: !ruby/object:Gem::Requirement
134
+ requirements:
135
+ - - ">="
136
+ - !ruby/object:Gem::Version
137
+ version: '0.9'
138
+ - - "<"
139
+ - !ruby/object:Gem::Version
140
+ version: '3.0'
141
+ - !ruby/object:Gem::Dependency
142
+ name: rationalist
143
+ requirement: !ruby/object:Gem::Requirement
144
+ requirements:
145
+ - - "~>"
146
+ - !ruby/object:Gem::Version
147
+ version: '2.0'
148
+ type: :runtime
149
+ prerelease: false
150
+ version_requirements: !ruby/object:Gem::Requirement
151
+ requirements:
152
+ - - "~>"
153
+ - !ruby/object:Gem::Version
154
+ version: '2.0'
155
+ description: Describes Unicode characters with their name and shows compositions.
156
+ email:
157
+ - mail@janlelis.de
158
+ executables:
159
+ - uniscribe
160
+ extensions: []
161
+ extra_rdoc_files: []
162
+ files:
163
+ - ".gitignore"
164
+ - ".travis.yml"
165
+ - CHANGELOG.md
166
+ - CODE_OF_CONDUCT.md
167
+ - Gemfile
168
+ - Gemfile.lock
169
+ - MIT-LICENSE.txt
170
+ - README.md
171
+ - Rakefile
172
+ - bin/uniscribe
173
+ - lib/uniscribe.rb
174
+ - lib/uniscribe/kernel_method.rb
175
+ - lib/uniscribe/version.rb
176
+ - spec/uniscribe_spec.rb
177
+ - uniscribe.gemspec
178
+ homepage: https://github.com/janlelis/uniscribe
179
+ licenses:
180
+ - MIT
181
+ metadata: {}
182
+ post_install_message:
183
+ rdoc_options: []
184
+ require_paths:
185
+ - lib
186
+ required_ruby_version: !ruby/object:Gem::Requirement
187
+ requirements:
188
+ - - "~>"
189
+ - !ruby/object:Gem::Version
190
+ version: '2.1'
191
+ required_rubygems_version: !ruby/object:Gem::Requirement
192
+ requirements:
193
+ - - ">="
194
+ - !ruby/object:Gem::Version
195
+ version: '0'
196
+ requirements: []
197
+ rubyforge_project:
198
+ rubygems_version: 2.6.8
199
+ signing_key:
200
+ specification_version: 4
201
+ summary: Describes Unicode characters.
202
+ test_files:
203
+ - spec/uniscribe_spec.rb