uniscribe 1.0.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 9f51890eee3fe1e008ed635c68a3598d9e9f5467
4
+ data.tar.gz: d25eb46898d1de180c085960fe858cbbea57dd2f
5
+ SHA512:
6
+ metadata.gz: 766dedd2e18ef926e5a962882250c232db1aae66e6008cb179fc7bfaa88a913b61157d8114527f6bb9eff2a0111137658b0311015bf5679fbefb5a9af6210b0b
7
+ data.tar.gz: bfe050bd760fd40afbbc2dbf6f664d3c2de4cbfff2ffc0cd9ab7d3912f7e3349b57a45de44949ba4e07b06704e08f7796e2bf3675410a5e64583a9240cdaa13c
@@ -0,0 +1,2 @@
1
+ Gemfile.lock
2
+ /pkg
@@ -0,0 +1,23 @@
1
+ sudo: false
2
+ language: ruby
3
+
4
+ rvm:
5
+ - ruby-head
6
+ - 2.4.1
7
+ - 2.3.3
8
+ - 2.2
9
+ - 2.1
10
+ - 2.0
11
+ - jruby-head
12
+ - jruby-9.1.8.0
13
+
14
+ cache:
15
+ - bundler
16
+
17
+ matrix:
18
+ allow_failures:
19
+ - rvm: jruby-head
20
+ - rvm: jruby-9.1.8.0
21
+ - rvm: ruby-head
22
+ - rvm: 2.0
23
+ # fast_finish: true
@@ -0,0 +1,6 @@
1
+ ## CHANGELOG
2
+
3
+ ### 0.1.0
4
+
5
+ * Initial release
6
+
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at opensource@janlelis.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gemspec
4
+
5
+ gem 'minitest'
6
+ gem 'rake'
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2017 Jan Lelis, mail@janlelis.de
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,113 @@
1
+ # uniscribe | Describe the Unicode [![[version]](https://badge.fury.io/rb/uniscribe.svg)](http://badge.fury.io/rb/uniscribe) [![[travis]](https://travis-ci.org/janlelis/uniscribe.svg)](https://travis-ci.org/janlelis/uniscribe)
2
+
3
+ Describes Unicode characters with their name and shows compositions.
4
+
5
+ - Helps you understand how glyphs and codepoints are structured within the data
6
+ - Gives you the names of glyphs and codepoints, which can be used for further research
7
+ - Highlights invalid/special/blank codepoints
8
+
9
+ Uses color coding similar to that of its lower-level companion tool [unibits](https://github.com/janlelis/unibits).
10
+
11
+ ## Setup
12
+
13
+ Make sure Ruby is installed and that installing gems works properly. Then run:
14
+
15
+ ```
16
+ $ gem install uniscribe
17
+ ```
18
+
19
+ ## Usage
20
+
21
+ Pass the string you want to inspect to uniscribe:
22
+
23
+ ### From CLI
24
+
25
+ ```
26
+ $ uniscribe "test strı̈ng"
27
+ ```
28
+
29
+ ### From Ruby
30
+
31
+ ```ruby
32
+ require "uniscribe/kernel_method"
33
+ uniscribe "test strı̈ng"
34
+ ```
35
+
36
+ ### Output
37
+
38
+ ```
39
+
40
+ 0074 ├─ t ├─ LATIN SMALL LETTER T
41
+ 0065 ├─ e ├─ LATIN SMALL LETTER E
42
+ 0073 ├─ s ├─ LATIN SMALL LETTER S
43
+ 0074 ├─ t ├─ LATIN SMALL LETTER T
44
+ 0020 ├─ ] [ ├─ SPACE
45
+ 0073 ├─ s ├─ LATIN SMALL LETTER S
46
+ 0074 ├─ t ├─ LATIN SMALL LETTER T
47
+ 0072 ├─ r ├─ LATIN SMALL LETTER R
48
+ ---- ├┬ ı̈ ├┬ Composition
49
+ 0131 │├─ ı │├─ LATIN SMALL LETTER DOTLESS I
50
+ 0308 │└─ ◌̈ │└─ COMBINING DIAERESIS
51
+ 006E ├─ n ├─ LATIN SMALL LETTER N
52
+ 0067 ├─ g ├─ LATIN SMALL LETTER G
53
+
54
+ ```
55
+
56
+ ## Examples
57
+
58
+ ### Tamil
59
+
60
+ `>> uniscribe "நகரத்தில்"`
61
+
62
+ ![Screenshot Tamil](/screenshots/tamil.png?raw=true "Tamil")
63
+
64
+ ### Thai
65
+
66
+ `>> uniscribe "ม้าลายหกตัว"`
67
+
68
+ ![Screenshot Thai](/screenshots/thai.png?raw=true "Thai")
69
+
70
+ ### Emoji Sequences
71
+
72
+ `>> uniscribe "3️⃣🤸‍♀"`
73
+
74
+ ![Screenshot Emoji](/screenshots/emoji.png?raw=true "Emoji")
75
+
76
+ ### Lots of Combining Marks
77
+
78
+ `>> uniscribe "̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍"`
79
+
80
+ ![Screenshot Marks](/screenshots/marks.png?raw=true "Marks")
81
+
82
+ ### Some Strange Unicode Characters
83
+
84
+ `>> uniscribe "\0A\u{E01D7}\x7F\r\n\u{D0000}\u{81}\u{FFF9}B\u{FFFB}🏴\u{E0061}\u{E007F}\u{10FFFF}"`
85
+
86
+ ![Screenshot Strange](/screenshots/strange.png?raw=true "Strange")
87
+
88
+ ### Some Blanks
89
+
90
+ `>> uniscribe "­ᅠ 𝅸"`
91
+
92
+ ![Screenshot Blanks](/screenshots/blanks.png?raw=true "Blanks")
93
+
94
+ ## Notes
95
+
96
+ The proper detection of compositions / graphemes / combined characters depends on your Ruby version:
97
+
98
+ Ruby | Unicode Version
99
+ -----|----------------
100
+ 2.4 | 9.0.0
101
+ 2.3 | 8.0.0
102
+ 2.2 | 7.0.0
103
+ 2.1 | 6.1.0
104
+
105
+ Also see
106
+
107
+ - [unibits](https://github.com/janlelis/unibits) - visualizes Unicode encodings
108
+ - [symbolify](https://github.com/janlelis/symbolify) - used for safely printing individual codepoints
109
+ - [characteristics](https://github.com/janlelis/characteristics) - used for detecting blanks and similar
110
+ - [unicopy](https://github.com/janlelis/unicopy) - copy codepoints to clipboard
111
+ - [Unicode® Standard Annex #29: Unicode Text Segmentation](http://unicode.org/reports/tr29/)
112
+
113
+ Copyright (C) 2017 Jan Lelis <http://janlelis.com>. Released under the MIT license.
@@ -0,0 +1,38 @@
1
+ # # #
2
+ # Get gemspec info
3
+
4
+ gemspec_file = Dir['*.gemspec'].first
5
+ gemspec = eval File.read(gemspec_file), binding, gemspec_file
6
+ info = "#{gemspec.name} | #{gemspec.version} | " \
7
+ "#{gemspec.runtime_dependencies.size} dependencies | " \
8
+ "#{gemspec.files.size} files"
9
+
10
+ # # #
11
+ # Gem build and install task
12
+
13
+ desc info
14
+ task :gem do
15
+ puts info + "\n\n"
16
+ print " "; sh "gem build #{gemspec_file}"
17
+ FileUtils.mkdir_p 'pkg'
18
+ FileUtils.mv "#{gemspec.name}-#{gemspec.version}.gem", 'pkg'
19
+ puts; sh %{gem install --no-document pkg/#{gemspec.name}-#{gemspec.version}.gem}
20
+ end
21
+
22
+ # # #
23
+ # Start an IRB session with the gem loaded
24
+
25
+ desc "#{gemspec.name} | IRB"
26
+ task :irb do
27
+ sh "irb -I ./lib -r #{gemspec.name.gsub '-','/'}/kernel_method"
28
+ end
29
+
30
+ # # #
31
+ # Run specs
32
+
33
+ desc "#{gemspec.name} | Spec"
34
+ task :spec do
35
+ sh "for file in spec/*_spec.rb; do ruby $file; done"
36
+ end
37
+ task default: :spec
38
+
@@ -0,0 +1,75 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require "rationalist"
5
+ require "uniscribe"
6
+
7
+ argv = Rationalist.parse(
8
+ ARGV,
9
+ string: '_',
10
+ alias: {
11
+ e: 'encoding',
12
+ v: 'version',
13
+ },
14
+ boolean: [
15
+ 'help',
16
+ 'version',
17
+ 'wide-ambiguous',
18
+ ]
19
+ )
20
+
21
+ if argv[:version]
22
+ puts "uniscribe #{Uniscribe::VERSION} by #{Paint["J-_-L", :bold]} <https://github.com/janlelis/uniscribe>"
23
+ puts "Unicode version is #{Uniscribe::UNICODE_VERSION} (glyph detection #{Uniscribe::UNICODE_VERSION_GLYPH_DETECTION || "[not supported]"})"
24
+ exit(0)
25
+ end
26
+
27
+ if argv[:help]
28
+ puts <<-HELP
29
+
30
+ #{Paint["DESCRIPTION", :underline]}
31
+
32
+ Describes a string of Unicode characters with their name and shows compositions.
33
+
34
+ #{Paint["USAGE", :underline]}
35
+
36
+ #{Paint["uniscribe", :bold]} [options] data
37
+
38
+ --encoding <encoding> | -e | which (Unicode) encoding to use for given data
39
+ --help | | this help page
40
+ --version | -v | displays version of uniscribe
41
+ --wide-ambiguous | | treat ambiguous-width characters as 2 columns wide
42
+
43
+ #{Paint["COLOR CODING", :underline]}
44
+
45
+ #{Paint["blank", Uniscribe::COLORS[:blank]]}
46
+ #{Paint["control", Uniscribe::COLORS[:control]]}
47
+ #{Paint["format", Uniscribe::COLORS[:format]]}
48
+ #{Paint["mark", Uniscribe::COLORS[:mark]]}
49
+ #{Paint["unassigned", Uniscribe::COLORS[:unassigned]]}
50
+ #{Paint["unassigned and ignorable", Uniscribe::COLORS[:ignorable]]}
51
+
52
+ random color for other characters and compositions
53
+
54
+ #{Paint["MORE INFO", :underline]}
55
+
56
+ https://github.com/janlelis/uniscribe
57
+
58
+ HELP
59
+ exit(0)
60
+ end
61
+
62
+ if argv[:_] && argv[:_][0]
63
+ data = argv[:_][0]
64
+ elsif !$stdin.tty?
65
+ data = $stdin.read
66
+ else
67
+ data = nil
68
+ end
69
+
70
+ begin
71
+ Uniscribe.of(data, encoding: argv[:encoding], wide_ambiguous: argv[:"wide-ambiguous"])
72
+ rescue ArgumentError
73
+ $stderr.puts Paint[$!.message, :red]
74
+ exit(1)
75
+ end
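Note on the input selection at the end of `bin/uniscribe`: the script prefers the first positional argument, falls back to piped standard input, and otherwise passes `nil`, which `Uniscribe.of` rejects with an `ArgumentError`. A minimal sketch of that precedence follows. `pick_input` is a hypothetical helper that does not exist in the gem, and `StringIO` stands in for a pipe.

```ruby
require "stringio"

# Hypothetical helper mirroring the if/elsif/else at the end of the script:
# prefer the first positional argument, else read piped stdin, else nil.
def pick_input(positional, stdin)
  if positional && positional[0]
    positional[0]
  elsif !stdin.tty?
    stdin.read
  end
end

# StringIO#tty? returns false, so it behaves like piped input here.
pick_input(["from argv"], StringIO.new("ignored"))  # => "from argv"
pick_input([], StringIO.new("from a pipe"))         # => "from a pipe"
```

Taking the stream as a parameter, rather than reading `$stdin` directly, is only there to make the sketch easy to exercise without a terminal.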
@@ -0,0 +1,206 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "uniscribe/version"
4
+
5
+ require "unicode/name"
6
+ require "unicode/sequence_name"
7
+ require "symbolify"
8
+ require "characteristics"
9
+ require "paint"
10
+ require "unicode/display_width"
11
+ require "unicode/emoji"
12
+
13
+ module Uniscribe
14
+ SUPPORTED_ENCODINGS = Encoding.name_list.grep(
15
+ Regexp.union(
16
+ /^UTF-8$/,
17
+ /^UTF8-/,
18
+ /^UTF-...E$/,
19
+ /^US-ASCII$/,
20
+ /^ISO-8859-1$/,
21
+ )
22
+ ).sort.freeze
23
+
24
+ COLORS = {
25
+ control: "#0000FF",
26
+ blank: "#33AADD",
27
+ format: "#FF00FF",
28
+ mark: "#228822",
29
+ unassigned: "#FF5500",
30
+ ignorable: "#FFAA00",
31
+ }
32
+
33
+ def self.of(string, encoding: nil, wide_ambiguous: false)
34
+ string = convert_to_encoding_or_raise(string, encoding)
35
+ glyphs = string.encode("UTF-8").scan(/\X/)
36
+
37
+ visualize(glyphs, wide_ambiguous: wide_ambiguous)
38
+ end
39
+
40
+ def self.convert_to_encoding_or_raise(string, encoding)
41
+ raise ArgumentError, "no data given to uniscribe" if !string || string.empty?
42
+
43
+ string.force_encoding(encoding) if encoding
44
+
45
+ case string.encoding.name
46
+ when *SUPPORTED_ENCODINGS
47
+ unless string.valid_encoding?
48
+ raise ArgumentError, "uniscribe can only describe strings with a valid encoding"
49
+ end
50
+
51
+ string
52
+ when 'UTF-16', 'UTF-32'
53
+ raise ArgumentError, "uniscribe only supports #{string.encoding.name} with specified endianness, please use #{string.encoding.name}LE or #{string.encoding.name}BE"
54
+ else
55
+ raise ArgumentError, "uniscribe can only describe Unicode strings (or US-ASCII or ISO-8859-1)"
56
+ end
57
+ end
58
+
59
+ def self.visualize(glyphs, wide_ambiguous: false)
60
+ puts
61
+ ( glyphs[0..-2] || [] ).each{ |glyph|
62
+ cps = glyph.codepoints
63
+ if cps.size > 1
64
+ puts_composition(cps, wide_ambiguous)
65
+ else
66
+ puts_codepoint(cps[0], false, false, wide_ambiguous)
67
+ end
68
+ }
69
+
70
+ cps = glyphs[-1].codepoints
71
+ if cps.size > 1
72
+ puts_composition(cps, wide_ambiguous)
73
+ else
74
+ puts_codepoint(cps[0], false, true, wide_ambiguous)
75
+ end
76
+ puts
77
+ end
78
+
79
+ def self.puts_composition(cps, wide_ambiguous = false)
80
+ char = cps.pack("U*")
81
+ if sequence_name = Unicode::SequenceName.of(char)
82
+ name = "Composition: #{sequence_name}"
83
+ else
84
+ name = "Composition"
85
+ end
86
+ char_color = random_color
87
+ cp_hex = "----"
88
+ symbolified_char = symbolify_composition(char)
89
+ padding = determine_padding(symbolified_char, false, wide_ambiguous)
90
+
91
+ puts " %s ├┬ %s%s├┬ %s" % [
92
+ Paint[cp_hex, char_color],
93
+ Paint[symbolified_char, char_color],
94
+ padding,
95
+ Paint[name, char_color],
96
+ ]
97
+ ( cps[0..-2] || [] ).each{ |cp|
98
+ puts_codepoint(cp, true, false, wide_ambiguous)
99
+ }
100
+ puts_codepoint(cps[-1], true, true, wide_ambiguous)
101
+ end
102
+
103
+ def self.puts_codepoint(cp, composed = false, last = false, wide_ambiguous = false)
104
+ char = [cp].pack("U*")
105
+ char_info = UnicodeCharacteristics.new(char)
106
+ char_color = determine_codepoint_color(char_info)
107
+ cp_hex = cp.to_s(16).rjust(4, "0").rjust(6).upcase
108
+ symbolified_char = Symbolify.unicode(char, char_info)
109
+ if composed && !last
110
+ branch = "│├─"
111
+ elsif composed && last
112
+ branch = "│└─"
113
+ else
114
+ branch = "├─"
115
+ end
116
+ name = determine_codepoint_name(char)
117
+ padding = determine_padding(symbolified_char, composed, wide_ambiguous)
118
+
119
+ puts " %s %s %s%s%s %s" % [
120
+ Paint[cp_hex, char_color],
121
+ branch,
122
+ Paint[symbolified_char, char_color],
123
+ padding,
124
+ branch,
125
+ Paint[name, char_color],
126
+ ]
127
+ end
128
+
129
+ def self.determine_codepoint_color(char_info)
130
+ if !char_info.assigned?
131
+ if char_info.ignorable?
132
+ COLORS[:ignorable]
133
+ else
134
+ COLORS[:unassigned]
135
+ end
136
+ elsif char_info.blank?
137
+ COLORS[:blank]
138
+ elsif char_info.control?
139
+ COLORS[:control]
140
+ elsif char_info.format?
141
+ COLORS[:format]
142
+ elsif char_info.unicode? && char_info.category[0] == "M"
143
+ COLORS[:mark]
144
+ else
145
+ random_color
146
+ end
147
+ end
148
+
149
+ def self.random_color
150
+ "%.2x%.2x%.2x" % [rand(90) + 60, rand(90) + 60, rand(90) + 60]
151
+ end
152
+
153
+ def self.determine_codepoint_name(char)
154
+ name = Unicode::Name.correct(char)
155
+ return name if name
156
+
157
+ name = Unicode::Name.label(char)
158
+ as = Unicode::Name.aliases(char)
159
+ return name if !as
160
+
161
+ alias_ = ( as[:control] && as[:control][0] ||
162
+ as[:figment] && as[:figment][0] ||
163
+ as[:alternate] && as[:alternate][0] ||
164
+ as[:abbreviation] && as[:abbreviation][0] )
165
+ return name if !alias_
166
+
167
+ name + " " + alias_
168
+ end
169
+
170
+ def self.determine_padding(char, composed, wide_ambiguous)
171
+ required_width = Unicode::DisplayWidth.of(char, wide_ambiguous ? 2 : 1, {}, emoji: true)
172
+ required_width += 1 if composed
173
+ required_width = 0 if required_width < 0
174
+
175
+ case required_width
176
+ when 0...5
177
+ "\t\t"
178
+ when 5...10
179
+ "\t"
180
+ else
181
+ ""
182
+ end
183
+ end
184
+
185
+ def self.symbolify_composition(char)
186
+ char_infos = char.chars.map{ |c| UnicodeCharacteristics.new(c) }
187
+
188
+ case
189
+ when char_infos.any?{ |c| !c.assigned? }
190
+ "n/a"
191
+ when char_infos.all?{ |c| c.separator? }
192
+ "⏎"
193
+ when char_infos.all?{ |c| c.category == "Mn" || c.category == "Me" }
194
+ if char_infos.any?{ |c| c.category == "Mn" }
195
+ "◌" + char
196
+ else
197
+ " " + char
198
+ end
199
+ when char_infos.all?{ |c| c.blank? }
200
+ "]" + char + "["
201
+ else
202
+ char
203
+ end
204
+ end
205
+ end
206
+
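Note on `Uniscribe.of`: the core of the library is `string.encode("UTF-8").scan(/\X/)`, which splits the input into extended grapheme clusters, and `puts_codepoint` then formats each codepoint as zero-padded uppercase hex. The sketch below reproduces only those two steps with core Ruby. `describe_clusters` is a hypothetical name, and the names, colors, and tree drawing of the real output are left out.

```ruby
# Split a string into grapheme clusters (\X) and list each cluster's
# codepoints as zero-padded, uppercase hex, as the first output column does.
def describe_clusters(string)
  string.encode("UTF-8").scan(/\X/).map do |cluster|
    cluster.codepoints.map { |cp| cp.to_s(16).rjust(4, "0").upcase }
  end
end

# LATIN SMALL LETTER DOTLESS I followed by COMBINING DIAERESIS forms one
# cluster, which uniscribe would print as a "Composition".
describe_clusters("n\u0131\u0308g")
# => [["006E"], ["0131", "0308"], ["0067"]]
```

A cluster with more than one codepoint corresponds to the `cps.size > 1` branch in `visualize`. How more complex sequences are split depends on the Unicode version built into the running Ruby.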
@@ -0,0 +1,11 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative '../uniscribe'
4
+
5
+ module Kernel
6
+ private
7
+
8
+ def uniscribe(string, **kwargs)
9
+ Uniscribe.of(string, **kwargs)
10
+ end
11
+ end
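Note on `lib/uniscribe/kernel_method.rb`: defining the method under `private` in `Kernel` makes it callable without a receiver everywhere, like `puts`, while keeping it out of every object's public interface. The sketch below shows the same pattern with stand-in names. `Describer` and `describe` are hypothetical and only illustrate how the keyword arguments are forwarded.

```ruby
# Hypothetical stand-in for the Uniscribe module.
module Describer
  def self.of(string, upcase: false)
    upcase ? string.upcase : string
  end
end

module Kernel
  private

  # Receiver-less convenience method that forwards all keyword arguments.
  def describe(string, **kwargs)
    Describer.of(string, **kwargs)
  end
end

describe("abc", upcase: true)   # => "ABC"
"abc".respond_to?(:describe)    # => false, because the method is private
```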
@@ -0,0 +1,16 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Uniscribe
4
+ VERSION = "1.0.0".freeze
5
+ UNICODE_VERSION = "9.0.0".freeze
6
+
7
+ RUBY_UNICODE_VERSIONS = {
8
+ 2.4 => "9.0.0".freeze,
9
+ 2.3 => "8.0.0".freeze,
10
+ 2.2 => "7.0.0".freeze,
11
+ 2.1 => "6.1.0".freeze,
12
+ }.freeze
13
+
14
+ UNICODE_VERSION_GLYPH_DETECTION = RUBY_ENGINE == "ruby" && RUBY_UNICODE_VERSIONS[RUBY_VERSION.to_f]
15
+ end
16
+
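Note on `UNICODE_VERSION_GLYPH_DETECTION`: the lookup converts `RUBY_VERSION` to a Float, so `"2.4.1"` becomes `2.4` and matches the table key. That works for the versions listed, but Float parsing would collapse a hypothetical two-digit minor version such as `"2.10.0"` into `2.1`. The sketch below is one possible alternative, not the gem's code: it keys the table by the `"major.minor"` string. The constant and method names are assumptions.

```ruby
# Hypothetical table keyed by "major.minor" strings instead of Floats.
UNICODE_BY_RUBY_MINOR = {
  "2.4" => "9.0.0",
  "2.3" => "8.0.0",
  "2.2" => "7.0.0",
  "2.1" => "6.1.0",
}.freeze

# Returns the Unicode version used for glyph detection, or nil when the
# engine is not MRI or the Ruby version is not in the table.
def glyph_detection_unicode_version(ruby_version, engine = "ruby")
  return nil unless engine == "ruby"

  UNICODE_BY_RUBY_MINOR[ruby_version[/\A\d+\.\d+/]]
end

"2.10.0".to_f                                       # => 2.1
glyph_detection_unicode_version("2.4.1")            # => "9.0.0"
glyph_detection_unicode_version("2.10.0")           # => nil
glyph_detection_unicode_version("2.3.3", "jruby")   # => nil
```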
@@ -0,0 +1,187 @@
1
+ require_relative "../lib/uniscribe/kernel_method"
2
+ require "minitest/autorun"
3
+
4
+ describe Uniscribe do
5
+ def check(string_to_test, match_regex)
6
+ uniscribe(string_to_test)
7
+ assert_output(match_regex){ uniscribe(string_to_test) }
8
+ end
9
+
10
+ describe "displays codepoints" do
11
+ it "LATIN CAPITAL LETTER" do
12
+ check "AB", /0041.*0042/m
13
+ end
14
+
15
+ it "AERIAL TRAMWAY" do
16
+ check "🚡", /1F6A1/
17
+ end
18
+ end
19
+
20
+ describe "displays glyph itself" do
21
+ it "LATIN CAPITAL LETTER" do
22
+ check "AB", /A.*B/m
23
+ end
24
+
25
+ it "AERIAL TRAMWAY" do
26
+ check "🚡", /🚡/
27
+ end
28
+ end
29
+
30
+ describe "displays names" do
31
+ it "LATIN CAPITAL LETTER" do
32
+ check "AB", /LATIN CAPITAL LETTER A.*LATIN CAPITAL LETTER B/m
33
+ end
34
+
35
+ it "AERIAL TRAMWAY" do
36
+ check "🚡", /AERIAL TRAMWAY/
37
+ end
38
+ end
39
+
40
+ describe "supported encodings" do
41
+ it "works with UTF-16LE" do
42
+ check "🚡".encode("UTF-16LE"), /AERIAL TRAMWAY/
43
+ end
44
+
45
+ it "works with UTF-32BE" do
46
+ check "🚡".encode("UTF-32BE"), /AERIAL TRAMWAY/
47
+ end
48
+
49
+ it "works with US-ASCII" do
50
+ check "AB".force_encoding("US-ASCII"), /LATIN CAPITAL LETTER A.*LATIN CAPITAL LETTER B/m
51
+ end
52
+
53
+ it "works with ISO-8859-1" do
54
+ check "AB\x81".force_encoding("ISO-8859-1"), /LATIN CAPITAL LETTER A.*LATIN CAPITAL LETTER B.*<control-0081> HIGH OCTET PRESET/m
55
+ end
56
+ end
57
+
58
+ describe "example compositions" do
59
+ describe "combining marks" do
60
+ it "DIAERESIS" do
61
+ check "g̈", /Composition.*LATIN SMALL LETTER G.*DIAERESIS/m
62
+ end
63
+
64
+ it "RING BELOW" do
65
+ check "n̥", /Composition.*LATIN SMALL LETTER N.*COMBINING RING BELOW/m
66
+ end
67
+
68
+ it "ARABIC FATHA" do
69
+ check "دَ", /Composition.*ARABIC LETTER DAL.*ARABIC FATHA/m
70
+ end
71
+
72
+ it "ACUTE ACCENT" do
73
+ check "ά", /Composition.*GREEK SMALL LETTER ALPHA.*COMBINING ACUTE ACCENT/m
74
+ end
75
+
76
+ it "HEBREW POINT HIRIQ" do
77
+ check "חִ", /Composition.*HEBREW LETTER HET.*HEBREW POINT HIRIQ/m
78
+ end
79
+
80
+ it "THAI CHARACTER SARA U" do
81
+ check "จุ", /Composition.*THAI CHARACTER CHO CHAN.*THAI CHARACTER SARA U/m
82
+ end
83
+ end
84
+
85
+ describe "misc scripts" do
86
+ if RUBY_VERSION >= "2.4.0"
87
+ it "HANGUL" do
88
+ check "ᅘᆇᇈ", /Composition.*HANGUL CHOSEONG SSANGHIEUH.*HANGUL JUNGSEONG YO-O.*HANGUL JONGSEONG NIEUN-PANSIOS/m
89
+ end
90
+
91
+ it "HANGUL 2" do
92
+ check "각", /Composition.*HANGUL CHOSEONG KIYEOK.*HANGUL JUNGSEONG A.*HANGUL JONGSEONG KIYEOK/m
93
+ end
94
+
95
+ it "HANGUL 3" do
96
+ check "ᄇᄉᄐ", /Composition.*HANGUL CHOSEONG PIEUP.*HANGUL CHOSEONG SIOS.*HANGUL CHOSEONG THIEUTH/m
97
+ end
98
+
99
+ it "TAMIL" do
100
+ check "நி", /Composition.*TAMIL SYLLABLE NI.*TAMIL LETTER NA.*TAMIL VOWEL SIGN I/m
101
+ end
102
+
103
+ it "DEVANAGARI" do
104
+ check "षि", /Composition.*DEVANAGARI LETTER SSA.*DEVANAGARI VOWEL SIGN I/m
105
+ end
106
+ end
107
+ end
108
+
109
+ describe "zwj and zwnj" do
110
+ if RUBY_VERSION >= "2.4.0"
111
+ it "ZWJ" do
112
+ check "क्‍", /Composition.*DEVANAGARI LETTER KA.*DEVANAGARI SIGN VIRAMA.*ZERO WIDTH JOINER/m
113
+ end
114
+
115
+ it "ZWNJ" do
116
+ check "t‌", /Composition.*LATIN SMALL LETTER T.*ZERO WIDTH NON-JOINER/m
117
+ end
118
+ end
119
+ end
120
+
121
+ describe "misc variations" do
122
+ it "TEXT STYLE" do
123
+ check "‼︎", /Composition.*(text style).*DOUBLE EXCLAMATION MARK.*VARIATION SELECTOR-15/m
124
+ end
125
+
126
+ it "EMOJI STYLE" do
127
+ check "‼️", /Composition.*(emoji style).*DOUBLE EXCLAMATION MARK.*VARIATION SELECTOR-16/m
128
+ end
129
+
130
+ it "DOTTED FORM" do
131
+ check "င︀", /Composition.*(dotted form).*MYANMAR LETTER NGA.*VARIATION SELECTOR-1/m
132
+ end
133
+
134
+ it "MONGOLIAN SECOND FORM" do
135
+ check "ᠠ᠋", /Composition.*(second form).*MONGOLIAN LETTER A.*MONGOLIAN FREE VARIATION SELECTOR ONE/m
136
+ end
137
+
138
+ it "CJK COMPATIBILITY IDEOGRAPH-2F81F" do
139
+ check "㓟︀", /Composition.*CJK COMPATIBILITY IDEOGRAPH-2F81F.*CJK UNIFIED IDEOGRAPH-34DF.*VARIATION SELECTOR-1/m
140
+ end
141
+
142
+ it "CID+6238" do
143
+ check "胥󠄀", /Composition.*CID\+6238.*CJK UNIFIED IDEOGRAPH-80E5.*VARIATION SELECTOR-17/m
144
+ end
145
+ end
146
+
147
+ describe "misc other" do
148
+ it "KEYCAP" do
149
+ check "5⃣", /Composition.*DIGIT FIVE.*COMBINING ENCLOSING KEYCAP/m
150
+ end
151
+
152
+ if RUBY_VERSION >= "2.4.0"
153
+ it "␍ + ␊" do
154
+ check "\r\n", /Composition.*<control-000D> CARRIAGE RETURN.*<control-000A> LINE FEED/m
155
+ end
156
+
157
+ it "REGIONAL" do
158
+ check "🇺🇳", /Composition.*UNITED NATIONS.*REGIONAL INDICATOR SYMBOL LETTER U.*REGIONAL INDICATOR SYMBOL LETTER N/m
159
+ end
160
+
161
+ it "TAG SEQUENCE" do
162
+ check "🏴󠁧󠁢󠁳󠁣󠁴󠁿", /Composition.*SCOTLAND.*WAVING BLACK FLAG.*TAG LATIN SMALL LETTER G.*TAG LATIN SMALL LETTER B.*TAG LATIN SMALL LETTER S.*TAG LATIN SMALL LETTER C.*TAG LATIN SMALL LETTER T.*CANCEL TAG/m
163
+ end
164
+
165
+ it "EMOJI MODIFIER" do
166
+ check "🙅🏿", /Composition.*PERSON GESTURING NO: DARK SKIN TONE.*FACE WITH NO GOOD GESTURE.*EMOJI MODIFIER FITZPATRICK TYPE-6/m
167
+ end
168
+
169
+ it "EMOJI ZWJ SEQUENCE" do
170
+ check "👩‍👩‍👦‍👦", /Composition.*FAMILY.*WOMAN.*ZERO WIDTH JOINER.*WOMAN.*ZERO WIDTH JOINER.*BOY.*ZERO WIDTH JOINER.*BOY/m
171
+ end
172
+ end
173
+ end
174
+ end
175
+
176
+ describe "unusual codepoints" do
177
+ if RUBY_VERSION >= "2.4.0"
178
+ it "safely prints and highlights unusual codepoints" do
179
+ check "\0A\u{E01D7}\x7F\r\n\u{D0000}\u{81}\u{FFF9}B\u{FFFB}🏴\u{E0061}\u{E007F}\u{10FFFF}", /<control-0000> NULL.*Composition.*LATIN CAPITAL LETTER A.*VARIATION SELECTOR-232.*<control-007F> DELETE.*Composition.*<control-000D> CARRIAGE RETURN.*<control-000A> LINE FEED.*<reserved-D0000>.*<control-0081> HIGH OCTET PRESET.*INTERLINEAR ANNOTATION ANCHOR.*LATIN CAPITAL LETTER B.*INTERLINEAR ANNOTATION TERMINATOR.*Composition.*WAVING BLACK FLAG.*TAG LATIN SMALL LETTER A.*CANCEL TAG.*<noncharacter-10FFFF>/m
180
+ end
181
+ end
182
+
183
+ it "safely prints and highlights various blanks" do
184
+ check "­ᅠ 𝅸", /SOFT HYPHEN.*HANGUL JUNGSEONG FILLER.*EM QUAD.*INHIBIT ARABIC FORM SHAPING.*ZERO WIDTH NO-BREAK SPACE.*MUSICAL SYMBOL END SLUR/m
185
+ end
186
+ end
187
+ end
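Note on the "supported encodings" specs: they pass `UTF-16LE` and `UTF-32BE` because `SUPPORTED_ENCODINGS` only admits the endian-specific variants, and `convert_to_encoding_or_raise` also rejects strings whose bytes are invalid in their encoding. The sketch below rebuilds the same filter from `Encoding.name_list` and combines both checks into a predicate. `SUPPORTED_SKETCH` and `describable?` are hypothetical names, and the real method raises `ArgumentError` instead of returning `false`.

```ruby
# Same pattern list as SUPPORTED_ENCODINGS in lib/uniscribe.rb.
SUPPORTED_SKETCH = Encoding.name_list.grep(
  Regexp.union(
    /^UTF-8$/,
    /^UTF8-/,
    /^UTF-...E$/,   # UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE
    /^US-ASCII$/,
    /^ISO-8859-1$/,
  )
).sort.freeze

# True when the string's encoding is supported and its bytes are valid.
def describable?(string)
  SUPPORTED_SKETCH.include?(string.encoding.name) && string.valid_encoding?
end

describable?("A".encode("UTF-16LE"))             # => true
describable?("A".dup.force_encoding("UTF-16"))   # => false, no endianness given
describable?("\xFF".b.force_encoding("UTF-8"))   # => false, invalid byte sequence
```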
@@ -0,0 +1,29 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ require File.dirname(__FILE__) + "/lib/uniscribe/version"
4
+
5
+ Gem::Specification.new do |gem|
6
+ gem.name = "uniscribe"
7
+ gem.version = Uniscribe::VERSION
8
+ gem.summary = "Describes Unicode characters."
9
+ gem.description = "Describes Unicode characters with their name and shows compositions."
10
+ gem.authors = ["Jan Lelis"]
11
+ gem.email = ["mail@janlelis.de"]
12
+ gem.homepage = "https://github.com/janlelis/uniscribe"
13
+ gem.license = "MIT"
14
+
15
+ gem.files = Dir["{**/}{.*,*}"].select{ |path| File.file?(path) && path !~ /^(pkg|screenshots)/}
16
+ gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
17
+ gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
18
+ gem.require_paths = ["lib"]
19
+
20
+ gem.required_ruby_version = "~> 2.1"
21
+ gem.add_dependency "unicode-name", "~> 1.4", ">= 1.4.2"
22
+ gem.add_dependency "unicode-sequence_name", "~> 1.0"
23
+ gem.add_dependency "unicode-display_width", "~> 1.2", ">= 1.2.1"
24
+ gem.add_dependency "unicode-emoji", ">= 0.9", "< 2.0"
25
+ gem.add_dependency "symbolify", "~> 1.2"
26
+ gem.add_dependency "characteristics", ">= 0.7", "< 2.0"
27
+ gem.add_dependency "paint", ">= 0.9", "< 3.0"
28
+ gem.add_dependency "rationalist", "~> 2.0"
29
+ end
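Note on the version constraints in `uniscribe.gemspec`: the pessimistic operator `~>` allows changes only in the last listed segment, so `"~> 1.4", ">= 1.4.2"` accepts 1.4.2 up to, but not including, 2.0, and `required_ruby_version = "~> 2.1"` excludes Ruby 3.x. The sketch below checks a few versions with RubyGems' own `Gem::Requirement`. The version numbers tested are illustrative.

```ruby
require "rubygems"

# Constraint used for unicode-name in the gemspec.
name_req = Gem::Requirement.new("~> 1.4", ">= 1.4.2")
name_req.satisfied_by?(Gem::Version.new("1.4.1"))  # => false
name_req.satisfied_by?(Gem::Version.new("1.9.0"))  # => true
name_req.satisfied_by?(Gem::Version.new("2.0.0"))  # => false

# Constraint used for required_ruby_version.
ruby_req = Gem::Requirement.new("~> 2.1")
ruby_req.satisfied_by?(Gem::Version.new("2.7.8"))  # => true
ruby_req.satisfied_by?(Gem::Version.new("3.0.0"))  # => false
```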
metadata ADDED
@@ -0,0 +1,203 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: uniscribe
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Jan Lelis
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2017-04-17 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: unicode-name
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.4'
20
+ - - ">="
21
+ - !ruby/object:Gem::Version
22
+ version: 1.4.2
23
+ type: :runtime
24
+ prerelease: false
25
+ version_requirements: !ruby/object:Gem::Requirement
26
+ requirements:
27
+ - - "~>"
28
+ - !ruby/object:Gem::Version
29
+ version: '1.4'
30
+ - - ">="
31
+ - !ruby/object:Gem::Version
32
+ version: 1.4.2
33
+ - !ruby/object:Gem::Dependency
34
+ name: unicode-sequence_name
35
+ requirement: !ruby/object:Gem::Requirement
36
+ requirements:
37
+ - - "~>"
38
+ - !ruby/object:Gem::Version
39
+ version: '1.0'
40
+ type: :runtime
41
+ prerelease: false
42
+ version_requirements: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - "~>"
45
+ - !ruby/object:Gem::Version
46
+ version: '1.0'
47
+ - !ruby/object:Gem::Dependency
48
+ name: unicode-display_width
49
+ requirement: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - "~>"
52
+ - !ruby/object:Gem::Version
53
+ version: '1.2'
54
+ - - ">="
55
+ - !ruby/object:Gem::Version
56
+ version: 1.2.1
57
+ type: :runtime
58
+ prerelease: false
59
+ version_requirements: !ruby/object:Gem::Requirement
60
+ requirements:
61
+ - - "~>"
62
+ - !ruby/object:Gem::Version
63
+ version: '1.2'
64
+ - - ">="
65
+ - !ruby/object:Gem::Version
66
+ version: 1.2.1
67
+ - !ruby/object:Gem::Dependency
68
+ name: unicode-emoji
69
+ requirement: !ruby/object:Gem::Requirement
70
+ requirements:
71
+ - - ">="
72
+ - !ruby/object:Gem::Version
73
+ version: '0.9'
74
+ - - "<"
75
+ - !ruby/object:Gem::Version
76
+ version: '2.0'
77
+ type: :runtime
78
+ prerelease: false
79
+ version_requirements: !ruby/object:Gem::Requirement
80
+ requirements:
81
+ - - ">="
82
+ - !ruby/object:Gem::Version
83
+ version: '0.9'
84
+ - - "<"
85
+ - !ruby/object:Gem::Version
86
+ version: '2.0'
87
+ - !ruby/object:Gem::Dependency
88
+ name: symbolify
89
+ requirement: !ruby/object:Gem::Requirement
90
+ requirements:
91
+ - - "~>"
92
+ - !ruby/object:Gem::Version
93
+ version: '1.2'
94
+ type: :runtime
95
+ prerelease: false
96
+ version_requirements: !ruby/object:Gem::Requirement
97
+ requirements:
98
+ - - "~>"
99
+ - !ruby/object:Gem::Version
100
+ version: '1.2'
101
+ - !ruby/object:Gem::Dependency
102
+ name: characteristics
103
+ requirement: !ruby/object:Gem::Requirement
104
+ requirements:
105
+ - - ">="
106
+ - !ruby/object:Gem::Version
107
+ version: '0.7'
108
+ - - "<"
109
+ - !ruby/object:Gem::Version
110
+ version: '2.0'
111
+ type: :runtime
112
+ prerelease: false
113
+ version_requirements: !ruby/object:Gem::Requirement
114
+ requirements:
115
+ - - ">="
116
+ - !ruby/object:Gem::Version
117
+ version: '0.7'
118
+ - - "<"
119
+ - !ruby/object:Gem::Version
120
+ version: '2.0'
121
+ - !ruby/object:Gem::Dependency
122
+ name: paint
123
+ requirement: !ruby/object:Gem::Requirement
124
+ requirements:
125
+ - - ">="
126
+ - !ruby/object:Gem::Version
127
+ version: '0.9'
128
+ - - "<"
129
+ - !ruby/object:Gem::Version
130
+ version: '3.0'
131
+ type: :runtime
132
+ prerelease: false
133
+ version_requirements: !ruby/object:Gem::Requirement
134
+ requirements:
135
+ - - ">="
136
+ - !ruby/object:Gem::Version
137
+ version: '0.9'
138
+ - - "<"
139
+ - !ruby/object:Gem::Version
140
+ version: '3.0'
141
+ - !ruby/object:Gem::Dependency
142
+ name: rationalist
143
+ requirement: !ruby/object:Gem::Requirement
144
+ requirements:
145
+ - - "~>"
146
+ - !ruby/object:Gem::Version
147
+ version: '2.0'
148
+ type: :runtime
149
+ prerelease: false
150
+ version_requirements: !ruby/object:Gem::Requirement
151
+ requirements:
152
+ - - "~>"
153
+ - !ruby/object:Gem::Version
154
+ version: '2.0'
155
+ description: Describes Unicode characters with their name and shows compositions.
156
+ email:
157
+ - mail@janlelis.de
158
+ executables:
159
+ - uniscribe
160
+ extensions: []
161
+ extra_rdoc_files: []
162
+ files:
163
+ - ".gitignore"
164
+ - ".travis.yml"
165
+ - CHANGELOG.md
166
+ - CODE_OF_CONDUCT.md
167
+ - Gemfile
168
+ - Gemfile.lock
169
+ - MIT-LICENSE.txt
170
+ - README.md
171
+ - Rakefile
172
+ - bin/uniscribe
173
+ - lib/uniscribe.rb
174
+ - lib/uniscribe/kernel_method.rb
175
+ - lib/uniscribe/version.rb
176
+ - spec/uniscribe_spec.rb
177
+ - uniscribe.gemspec
178
+ homepage: https://github.com/janlelis/uniscribe
179
+ licenses:
180
+ - MIT
181
+ metadata: {}
182
+ post_install_message:
183
+ rdoc_options: []
184
+ require_paths:
185
+ - lib
186
+ required_ruby_version: !ruby/object:Gem::Requirement
187
+ requirements:
188
+ - - "~>"
189
+ - !ruby/object:Gem::Version
190
+ version: '2.1'
191
+ required_rubygems_version: !ruby/object:Gem::Requirement
192
+ requirements:
193
+ - - ">="
194
+ - !ruby/object:Gem::Version
195
+ version: '0'
196
+ requirements: []
197
+ rubyforge_project:
198
+ rubygems_version: 2.6.8
199
+ signing_key:
200
+ specification_version: 4
201
+ summary: Describes Unicode characters.
202
+ test_files:
203
+ - spec/uniscribe_spec.rb