characteristics 0.6.0 → 0.7.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -0
- data/README.md +49 -14
- data/lib/characteristics/unicode.rb +93 -9
- data/lib/characteristics/version.rb +1 -1
- data/spec/characteristics_spec.rb +30 -7
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8bd09e16bed3587eaa18057790766a6e720e3d25
+  data.tar.gz: 0bb97a8df68aa447fce5b04750573ef9da8a7373
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '0519adb1524a00105cb7bc54b329aadac50ad2fd350bbe0fda9356474e038a52796ad6548cd3a33865b943f1fd66ae22c5c6bf0673c8528b31cfdf2b5296626e'
+  data.tar.gz: 4a10cc4ec190ca4ed5fdf075449ce8d655182a70cc09faeb75e3805a02a0c7798b762901e6ff881723170f5a7bf51bc119c4af18c44be51c8fb803722db956f7
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -1,14 +1,17 @@
 # Characteristics [![[version]](https://badge.fury.io/rb/characteristics.svg)](http://badge.fury.io/rb/characteristics) [![[travis]](https://travis-ci.org/janlelis/characteristics.svg)](https://travis-ci.org/janlelis/characteristics)
 
-A Ruby library
+A Ruby library that provides additional info about characters:¹
 
--
+- Could a character be invisible (blank)?
 - Is a character assigned?
 - Is a character a special control character?
-
+
+Extra data is available for Unicode characters (see below).
 
 The [unibits](https://github.com/janlelis/unibits) and [uniscribe](https://github.com/janlelis/uniscribe) gems makes use of this data to visualize it accordingliy.
 
+¹ in the sense of [codepoints](https://en.wikipedia.org/wiki/Codepoint)
+
 ## Setup
 
 Add to your `Gemfile`:
@@ -20,6 +23,7 @@ gem 'characteristics'
 ## Usage
 
 ```ruby
+# All supported encodings
 char_info = Characteristics.create(character)
 char_info.valid? # => true / false
 char_info.unicode? # => true / false
@@ -28,6 +32,13 @@ char_info.control? # => true / false
 char_info.blank? # => true / false
 char_info.separator? # => true / false
 char_info.format? # => true / false
+
+# Unicode characters
+char_info = Characteristics.create(character)
+char_info.variation_selector? # => true / false
+char_info.tag? # => true / false
+char_info.ignorable? # => true / false
+char_info.noncharacter? # => true / false
 ```
 
 ## Types of Encodings
@@ -43,44 +54,68 @@ This library knows of four different kinds of encodings:
 - **:binary** Arbitrary string
 - *ASCII-8BIT*
 
-Other encodings are not supported
+Other encodings are currently not supported.
+
+## Properties
 
-
+### General
 
-
+#### `valid?`
 
 Validness is determined by Ruby's `String#valid_encoding?`
 
-
+#### `unicode?`
 
-
+**true** for Unicode encodings (`UTF-X`)
 
-
+#### `control?`
 
 Control characters are codepoints in the is [C0, delete or C1 control character range](https://en.wikipedia.org/wiki/C0_and_C1_control_codes). Characters in this range of [IBM codepage 437](https://en.wikipedia.org/wiki/Code_page_437) based encodings are always treated as control characters.
 
-
+#### `assigned?`
 
 - All valid ASCII and BINARY characters are considered assigned
 - For other byte based encodings, a character is considered assigned if it is not on the exception list included in this library. C0 control characters (and `\x7F`) are always considered assigned. C1 control characters are treated as assigned, if the encoding generally does not assign characters in the C1 region.
 - For Unicode, the general category is considered
 
-
+#### `blank?`
 
 The library includes a list of characters that might not be rendered visually. This list does not include unassigned codepoints, control characters (except for `\t`, `\n`, `\v`, `\f`, `\r`, and `\u{85}` in Unicode), or special formatting characters (right-to-left markers, variation selectors, etc).
 
-
+#### `separator?`
 
 Returns true if character is considered a separator. All separators also return true for the `blank?` check. In Unicode, the following characters are separators: `\n`, `\v`, `\f`, `\r`, `\u{85}` (next line), `\u{2028}` (line separator), and `\u{2029}` (paragraph separator)
 
-
+#### `format?`
+
+This flag is *true* only for special formatting characters, which are not control characters, like right-to-left marks. In Unicode, this means codepoints with the General Category of **Cf**.
+
+### Additional Unicode Properties
 
-
+#### `variation_selector?`
+
+**true** for [variation selectors](https://en.wikipedia.org/wiki/Variation_Selector).
+
+#### `tag?`
+
+**true** for [tags](https://en.wikipedia.org/wiki/Tags_(Unicode_block)).
+
+#### `ignorable?`
+
+**true** for characters which might not be implemented, and thus, might render no visible glyph.
+
+#### `noncharacter?`
+
+**true** if codepoint will never be assigned in a future standard of Unicode.
 
 ## Todo
 
 - Support all non-dummy encodings that Ruby supports
 
+## Also See
+
+- [Symbolify](https://github.com/janlelis/symbolify)
+
 ## MIT License
 
 Copyright (C) 2017 Jan Lelis <http://janlelis.com>. Released under the MIT license.
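The README text in this diff ties `format?` to the Unicode General Category **Cf**. The gem's own implementation of that check is not part of this diff, so the following is only a rough standalone illustration of what "General Category Cf" means in Ruby terms. It relies on the `\p{Cf}` property of Ruby's regexp engine; the helper name `format_like?` is hypothetical, and the sketch assumes UTF-8 input.

```ruby
# Hypothetical helper, not part of the characteristics gem.
# Returns true if the single-character UTF-8 string has General Category Cf.
def format_like?(char)
  char.valid_encoding? && char.match?(/\A\p{Cf}\z/)
end

puts format_like?("\u{200E}") # LEFT-TO-RIGHT MARK => true
puts format_like?("\u{FFF9}") # codepoint used in the gem's spec => true
puts format_like?("!")        # => false
```

Which codepoints count as Cf depends on the Unicode version bundled with the running Ruby, so results for recently added characters may differ between Ruby releases.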
data/lib/characteristics/unicode.rb
CHANGED
@@ -91,6 +91,76 @@ class UnicodeCharacteristics < Characteristics
     0x2069,
   ].freeze
 
+  VARIATION_SELECTORS = [
+    *0x180B..0x180D,
+    *0xFE00..0xFE0F,
+    *0xE0100..0xE01EF,
+  ].freeze
+
+  TAGS = [
+    0xE0001,
+    *0xE0020..0xE007F,
+  ].freeze
+
+  NONCHARACTERS = [
+    *0xFDD0..0xFDEF,
+    0xFFFE, 0xFFFF,
+    0x1FFFE, 0x1FFFF,
+    0x2FFFE, 0x2FFFF,
+    0x3FFFE, 0x3FFFF,
+    0x4FFFE, 0x4FFFF,
+    0x5FFFE, 0x5FFFF,
+    0x6FFFE, 0x6FFFF,
+    0x7FFFE, 0x7FFFF,
+    0x8FFFE, 0x8FFFF,
+    0x9FFFE, 0x9FFFF,
+    0xAFFFE, 0xAFFFF,
+    0xBFFFE, 0xBFFFF,
+    0xCFFFE, 0xCFFFF,
+    0xDFFFE, 0xDFFFF,
+    0xEFFFE, 0xEFFFF,
+    0xFFFFE, 0xFFFFF,
+    0x10FFFE, 0x10FFFF,
+  ].freeze
+
+  IGNORABLE = [
+    0x00AD,
+    0x034F,
+    0x061C,
+    *0x115F..0x1160,
+    *0x17B4..0x17B5,
+    *0x180B..0x180E,
+    *0x200B..0x200F,
+    *0x202A..0x202E,
+    *0x2060..0x206F,
+    0x3164,
+    *0xFE00..0xFE0F,
+    0xFEFF,
+    0xFFA0,
+    *0xFFF0..0xFFF8,
+    *0x1BCA0..0x1BCA3,
+    *0x1D173..0x1D17A,
+    *0xE0000..0xE0FFF,
+  ].freeze
+
+  KDDI = [
+    *0xE468..0xE5DF,
+    *0xEA80..0xEB8E,
+  ].freeze
+
+  SOFTBANK = [
+    *0xE001..0xE05A,
+    *0xE101..0xE15A,
+    *0xE201..0xE25A,
+    *0xE301..0xE34D,
+    *0xE401..0xE44C,
+    *0xE501..0xE53E,
+  ].freeze
+
+  DOCOMO = [
+    *0xE63E..0xE757,
+  ].freeze
+
   attr_reader :category
 
   def initialize(char)
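The `NONCHARACTERS` constant added in this hunk spells out 66 codepoints by hand: the block U+FDD0..U+FDEF plus the last two codepoints of each of the 17 Unicode planes. As a sanity check on that list, the same set can be derived programmatically. This is a standalone sketch, not code from the gem; the constant names `PLANE_ENDS` and `DERIVED_NONCHARACTERS` are made up for illustration.

```ruby
# Hypothetical reconstruction, not taken from the characteristics gem.
# Last two codepoints (xFFFE, xFFFF) of planes 0 through 16.
PLANE_ENDS = (0..16).flat_map { |plane|
  base = plane * 0x10000
  [base + 0xFFFE, base + 0xFFFF]
}.freeze

# The contiguous block U+FDD0..U+FDEF plus the plane-final pairs.
DERIVED_NONCHARACTERS = [*0xFDD0..0xFDEF, *PLANE_ENDS].freeze

puts DERIVED_NONCHARACTERS.size               # => 66
puts DERIVED_NONCHARACTERS.include?(0x10FFFF) # => true
puts DERIVED_NONCHARACTERS.include?("a".ord)  # => false
```

Writing the list out explicitly, as the diff does, keeps the constant easy to compare against the Unicode charts; the derived form mainly helps to confirm that no plane was skipped.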
@@ -142,28 +212,42 @@ class UnicodeCharacteristics < Characteristics
     @is_valid && BIDI_CONTROL.include?(@ord)
   end
 
+  # unicode specific
+
+  def variation_selector?
+    @is_valid && VARIATION_SELECTORS.include?(@ord)
+  end
+
+  def tag?
+    @is_valid && TAGS.include?(@ord)
+  end
+
+  def noncharacter?
+    @is_valid && NONCHARACTERS.include?(@ord)
+  end
+
+  def ignorable?
+    @is_valid && IGNORABLE.include?(@ord)
+  end
+
+  # emoji
+
   def kddi?
     @is_valid &&
       encoding_has_kddi? &&
-      (
-      @ord >= 0xEA80 && @ord <= 0xEB8E )
+      KDDI.include?(@ord)
   end
 
   def softbank?
     @is_valid &&
       encoding_has_softbank? &&
-      (
-      @ord >= 0xE101 && @ord <= 0xE15A ||
-      @ord >= 0xE201 && @ord <= 0xE25A ||
-      @ord >= 0xE301 && @ord <= 0xE34D ||
-      @ord >= 0xE401 && @ord <= 0xE44C ||
-      @ord >= 0xE501 && @ord <= 0xE53E )
+      SOFTBANK.include?(@ord)
   end
 
   def docomo?
     @is_valid &&
       encoding_has_docomo? &&
-      (
+      DOCOMO.include?(@ord)
   end
 
   private
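This hunk replaces chained `>=`/`<=` comparisons with lookups in frozen arrays built by splatting ranges. `Array#include?` scans the array element by element, and a range such as `0xE0000..0xE0FFF` in `IGNORABLE` expands to 4096 entries. One possible alternative, shown only as a sketch and not something the diff does, is to keep the `Range` objects and test them with `Range#cover?`. The names `SOFTBANK_RANGES` and `softbank_codepoint?` are hypothetical; the ranges are copied from the `SOFTBANK` constant in the diff.

```ruby
# Hypothetical alternative, not taken from the characteristics gem.
SOFTBANK_RANGES = [
  0xE001..0xE05A,
  0xE101..0xE15A,
  0xE201..0xE25A,
  0xE301..0xE34D,
  0xE401..0xE44C,
  0xE501..0xE53E,
].freeze

# True if the integer codepoint falls inside any of the ranges.
def softbank_codepoint?(ord)
  SOFTBANK_RANGES.any? { |range| range.cover?(ord) }
end

# Expanded form, equivalent to the splatted array used in the diff.
SOFTBANK_ARRAY = SOFTBANK_RANGES.flat_map(&:to_a).freeze

puts softbank_codepoint?(0xE001) # => true
puts softbank_codepoint?(0xE05B) # => false
```

Both forms return the same answers; the range form performs at most six comparisons per lookup, while the array form keeps the predicate bodies uniform across all constants, which appears to be the style the diff settles on.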
data/spec/characteristics_spec.rb
CHANGED
@@ -41,13 +41,13 @@ describe Characteristics do
 
     it "is assigned or not" do
       assert assigned? "\x21"
-      refute assigned? "\
+      refute assigned? "\u{FFEF}"
     end
 
     it "is control or not" do
       assert control? "\x1E"
       assert control? "\x7F"
-      assert control? "\
+      assert control? "\u{0080}"
       refute control? "\x67"
     end
 
@@ -62,32 +62,55 @@ describe Characteristics do
     end
 
     it "is format or not" do
-      assert format? "\
+      assert format? "\u{FFF9}"
       refute format? "\x21"
     end
 
     it "is bidi_control or not" do
-      assert bidi_control? "\
+      assert bidi_control? "\u{202D}"
       refute bidi_control? "\x21"
     end
   end
 
+  describe "Unicode Properties" do
+    it "is variation_selector or not" do
+      assert Characteristics.create("\u{FE00}").variation_selector?
+      refute Characteristics.create("a").variation_selector?
+    end
+
+    it "is tag or not" do
+      assert Characteristics.create("\u{E0020}").tag?
+      refute Characteristics.create("a").tag?
+    end
+
+    it "is noncharacter or not" do
+      assert Characteristics.create("\u{10FFFF}").noncharacter?
+      refute Characteristics.create("a").noncharacter?
+    end
+
+    it "is ignorable or not" do
+      assert Characteristics.create("\u{AD}").ignorable?
+      assert Characteristics.create("\u{E0000}").ignorable?
+      refute Characteristics.create(" ").ignorable?
+    end
+  end
+
   describe "Japanese Emojis" do
     it "can be a KDDI emoji" do
       encoding = "UTF8-KDDI"
-      assert Characteristics.create("\
+      assert Characteristics.create("\u{E468}".force_encoding(encoding)).kddi?
       refute Characteristics.create("A".force_encoding(encoding)).kddi?
     end
 
     it "can be a SoftBank emoji" do
       encoding = "UTF8-SoftBank"
-      assert Characteristics.create("\
+      assert Characteristics.create("\u{E001}".force_encoding(encoding)).softbank?
       refute Characteristics.create("A".force_encoding(encoding)).softbank?
     end
 
     it "can be a DoCoMo emoji" do
       encoding = "UTF8-DoCoMo"
-      assert Characteristics.create("\
+      assert Characteristics.create("\u{E63E}".force_encoding(encoding)).docomo?
       refute Characteristics.create("A".force_encoding(encoding)).docomo?
     end
   end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: characteristics
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.7.0
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-03-
+date: 2017-03-31 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: unicode-categories