characteristics 0.6.0 → 0.7.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -0
- data/README.md +49 -14
- data/lib/characteristics/unicode.rb +93 -9
- data/lib/characteristics/version.rb +1 -1
- data/spec/characteristics_spec.rb +30 -7
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8bd09e16bed3587eaa18057790766a6e720e3d25
+  data.tar.gz: 0bb97a8df68aa447fce5b04750573ef9da8a7373
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '0519adb1524a00105cb7bc54b329aadac50ad2fd350bbe0fda9356474e038a52796ad6548cd3a33865b943f1fd66ae22c5c6bf0673c8528b31cfdf2b5296626e'
+  data.tar.gz: 4a10cc4ec190ca4ed5fdf075449ce8d655182a70cc09faeb75e3805a02a0c7798b762901e6ff881723170f5a7bf51bc119c4af18c44be51c8fb803722db956f7
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -1,14 +1,17 @@
 # Characteristics [![[version]](https://badge.fury.io/rb/characteristics.svg)](http://badge.fury.io/rb/characteristics) [![[travis]](https://travis-ci.org/janlelis/characteristics.svg)](https://travis-ci.org/janlelis/characteristics)
 
-A Ruby library
+A Ruby library that provides additional info about characters:¹
 
--
+- Could a character be invisible (blank)?
 - Is a character assigned?
 - Is a character a special control character?
-
+
+Extra data is available for Unicode characters (see below).
 
 The [unibits](https://github.com/janlelis/unibits) and [uniscribe](https://github.com/janlelis/uniscribe) gems makes use of this data to visualize it accordingliy.
 
+¹ in the sense of [codepoints](https://en.wikipedia.org/wiki/Codepoint)
+
 ## Setup
 
 Add to your `Gemfile`:
@@ -20,6 +23,7 @@ gem 'characteristics'
 ## Usage
 
 ```ruby
+# All supported encodings
 char_info = Characteristics.create(character)
 char_info.valid? # => true / false
 char_info.unicode? # => true / false
@@ -28,6 +32,13 @@ char_info.control? # => true / false
 char_info.blank? # => true / false
 char_info.separator? # => true / false
 char_info.format? # => true / false
+
+# Unicode characters
+char_info = Characteristics.create(character)
+char_info.variation_selector? # => true / false
+char_info.tag? # => true / false
+char_info.ignorable? # => true / false
+char_info.noncharacter? # => true / false
 ```
 
 ## Types of Encodings
@@ -43,44 +54,68 @@ This library knows of four different kinds of encodings:
 - **:binary** Arbitrary string
   - *ASCII-8BIT*
 
-Other encodings are not supported
+Other encodings are currently not supported.
+
+## Properties
 
-
+### General
 
-
+#### `valid?`
 
 Validness is determined by Ruby's `String#valid_encoding?`
 
-
+#### `unicode?`
 
-
+**true** for Unicode encodings (`UTF-X`)
 
-
+#### `control?`
 
 Control characters are codepoints in the is [C0, delete or C1 control character range](https://en.wikipedia.org/wiki/C0_and_C1_control_codes). Characters in this range of [IBM codepage 437](https://en.wikipedia.org/wiki/Code_page_437) based encodings are always treated as control characters.
 
-
+#### `assigned?`
 
 - All valid ASCII and BINARY characters are considered assigned
 - For other byte based encodings, a character is considered assigned if it is not on the exception list included in this library. C0 control characters (and `\x7F`) are always considered assigned. C1 control characters are treated as assigned, if the encoding generally does not assign characters in the C1 region.
 - For Unicode, the general category is considered
 
-
+#### `blank?`
 
 The library includes a list of characters that might not be rendered visually. This list does not include unassigned codepoints, control characters (except for `\t`, `\n`, `\v`, `\f`, `\r`, and `\u{85}` in Unicode), or special formatting characters (right-to-left markers, variation selectors, etc).
 
-
+#### `separator?`
 
 Returns true if character is considered a separator. All separators also return true for the `blank?` check. In Unicode, the following characters are separators: `\n`, `\v`, `\f`, `\r`, `\u{85}` (next line), `\u{2028}` (line separator), and `\u{2029}` (paragraph separator)
 
-
+#### `format?`
+
+This flag is *true* only for special formatting characters, which are not control characters, like right-to-left marks. In Unicode, this means codepoints with the General Category of **Cf**.
+
+### Additional Unicode Properties
 
-
+#### `variation_selector?`
+
+**true** for [variation selectors](https://en.wikipedia.org/wiki/Variation_Selector).
+
+#### `tag?`
+
+**true** for [tags](https://en.wikipedia.org/wiki/Tags_(Unicode_block)).
+
+#### `ignorable?`
+
+**true** for characters which might not be implemented, and thus, might render no visible glyph.
+
+#### `noncharacter?`
+
+**true** if codepoint will never be assigned in a future standard of Unicode.
 
 ## Todo
 
 - Support all non-dummy encodings that Ruby supports
 
+## Also See
+
+- [Symbolify](https://github.com/janlelis/symbolify)
+
 ## MIT License
 
 Copyright (C) 2017 Jan Lelis <http://janlelis.com>. Released under the MIT license.
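The General properties described in the README rest on plain Ruby string behavior, so they can be approximated without the gem. The following is a minimal sketch, not the library's implementation: the `sketch_*` helper names are hypothetical, the `UTF` name-prefix test is an assumption about what "Unicode encodings (`UTF-X`)" means, and the separator list is copied from the `separator?` description.

```ruby
# Hypothetical helpers for illustration; they are not part of the characteristics gem.

# Unicode separators as listed in the README's `separator?` section:
# \n, \v, \f, \r, next line, line separator, paragraph separator.
UNICODE_SEPARATORS = [0x0A, 0x0B, 0x0C, 0x0D, 0x85, 0x2028, 0x2029].freeze

# The README states that validness is determined by String#valid_encoding?.
def sketch_valid?(char)
  char.valid_encoding?
end

# Assumption: an encoding counts as Unicode when its name starts with "UTF"
# (this also matches names such as "UTF8-KDDI").
def sketch_unicode?(char)
  char.encoding.name.start_with?("UTF")
end

# Validity is checked first so that String#ord is never called on invalid bytes.
def sketch_separator?(char)
  sketch_valid?(char) && sketch_unicode?(char) && UNICODE_SEPARATORS.include?(char.ord)
end
```

With these helpers, a lone `\xFF` byte tagged as UTF-8 is reported as invalid, while `"\u{2028}"` is reported as a separator and a plain space is not.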
data/lib/characteristics/unicode.rb
CHANGED
@@ -91,6 +91,76 @@ class UnicodeCharacteristics < Characteristics
       0x2069,
     ].freeze
 
+    VARIATION_SELECTORS = [
+      *0x180B..0x180D,
+      *0xFE00..0xFE0F,
+      *0xE0100..0xE01EF,
+    ].freeze
+
+    TAGS = [
+      0xE0001,
+      *0xE0020..0xE007F,
+    ].freeze
+
+    NONCHARACTERS = [
+      *0xFDD0..0xFDEF,
+      0xFFFE, 0xFFFF,
+      0x1FFFE, 0x1FFFF,
+      0x2FFFE, 0x2FFFF,
+      0x3FFFE, 0x3FFFF,
+      0x4FFFE, 0x4FFFF,
+      0x5FFFE, 0x5FFFF,
+      0x6FFFE, 0x6FFFF,
+      0x7FFFE, 0x7FFFF,
+      0x8FFFE, 0x8FFFF,
+      0x9FFFE, 0x9FFFF,
+      0xAFFFE, 0xAFFFF,
+      0xBFFFE, 0xBFFFF,
+      0xCFFFE, 0xCFFFF,
+      0xDFFFE, 0xDFFFF,
+      0xEFFFE, 0xEFFFF,
+      0xFFFFE, 0xFFFFF,
+      0x10FFFE, 0x10FFFF,
+    ].freeze
+
+    IGNORABLE = [
+      0x00AD,
+      0x034F,
+      0x061C,
+      *0x115F..0x1160,
+      *0x17B4..0x17B5,
+      *0x180B..0x180E,
+      *0x200B..0x200F,
+      *0x202A..0x202E,
+      *0x2060..0x206F,
+      0x3164,
+      *0xFE00..0xFE0F,
+      0xFEFF,
+      0xFFA0,
+      *0xFFF0..0xFFF8,
+      *0x1BCA0..0x1BCA3,
+      *0x1D173..0x1D17A,
+      *0xE0000..0xE0FFF,
+    ].freeze
+
+    KDDI = [
+      *0xE468..0xE5DF,
+      *0xEA80..0xEB8E,
+    ].freeze
+
+    SOFTBANK = [
+      *0xE001..0xE05A,
+      *0xE101..0xE15A,
+      *0xE201..0xE25A,
+      *0xE301..0xE34D,
+      *0xE401..0xE44C,
+      *0xE501..0xE53E,
+    ].freeze
+
+    DOCOMO = [
+      *0xE63E..0xE757,
+    ].freeze
+
   attr_reader :category
 
   def initialize(char)
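The `NONCHARACTERS` list in this hunk follows a regular pattern: the block U+FDD0..U+FDEF plus the last two codepoints of each of the 17 Unicode planes, 66 codepoints in total. A minimal sketch, assuming that pattern, derives the same values arithmetically. The method names are hypothetical, and the gem itself uses the literal list shown in the diff.

```ruby
# Hypothetical helpers; the gem stores these codepoints as a literal array.

# Build the list: U+FDD0..U+FDEF plus U+nFFFE and U+nFFFF for planes 0..16.
def noncharacter_codepoints
  plane_ends = (0..16).flat_map do |plane|
    base = plane * 0x10000
    [base + 0xFFFE, base + 0xFFFF]
  end
  [*0xFDD0..0xFDEF, *plane_ends]
end

# Constant-time check that does not materialize the list.
def sketch_noncharacter?(codepoint)
  return false unless codepoint.between?(0, 0x10FFFF)

  # The low 16 bits of a plane-final noncharacter are 0xFFFE or 0xFFFF.
  (0xFDD0..0xFDEF).cover?(codepoint) || (codepoint & 0xFFFE) == 0xFFFE
end
```

Both forms agree on every codepoint from 0 to 0x10FFFF; the arithmetic version simply avoids scanning an array.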
@@ -142,28 +212,42 @@ class UnicodeCharacteristics < Characteristics
     @is_valid && BIDI_CONTROL.include?(@ord)
   end
 
+  # unicode specific
+
+  def variation_selector?
+    @is_valid && VARIATION_SELECTORS.include?(@ord)
+  end
+
+  def tag?
+    @is_valid && TAGS.include?(@ord)
+  end
+
+  def noncharacter?
+    @is_valid && NONCHARACTERS.include?(@ord)
+  end
+
+  def ignorable?
+    @is_valid && IGNORABLE.include?(@ord)
+  end
+
+  # emoji
+
   def kddi?
     @is_valid &&
     encoding_has_kddi? &&
-    (
-    @ord >= 0xEA80 && @ord <= 0xEB8E )
+    KDDI.include?(@ord)
   end
 
   def softbank?
     @is_valid &&
     encoding_has_softbank? &&
-    (
-    @ord >= 0xE101 && @ord <= 0xE15A ||
-    @ord >= 0xE201 && @ord <= 0xE25A ||
-    @ord >= 0xE301 && @ord <= 0xE34D ||
-    @ord >= 0xE401 && @ord <= 0xE44C ||
-    @ord >= 0xE501 && @ord <= 0xE53E )
+    SOFTBANK.include?(@ord)
   end
 
   def docomo?
     @is_valid &&
     encoding_has_docomo? &&
-    (
+    DOCOMO.include?(@ord)
   end
 
   private
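The refactored predicates call `Array#include?` on arrays built by splatting ranges, which scans element by element (the `IGNORABLE` array alone expands `0xE0000..0xE0FFF` into 4096 entries). As a hedged alternative, and not what the gem does, the same membership test can be written against the ranges themselves. The constant and method names below are hypothetical; the range bounds are copied from the `SOFTBANK` constant in the diff.

```ruby
# Hypothetical alternative; range bounds copied from the SOFTBANK constant in the diff.
SOFTBANK_RANGES = [
  0xE001..0xE05A,
  0xE101..0xE15A,
  0xE201..0xE25A,
  0xE301..0xE34D,
  0xE401..0xE44C,
  0xE501..0xE53E,
].freeze

# Array form, as in the diff: every range is expanded into single codepoints.
SOFTBANK_LIST = SOFTBANK_RANGES.flat_map(&:to_a).freeze

# Range form: one bounds check per range instead of one comparison per codepoint.
def in_softbank_ranges?(ord)
  SOFTBANK_RANGES.any? { |range| range.cover?(ord) }
end
```

Both forms give the same answer for every codepoint; the range form trades a little readability for fewer comparisons and a smaller constant.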
data/spec/characteristics_spec.rb
CHANGED
@@ -41,13 +41,13 @@ describe Characteristics do
 
     it "is assigned or not" do
       assert assigned? "\x21"
-      refute assigned? "\
+      refute assigned? "\u{FFEF}"
     end
 
     it "is control or not" do
       assert control? "\x1E"
       assert control? "\x7F"
-      assert control? "\
+      assert control? "\u{0080}"
       refute control? "\x67"
     end
 
@@ -62,32 +62,55 @@ describe Characteristics do
     end
 
     it "is format or not" do
-      assert format? "\
+      assert format? "\u{FFF9}"
       refute format? "\x21"
     end
 
     it "is bidi_control or not" do
-      assert bidi_control? "\
+      assert bidi_control? "\u{202D}"
       refute bidi_control? "\x21"
     end
   end
 
+  describe "Unicode Properties" do
+    it "is variation_selector or not" do
+      assert Characteristics.create("\u{FE00}").variation_selector?
+      refute Characteristics.create("a").variation_selector?
+    end
+
+    it "is tag or not" do
+      assert Characteristics.create("\u{E0020}").tag?
+      refute Characteristics.create("a").tag?
+    end
+
+    it "is noncharacter or not" do
+      assert Characteristics.create("\u{10FFFF}").noncharacter?
+      refute Characteristics.create("a").noncharacter?
+    end
+
+    it "is ignorable or not" do
+      assert Characteristics.create("\u{AD}").ignorable?
+      assert Characteristics.create("\u{E0000}").ignorable?
+      refute Characteristics.create(" ").ignorable?
+    end
+  end
+
   describe "Japanese Emojis" do
     it "can be a KDDI emoji" do
       encoding = "UTF8-KDDI"
-      assert Characteristics.create("\
+      assert Characteristics.create("\u{E468}".force_encoding(encoding)).kddi?
       refute Characteristics.create("A".force_encoding(encoding)).kddi?
     end
 
     it "can be a SoftBank emoji" do
       encoding = "UTF8-SoftBank"
-      assert Characteristics.create("\
+      assert Characteristics.create("\u{E001}".force_encoding(encoding)).softbank?
       refute Characteristics.create("A".force_encoding(encoding)).softbank?
     end
 
     it "can be a DoCoMo emoji" do
       encoding = "UTF8-DoCoMo"
-      assert Characteristics.create("\
+      assert Characteristics.create("\u{E63E}".force_encoding(encoding)).docomo?
       refute Characteristics.create("A".force_encoding(encoding)).docomo?
     end
   end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: characteristics
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.7.0
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-03-
+date: 2017-03-31 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: unicode-categories