characteristics 0.6.0 → 0.7.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 95c672e25105f6a9281caee75f58cc242e50bee5
4
- data.tar.gz: 17fe74e2842c0175b284e9a0b170561aba80c7af
3
+ metadata.gz: 8bd09e16bed3587eaa18057790766a6e720e3d25
4
+ data.tar.gz: 0bb97a8df68aa447fce5b04750573ef9da8a7373
5
5
  SHA512:
6
- metadata.gz: f12e4d9b7b471019a606060166629a99e950b1e5fc3d44f933da70239f2cbc5ed4ac8023b9bb79998709a6bed14c3681dd141f1894c11ddec14649e31c68d841
7
- data.tar.gz: 5110369bc3619f9e3ddc23c9bc6dd5abd7ff0e8ffdfd46889fbc2b04aea1d7611de607852ed0dfab04befa330e8c3e8c3afe5fa680ff0cb16ace232a5a968f6d
6
+ metadata.gz: '0519adb1524a00105cb7bc54b329aadac50ad2fd350bbe0fda9356474e038a52796ad6548cd3a33865b943f1fd66ae22c5c6bf0673c8528b31cfdf2b5296626e'
7
+ data.tar.gz: 4a10cc4ec190ca4ed5fdf075449ce8d655182a70cc09faeb75e3805a02a0c7798b762901e6ff881723170f5a7bf51bc119c4af18c44be51c8fb803722db956f7
@@ -1,5 +1,13 @@
1
1
  ## CHANGELOG
2
2
 
3
+ ### 0.7.0
4
+
5
+ * Add more Unicode properties
6
+ * variation_selector?
7
+ * tag?
8
+ * ignorable?
9
+ * noncharacter?
10
+
3
11
  ### 0.6.0
4
12
 
5
13
  * Add separator? property
data/README.md CHANGED
@@ -1,14 +1,17 @@
1
1
  # Characteristics [![[version]](https://badge.fury.io/rb/characteristics.svg)](http://badge.fury.io/rb/characteristics) [![[travis]](https://travis-ci.org/janlelis/characteristics.svg)](https://travis-ci.org/janlelis/characteristics)
2
2
 
3
- A Ruby library which provides some basic information about how characters behave in different encodings:
3
+ A Ruby library that provides additional info about characters¹:
4
4
 
5
- - Is a character valid according to its encoding?
5
+ - Could a character be invisible (blank)?
6
6
  - Is a character assigned?
7
7
  - Is a character a special control character?
8
- - Could a character be invisible (blank)?
8
+
9
+ Extra data is available for Unicode characters (see below).
9
10
 
10
11
  The [unibits](https://github.com/janlelis/unibits) and [uniscribe](https://github.com/janlelis/uniscribe) gems make use of this data to visualize it accordingly.
11
12
 
13
+ ¹ in the sense of [codepoints](https://en.wikipedia.org/wiki/Codepoint)
14
+
12
15
  ## Setup
13
16
 
14
17
  Add to your `Gemfile`:
@@ -20,6 +23,7 @@ gem 'characteristics'
20
23
  ## Usage
21
24
 
22
25
  ```ruby
26
+ # All supported encodings
23
27
  char_info = Characteristics.create(character)
24
28
  char_info.valid? # => true / false
25
29
  char_info.unicode? # => true / false
@@ -28,6 +32,13 @@ char_info.control? # => true / false
28
32
  char_info.blank? # => true / false
29
33
  char_info.separator? # => true / false
30
34
  char_info.format? # => true / false
35
+
36
+ # Unicode characters
37
+ char_info = Characteristics.create(character)
38
+ char_info.variation_selector? # => true / false
39
+ char_info.tag? # => true / false
40
+ char_info.ignorable? # => true / false
41
+ char_info.noncharacter? # => true / false
31
42
  ```
32
43
 
33
44
  ## Types of Encodings
@@ -43,44 +54,68 @@ This library knows of four different kinds of encodings:
43
54
  - **:binary** Arbitrary string
44
55
  - *ASCII-8BIT*
45
56
 
46
- Other encodings are not supported, yet.
57
+ Other encodings are currently not supported.
58
+
59
+ ## Properties
47
60
 
48
- ## Predicates
61
+ ### General
49
62
 
50
- ### `valid?`
63
+ #### `valid?`
51
64
 
52
65
  Validness is determined by Ruby's `String#valid_encoding?`
53
66
 
54
- ### `unicode?`
67
+ #### `unicode?`
55
68
 
56
- `true` for Unicode encodings (`UTF-X`)
69
+ **true** for Unicode encodings (`UTF-X`)
57
70
 
58
- ### `control?`
71
+ #### `control?`
59
72
 
60
73
  Control characters are codepoints in the [C0, delete or C1 control character range](https://en.wikipedia.org/wiki/C0_and_C1_control_codes). Characters in this range of [IBM codepage 437](https://en.wikipedia.org/wiki/Code_page_437) based encodings are always treated as control characters.
61
74
 
62
- ### `assigned?`
75
+ #### `assigned?`
63
76
 
64
77
  - All valid ASCII and BINARY characters are considered assigned
65
78
  - For other byte based encodings, a character is considered assigned if it is not on the exception list included in this library. C0 control characters (and `\x7F`) are always considered assigned. C1 control characters are treated as assigned, if the encoding generally does not assign characters in the C1 region.
66
79
  - For Unicode, the general category is considered
67
80
 
68
- ### `blank?`
81
+ #### `blank?`
69
82
 
70
83
  The library includes a list of characters that might not be rendered visually. This list does not include unassigned codepoints, control characters (except for `\t`, `\n`, `\v`, `\f`, `\r`, and `\u{85}` in Unicode), or special formatting characters (right-to-left markers, variation selectors, etc).
71
84
 
72
- ### `separator?`
85
+ #### `separator?`
73
86
 
74
87
  Returns true if character is considered a separator. All separators also return true for the `blank?` check. In Unicode, the following characters are separators: `\n`, `\v`, `\f`, `\r`, `\u{85}` (next line), `\u{2028}` (line separator), and `\u{2029}` (paragraph separator)
75
88
 
76
- ### `format?`
89
+ #### `format?`
90
+
91
+ This flag is *true* only for special formatting characters, which are not control characters, like right-to-left marks. In Unicode, this means codepoints with the General Category of **Cf**.
92
+
93
+ ### Additional Unicode Properties
77
94
 
78
- This flag is `true` only for special formatting characters, which are not control characters, like Right-to-left marks. In Unicode, this means codepoints with the General Category of **Cf**.
95
+ #### `variation_selector?`
96
+
97
+ **true** for [variation selectors](https://en.wikipedia.org/wiki/Variation_Selector).
98
+
99
+ #### `tag?`
100
+
101
+ **true** for [tags](https://en.wikipedia.org/wiki/Tags_(Unicode_block)).
102
+
103
+ #### `ignorable?`
104
+
105
+ **true** for characters which might not be implemented, and thus, might render no visible glyph.
106
+
107
+ #### `noncharacter?`
108
+
109
+ **true** if codepoint will never be assigned in a future standard of Unicode.
79
110
 
80
111
  ## Todo
81
112
 
82
113
  - Support all non-dummy encodings that Ruby supports
83
114
 
115
+ ## Also See
116
+
117
+ - [Symbolify](https://github.com/janlelis/symbolify)
118
+
84
119
  ## MIT License
85
120
 
86
121
  Copyright (C) 2017 Jan Lelis <http://janlelis.com>. Released under the MIT license.
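Note on `noncharacter?`: the 66 Unicode noncharacters follow a regular pattern (U+FDD0..U+FDEF plus the last two codepoints of each of the 17 planes), so the explicit list in the next hunk could also be derived. The following is a minimal sketch in plain Ruby that does not load the gem; `NONCHARACTERS_SKETCH` and `noncharacter_sketch?` are hypothetical names used for illustration only, not part of the library.

```ruby
# Sketch only: derives the noncharacter codepoints from their pattern
# instead of listing them one by one. Names are hypothetical.
NONCHARACTERS_SKETCH = [
  # Contiguous block in the Arabic Presentation Forms-A area
  *0xFDD0..0xFDEF,
  # Last two codepoints of every plane (0 through 16)
  *(0..16).flat_map { |plane| [plane * 0x10000 + 0xFFFE, plane * 0x10000 + 0xFFFF] },
].freeze

# Mirrors the shape of the predicate in the diff: validity check first,
# then a membership test on the codepoint.
def noncharacter_sketch?(char)
  char.valid_encoding? && NONCHARACTERS_SKETCH.include?(char.ord)
end
```

The explicit list in the diff is arguably easier to audit against the Unicode data files; the derived form mainly documents the rule behind it.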
@@ -91,6 +91,76 @@ class UnicodeCharacteristics < Characteristics
91
91
  0x2069,
92
92
  ].freeze
93
93
 
94
+ VARIATION_SELECTORS = [
95
+ *0x180B..0x180D,
96
+ *0xFE00..0xFE0F,
97
+ *0xE0100..0xE01EF,
98
+ ].freeze
99
+
100
+ TAGS = [
101
+ 0xE0001,
102
+ *0xE0020..0xE007F,
103
+ ].freeze
104
+
105
+ NONCHARACTERS = [
106
+ *0xFDD0..0xFDEF,
107
+ 0xFFFE, 0xFFFF,
108
+ 0x1FFFE, 0x1FFFF,
109
+ 0x2FFFE, 0x2FFFF,
110
+ 0x3FFFE, 0x3FFFF,
111
+ 0x4FFFE, 0x4FFFF,
112
+ 0x5FFFE, 0x5FFFF,
113
+ 0x6FFFE, 0x6FFFF,
114
+ 0x7FFFE, 0x7FFFF,
115
+ 0x8FFFE, 0x8FFFF,
116
+ 0x9FFFE, 0x9FFFF,
117
+ 0xAFFFE, 0xAFFFF,
118
+ 0xBFFFE, 0xBFFFF,
119
+ 0xCFFFE, 0xCFFFF,
120
+ 0xDFFFE, 0xDFFFF,
121
+ 0xEFFFE, 0xEFFFF,
122
+ 0xFFFFE, 0xFFFFF,
123
+ 0x10FFFE, 0x10FFFF,
124
+ ].freeze
125
+
126
+ IGNORABLE = [
127
+ 0x00AD,
128
+ 0x034F,
129
+ 0x061C,
130
+ *0x115F..0x1160,
131
+ *0x17B4..0x17B5,
132
+ *0x180B..0x180E,
133
+ *0x200B..0x200F,
134
+ *0x202A..0x202E,
135
+ *0x2060..0x206F,
136
+ 0x3164,
137
+ *0xFE00..0xFE0F,
138
+ 0xFEFF,
139
+ 0xFFA0,
140
+ *0xFFF0..0xFFF8,
141
+ *0x1BCA0..0x1BCA3,
142
+ *0x1D173..0x1D17A,
143
+ *0xE0000..0xE0FFF,
144
+ ].freeze
145
+
146
+ KDDI = [
147
+ *0xE468..0xE5DF,
148
+ *0xEA80..0xEB8E,
149
+ ].freeze
150
+
151
+ SOFTBANK = [
152
+ *0xE001..0xE05A,
153
+ *0xE101..0xE15A,
154
+ *0xE201..0xE25A,
155
+ *0xE301..0xE34D,
156
+ *0xE401..0xE44C,
157
+ *0xE501..0xE53E,
158
+ ].freeze
159
+
160
+ DOCOMO = [
161
+ *0xE63E..0xE757,
162
+ ].freeze
163
+
94
164
  attr_reader :category
95
165
 
96
166
  def initialize(char)
@@ -142,28 +212,42 @@ class UnicodeCharacteristics < Characteristics
142
212
  @is_valid && BIDI_CONTROL.include?(@ord)
143
213
  end
144
214
 
215
+ # unicode specific
216
+
217
+ def variation_selector?
218
+ @is_valid && VARIATION_SELECTORS.include?(@ord)
219
+ end
220
+
221
+ def tag?
222
+ @is_valid && TAGS.include?(@ord)
223
+ end
224
+
225
+ def noncharacter?
226
+ @is_valid && NONCHARACTERS.include?(@ord)
227
+ end
228
+
229
+ def ignorable?
230
+ @is_valid && IGNORABLE.include?(@ord)
231
+ end
232
+
233
+ # emoji
234
+
145
235
  def kddi?
146
236
  @is_valid &&
147
237
  encoding_has_kddi? &&
148
- ( @ord >= 0xE468 && @ord <= 0xE5DF ||
149
- @ord >= 0xEA80 && @ord <= 0xEB8E )
238
+ KDDI.include?(@ord)
150
239
  end
151
240
 
152
241
  def softbank?
153
242
  @is_valid &&
154
243
  encoding_has_softbank? &&
155
- ( @ord >= 0xE001 && @ord <= 0xE05A ||
156
- @ord >= 0xE101 && @ord <= 0xE15A ||
157
- @ord >= 0xE201 && @ord <= 0xE25A ||
158
- @ord >= 0xE301 && @ord <= 0xE34D ||
159
- @ord >= 0xE401 && @ord <= 0xE44C ||
160
- @ord >= 0xE501 && @ord <= 0xE53E )
244
+ SOFTBANK.include?(@ord)
161
245
  end
162
246
 
163
247
  def docomo?
164
248
  @is_valid &&
165
249
  encoding_has_docomo? &&
166
- ( @ord >= 0xE63E && @ord <= 0xE757 )
250
+ DOCOMO.include?(@ord)
167
251
  end
168
252
 
169
253
  private
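Note on the lookup strategy: this hunk replaces inline range comparisons with `Array#include?` on splatted ranges. `Array#include?` scans linearly, and `IGNORABLE` expands `0xE0000..0xE0FFF` into more than 4,000 integers. Whether that cost matters in practice is not measured here. One possible alternative, sketched under the assumption that the codepoint sets stay the same, keeps the ranges unexpanded and tests them with `Range#cover?`. `IGNORABLE_RANGES_SKETCH` and `in_ranges?` are hypothetical names, not part of the gem; the values are copied from the `IGNORABLE` constant above, which appears to mirror Unicode's Default_Ignorable_Code_Point property.

```ruby
# Sketch only: the same codepoints as IGNORABLE in the diff, kept as ranges.
# Single codepoints are written as one-element ranges for uniformity.
IGNORABLE_RANGES_SKETCH = [
  0x00AD..0x00AD,
  0x034F..0x034F,
  0x061C..0x061C,
  0x115F..0x1160,
  0x17B4..0x17B5,
  0x180B..0x180E,
  0x200B..0x200F,
  0x202A..0x202E,
  0x2060..0x206F,
  0x3164..0x3164,
  0xFE00..0xFE0F,
  0xFEFF..0xFEFF,
  0xFFA0..0xFFA0,
  0xFFF0..0xFFF8,
  0x1BCA0..0x1BCA3,
  0x1D173..0x1D17A,
  0xE0000..0xE0FFF,
].freeze

# Returns true if ord falls inside any of the given ranges.
# Each Range#cover? call is a pair of comparisons, so the cost depends on
# the number of ranges (17 here), not on the number of codepoints.
def in_ranges?(ranges, ord)
  ranges.any? { |range| range.cover?(ord) }
end
```

Usage would look like `in_ranges?(IGNORABLE_RANGES_SKETCH, char.ord)` in place of `IGNORABLE.include?(@ord)`. A `Set` built from the expanded array would be another option, trading memory for constant-time lookups.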
@@ -1,6 +1,6 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  class Characteristics
4
- VERSION = "0.6.0".freeze
4
+ VERSION = "0.7.0".freeze
5
5
  UNICODE_VERSION = "9.0.0".freeze
6
6
  end
@@ -41,13 +41,13 @@ describe Characteristics do
41
41
 
42
42
  it "is assigned or not" do
43
43
  assert assigned? "\x21"
44
- refute assigned? "\uFFEF"
44
+ refute assigned? "\u{FFEF}"
45
45
  end
46
46
 
47
47
  it "is control or not" do
48
48
  assert control? "\x1E"
49
49
  assert control? "\x7F"
50
- assert control? "\u0080"
50
+ assert control? "\u{0080}"
51
51
  refute control? "\x67"
52
52
  end
53
53
 
@@ -62,32 +62,55 @@ describe Characteristics do
62
62
  end
63
63
 
64
64
  it "is format or not" do
65
- assert format? "\uFFF9"
65
+ assert format? "\u{FFF9}"
66
66
  refute format? "\x21"
67
67
  end
68
68
 
69
69
  it "is bidi_control or not" do
70
- assert bidi_control? "\u202D"
70
+ assert bidi_control? "\u{202D}"
71
71
  refute bidi_control? "\x21"
72
72
  end
73
73
  end
74
74
 
75
+ describe "Unicode Properties" do
76
+ it "is variation_selector or not" do
77
+ assert Characteristics.create("\u{FE00}").variation_selector?
78
+ refute Characteristics.create("a").variation_selector?
79
+ end
80
+
81
+ it "is tag or not" do
82
+ assert Characteristics.create("\u{E0020}").tag?
83
+ refute Characteristics.create("a").tag?
84
+ end
85
+
86
+ it "is noncharacter or not" do
87
+ assert Characteristics.create("\u{10FFFF}").noncharacter?
88
+ refute Characteristics.create("a").noncharacter?
89
+ end
90
+
91
+ it "is ignorable or not" do
92
+ assert Characteristics.create("\u{AD}").ignorable?
93
+ assert Characteristics.create("\u{E0000}").ignorable?
94
+ refute Characteristics.create(" ").ignorable?
95
+ end
96
+ end
97
+
75
98
  describe "Japanese Emojis" do
76
99
  it "can be a KDDI emoji" do
77
100
  encoding = "UTF8-KDDI"
78
- assert Characteristics.create("\uE468".force_encoding(encoding)).kddi?
101
+ assert Characteristics.create("\u{E468}".force_encoding(encoding)).kddi?
79
102
  refute Characteristics.create("A".force_encoding(encoding)).kddi?
80
103
  end
81
104
 
82
105
  it "can be a SoftBank emoji" do
83
106
  encoding = "UTF8-SoftBank"
84
- assert Characteristics.create("\uE001".force_encoding(encoding)).softbank?
107
+ assert Characteristics.create("\u{E001}".force_encoding(encoding)).softbank?
85
108
  refute Characteristics.create("A".force_encoding(encoding)).softbank?
86
109
  end
87
110
 
88
111
  it "can be a DoCoMo emoji" do
89
112
  encoding = "UTF8-DoCoMo"
90
- assert Characteristics.create("\uE63E".force_encoding(encoding)).docomo?
113
+ assert Characteristics.create("\u{E63E}".force_encoding(encoding)).docomo?
91
114
  refute Characteristics.create("A".force_encoding(encoding)).docomo?
92
115
  end
93
116
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: characteristics
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.6.0
4
+ version: 0.7.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jan Lelis
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-03-30 00:00:00.000000000 Z
11
+ date: 2017-03-31 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: unicode-categories