twitter_cldr 1.3.6 → 1.4.0
- data/README.md +47 -2
- data/lib/twitter_cldr/core_ext/calendars/datetime.rb +2 -2
- data/lib/twitter_cldr/core_ext/calendars/timespan.rb +11 -13
- data/lib/twitter_cldr/normalizers.rb +3 -0
- data/lib/twitter_cldr/normalizers/base.rb +34 -0
- data/lib/twitter_cldr/normalizers/nfc.rb +24 -0
- data/lib/twitter_cldr/normalizers/nfd.rb +1 -1
- data/lib/twitter_cldr/normalizers/nfkc.rb +126 -0
- data/lib/twitter_cldr/normalizers/nfkd.rb +9 -17
- data/lib/twitter_cldr/shared.rb +1 -1
- data/lib/twitter_cldr/shared/code_point.rb +116 -0
- data/lib/twitter_cldr/tokenizers/base.rb +2 -2
- data/lib/twitter_cldr/utils.rb +8 -0
- data/lib/twitter_cldr/version.rb +1 -1
- data/resources/unicode_data/blocks_hangul.yml +46 -0
- data/resources/unicode_data/composition_exclusions.yml +293 -0
- data/resources/unicode_data/decomposition_map.yml +4565 -0
- data/spec/normalizers/NormalizationTestShort.txt +66 -66
- data/spec/normalizers/base_spec.rb +17 -0
- data/spec/normalizers/normalization_spec.rb +10 -0
- data/spec/readme_spec.rb +26 -1
- data/spec/shared/code_point_spec.rb +152 -0
- data/spec/tokenizers/base_spec.rb +0 -10
- data/spec/utils/{code_point_spec.rb → code_points_spec.rb} +0 -0
- data/spec/utils_spec.rb +10 -0
- metadata +16 -10
- data/lib/twitter_cldr/shared/unicode_data.rb +0 -64
- data/spec/normalizers/nfd_spec.rb +0 -21
- data/spec/shared/unicode_data_spec.rb +0 -51
data/README.md
CHANGED
@@ -109,6 +109,51 @@ dt = TwitterCldr::LocalizedDateTime.new(DateTime.now, :es)
 dt.to_short_s # ...etc
 ```

+#### Relative Dates and Times
+
+In addition to formatting full dates and times, TwitterCLDR supports relative time spans via several convenience methods and the `LocalizedTimespan` class. TwitterCLDR tries to guess the best time unit (eg. days, hours, minutes, etc) based on the length of the time span. Unless otherwise specified, TwitterCLDR will use the current date and time as the reference point for the calculation.
+
+```ruby
+(DateTime.now - 1).localize.ago # 1 day ago
+(DateTime.now - 0.5).localize.ago # 12 hours ago (i.e. half a day)
+
+(DateTime.now + 1).localize.until # In 1 day
+(DateTime.now + 0.5).localize.until # In 12 hours
+```
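Reviewer's sketch: the README says the unit is guessed from the length of the span. One way such a guess could work is shown below. This is not TwitterCLDR's implementation; the `UNIT_SECONDS` table and the helpers `guess_unit` and `span_in_units` are hypothetical, and the rule (take the largest unit that fits at least once) is an assumption chosen because it agrees with the README examples above.

```ruby
# Hypothetical table of unit lengths in seconds (not taken from TwitterCLDR).
# Ordered from largest to smallest so the first match is the largest unit.
UNIT_SECONDS = {
  :day    => 86_400,
  :hour   => 3_600,
  :minute => 60,
  :second => 1
}

# Picks the largest unit that fits into the span at least once.
# The sign of `seconds` is ignored, so past and future spans are treated alike.
def guess_unit(seconds)
  magnitude = seconds.abs
  unit, _size = UNIT_SECONDS.find { |_name, size| magnitude >= size }
  unit || :second
end

# Expresses the span as a whole number of the chosen (or guessed) unit.
def span_in_units(seconds, unit = guess_unit(seconds))
  [seconds.abs / UNIT_SECONDS[unit], unit]
end

p span_in_units(86_400)      # [1, :day]
p span_in_units(-43_200)     # [12, :hour]
p span_in_units(86_400, :hour) # [24, :hour]
```

Passing the unit explicitly corresponds to the `:unit` option described in the README; leaving it out falls back to the guess.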
+
+Specify other locales:
+
+```ruby
+(DateTime.now - 1).localize(:de).ago # Vor 1 Tag
+(DateTime.now + 1).localize(:de).until # In 1 Tag
+```
+
+Force TwitterCLDR to use a specific time unit by including the `:unit` option:
+
+```ruby
+(DateTime.now - 1).localize(:de).ago(:unit => :hour) # Vor 24 Stunden
+(DateTime.now + 1).localize(:de).until(:unit => :hour) # In 24 Stunden
+```
+
+Specify a different reference point for the time span calculation:
+
+```ruby
+# 86400 = 1 day in seconds, 259200 = 3 days in seconds
+(Time.now + 86400).localize(:de).ago(:unit => :hour, :base_time => (Time.now + 259200)) # Vor 48 Stunden
+```
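Reviewer's sketch: the arithmetic behind `:base_time` can be reproduced with plain `Time` objects. The sketch assumes the span is simply the target time minus the reference time in whole seconds, which is what the `ago` and `until` changes in this diff compute. `seconds_between` and `hours_ago` are hypothetical helper names, and no localization is attempted.

```ruby
# Signed number of seconds from the reference point to the target time.
# Negative values lie in the past ("ago"), positive ones in the future ("until").
def seconds_between(time, base_time = Time.now)
  time.to_i - base_time.to_i
end

# Hypothetical helper: whole hours by which a time precedes the reference point.
# Raises for future times, loosely mirroring the guard in `ago`.
def hours_ago(time, base_time = Time.now)
  seconds = seconds_between(time, base_time)
  raise ArgumentError, 'Start date is after end date.' if seconds > 0
  seconds.abs / 3_600
end

now = Time.at(1_000_000)
p seconds_between(now + 86_400, now + 259_200) # -172800
p hours_ago(now + 86_400, now + 259_200)       # 48
```

The result of 48 hours matches the "Vor 48 Stunden" comment in the README example: one day from now is two days before three days from now.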
+
+Behind the scenes, these convenience methods are creating instances of `LocalizedTimespan`, whose constructor accepts a number of seconds as the first argument. You can do the same thing if you're feeling adventurous:
+
+```ruby
+ts = TwitterCldr::LocalizedTimespan.new(86400, :de)
+ts.to_s # In 1 Tag
+ts.to_s(:hour) # In 24 Stunden
+
+ts = TwitterCldr::LocalizedTimespan.new(-86400, :de)
+ts.to_s # Vor 1 Tag
+ts.to_s(:hour) # Vor 24 Stunden
+```
+
 ### Plural Rules

 Some languages, like English, have "countable" nouns. You probably know this concept better as "plural" and "singular", i.e. the difference between "strawberry" and "strawberries". Other languages, like Russian, have three plural forms: one (numbers ending in 1), few (numbers ending in 2, 3, or 4), and many (everything else). Still other languages like Japanese don't use countable nouns at all.
@@ -233,7 +278,7 @@ TwitterCLDR provides ways to retrieve individual code points as well as normaliz
 Retrieve data for code points:

 ```ruby
-code_point = TwitterCldr::Shared::
+code_point = TwitterCldr::Shared::CodePoint.for_hex("1F3E9")
 code_point.name # "LOVE HOTEL"
 code_point.bidi_mirrored # "N"
 code_point.category # "So"
@@ -252,7 +297,7 @@ Convert code points to characters:
 TwitterCldr::Utils::CodePoints.to_string(["00BF"]) # "¿"
 ```

-Normalize/decompose a Unicode string (NFD, NFKD implementations available). Note that the normalized string will almost always look the same as the original string because most character display systems automatically combine decomposed characters.
+Normalize/decompose a Unicode string (NFD, NFKD, NFC, and NFKC implementations available). Note that the normalized string will almost always look the same as the original string because most character display systems automatically combine decomposed characters.
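Reviewer's sketch: the point that normalized strings look identical while differing underneath is easiest to see by printing code points. The example uses Ruby's built-in `String#unicode_normalize` (Ruby 2.2 and later) as a stand-in, not the TwitterCLDR normalizers from this diff; `hex_code_points` is a hypothetical helper.

```ruby
# Formats each code point as four-digit uppercase hex.
def hex_code_points(string)
  string.codepoints.map { |cp| cp.to_s(16).upcase.rjust(4, "0") }
end

# Uses Ruby's built-in normalization (Ruby 2.2+), not TwitterCLDR.
composed   = "fran\u00E7ais"                   # "ç" stored as one code point
decomposed = composed.unicode_normalize(:nfd)  # "c" followed by a combining cedilla

p hex_code_points(composed)[4]                   # "00E7"
p hex_code_points(decomposed)[4, 2]              # ["0063", "0327"]
p decomposed.unicode_normalize(:nfc) == composed # true

# Compatibility forms also replace characters such as ligatures.
p "\uFB01".unicode_normalize(:nfkc)              # "fi"
```

NFD and NFC round-trip between the two spellings of "ç", while NFKC additionally folds the "ﬁ" ligature into two plain letters.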

 ```ruby
 TwitterCldr::Normalizers::NFD.normalize("français") # "français"
data/lib/twitter_cldr/core_ext/calendars/datetime.rb
CHANGED
@@ -28,14 +28,14 @@ module TwitterCldr
       base_time = options[:base_time] || Time.now
       seconds = self.to_time.base_obj.to_i - base_time.to_i
       raise ArgumentError.new('Start date is after end date. Consider using "until" function.') if seconds > 0
-      TwitterCldr::
+      TwitterCldr::LocalizedTimespan.new(seconds, @locale).to_s(options[:unit])
     end

     def until(options = {})
       base_time = options[:base_time] || Time.now
       seconds = self.to_time.base_obj.to_i - base_time.to_i
       raise ArgumentError.new('End date is before start date. Consider using "ago" function.') if seconds < 0
-      TwitterCldr::
+      TwitterCldr::LocalizedTimespan.new(seconds, @locale).to_s(options[:unit])
     end

     def to_s
data/lib/twitter_cldr/core_ext/calendars/timespan.rb
CHANGED
@@ -4,23 +4,21 @@
 # http://www.apache.org/licenses/LICENSE-2.0

 module TwitterCldr
-
-  class LocalizedTimespan < LocalizedObject
+  class LocalizedTimespan < LocalizedObject

-
-
-
-
+    def initialize(seconds, locale)
+      @formatter = TwitterCldr::Formatters::TimespanFormatter.new(:locale => locale)
+      @seconds = seconds
+    end

-
-
-
+    def to_s(unit = :default)
+      @formatter.format(@seconds, unit)
+    end

-
+    protected

-
-
-      end
+    def formatter_const
+      TwitterCldr::Formatters::TimespanFormatter
     end
   end
 end
data/lib/twitter_cldr/normalizers.rb
CHANGED
@@ -5,7 +5,10 @@

 module TwitterCldr
   module Normalizers
+    autoload :Base, 'twitter_cldr/normalizers/base'
     autoload :NFD, 'twitter_cldr/normalizers/nfd'
     autoload :NFKD, 'twitter_cldr/normalizers/nfkd'
+    autoload :NFC, 'twitter_cldr/normalizers/nfc'
+    autoload :NFKC, 'twitter_cldr/normalizers/nfkc'
   end
 end
data/lib/twitter_cldr/normalizers/base.rb
ADDED
@@ -0,0 +1,34 @@
+# encoding: UTF-8
+
+# Copyright 2012 Twitter, Inc
+# http://www.apache.org/licenses/LICENSE-2.0
+
+module TwitterCldr
+  module Normalizers
+    class Base
+
+      class << self
+
+        HANGUL_DECOMPOSITION_CONSTANTS = {
+          :SBase => 0xAC00,
+          :LBase => 0x1100,
+          :VBase => 0x1161,
+          :TBase => 0x11A7,
+          :LCount => 19,
+          :VCount => 21,
+          :TCount => 28,
+          :NCount => 588, # VCount * TCount
+          :SCount => 11172 # LCount * NCount
+        }
+
+        def combining_class_for(code_point)
+          TwitterCldr::Shared::CodePoint.for_hex(code_point).combining_class.to_i
+        rescue NoMethodError
+          0
+        end
+
+      end
+
+    end
+  end
+end
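Reviewer's sketch: the Hangul constants above drive a purely arithmetic mapping between jamo sequences and precomposed syllables. The snippet below works through that arithmetic on integers. The constant values are copied from `HANGUL_DECOMPOSITION_CONSTANTS`; the names `compose_syllable` and `decompose_syllable` are hypothetical and are not part of the gem.

```ruby
# Values copied from HANGUL_DECOMPOSITION_CONSTANTS in base.rb.
S_BASE  = 0xAC00
L_BASE  = 0x1100
V_BASE  = 0x1161
T_BASE  = 0x11A7
V_COUNT = 21
T_COUNT = 28
N_COUNT = V_COUNT * T_COUNT # 588

# Composes a leading consonant, a vowel, and an optional trailing consonant
# (all given as integers) into one precomposed syllable code point.
def compose_syllable(l, v, t = T_BASE)
  l_index = l - L_BASE
  v_index = v - V_BASE
  t_index = t - T_BASE # 0 when there is no trailing consonant
  S_BASE + (l_index * N_COUNT) + (v_index * T_COUNT) + t_index
end

# Splits a precomposed syllable back into its jamo code points.
def decompose_syllable(s)
  s_index = s - S_BASE
  l = L_BASE + s_index / N_COUNT
  v = V_BASE + (s_index % N_COUNT) / T_COUNT
  t = T_BASE + s_index % T_COUNT
  t == T_BASE ? [l, v] : [l, v, t]
end

han = compose_syllable(0x1112, 0x1161, 0x11AB)
puts han.to_s(16).upcase                                  # D55C
p decompose_syllable(han).map { |cp| cp.to_s(16).upcase } # ["1112", "1161", "11AB"]
```

Because `TBase` sits one below the first trailing consonant, a trailing index of 0 doubles as "no trailing consonant", which is why the two-jamo case needs no special branch when composing.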
data/lib/twitter_cldr/normalizers/nfc.rb
ADDED
@@ -0,0 +1,24 @@
+# encoding: UTF-8
+
+# Copyright 2012 Twitter, Inc
+# http://www.apache.org/licenses/LICENSE-2.0
+
+module TwitterCldr
+  module Normalizers
+
+    # Implements normalization of a Unicode string to Normalization Form C (NFC).
+    # This normalization includes canonical decomposition followed by canonical composition.
+    #
+    class NFC < NFKC
+
+      class << self
+
+        def normalize_code_points(code_points)
+          compose(TwitterCldr::Normalizers::NFD.normalize_code_points(code_points))
+        end
+
+      end
+
+    end
+  end
+end
data/lib/twitter_cldr/normalizers/nfd.rb
CHANGED
@@ -7,7 +7,7 @@ module TwitterCldr
   module Normalizers

     # Implements normalization of a Unicode string to Normalization Form D (NFD).
-    # This normalization includes only
+    # This normalization includes only canonical decomposition.
     #
     class NFD < NFKD

data/lib/twitter_cldr/normalizers/nfkc.rb
ADDED
@@ -0,0 +1,126 @@
+# encoding: UTF-8
+
+# Copyright 2012 Twitter, Inc
+# http://www.apache.org/licenses/LICENSE-2.0
+
+module TwitterCldr
+  module Normalizers
+
+    # Implements normalization of a Unicode string to Normalization Form KC (NFKC).
+    # This normalization form includes compatibility decomposition followed by compatibility composition.
+    #
+    class NFKC < Base
+
+      class << self
+
+        def normalize(string)
+          code_points = TwitterCldr::Utils::CodePoints.from_string(string)
+          normalized_code_points = normalize_code_points(code_points)
+          TwitterCldr::Utils::CodePoints.to_string(normalized_code_points)
+        end
+
+        def normalize_code_points(code_points)
+          compose(TwitterCldr::Normalizers::NFKD.normalize_code_points(code_points))
+        end
+
+        protected
+
+        def compose(code_points)
+          final = []
+          hangul_code_points = []
+
+          code_points.each_with_index do |code_point, index|
+            final << code_point
+            hangul_type = TwitterCldr::Shared::CodePoint.hangul_type(code_point)
+            next_hangul_type = TwitterCldr::Shared::CodePoint.hangul_type(code_points[index + 1])
+
+            if valid_hangul_sequence?(hangul_code_points.size, hangul_type)
+              hangul_code_points << code_point
+              unless valid_hangul_sequence?(hangul_code_points.size, next_hangul_type)
+                next_hangul_type = nil
+              end
+            else
+              hangul_code_points.clear
+            end
+
+            if hangul_code_points.size > 1 && !next_hangul_type
+              hangul_code_points.size.times { final.pop }
+              final << compose_hangul(hangul_code_points)
+              hangul_code_points.clear
+            end
+          end
+
+          compose_normal(final)
+          final
+        end
+
+        def valid_hangul_sequence?(buffer_size, hangul_type)
+          case [buffer_size, hangul_type]
+          when [0, :lparts], [1, :vparts], [2, :tparts]
+            true
+          else
+            false
+          end
+        end
+
+        # Special composition for Hangul syllables. Documented in Section 3.12 at
+        # http://www.unicode.org/versions/Unicode6.1.0/ch03.pdf
+        #
+        def compose_hangul(code_points)
+          l_index = code_points.first.hex - HANGUL_DECOMPOSITION_CONSTANTS[:LBase]
+          v_index = code_points[1].hex - HANGUL_DECOMPOSITION_CONSTANTS[:VBase]
+          t_index = code_points[2] ? code_points[2].hex - HANGUL_DECOMPOSITION_CONSTANTS[:TBase] : 0 # tpart may be missing, that's ok
+          lv_index = (l_index * HANGUL_DECOMPOSITION_CONSTANTS[:NCount]) + (v_index * HANGUL_DECOMPOSITION_CONSTANTS[:TCount])
+          (HANGUL_DECOMPOSITION_CONSTANTS[:SBase] + lv_index + t_index).to_s(16).upcase.rjust(4, "0")
+        end
+
+        # Implements composition of Unicode code points following the guidelines here:
+        # http://www.unicode.org/versions/Unicode6.1.0/ch03.pdf - Section 3.12
+        # Combining code points are combined with their base characters. For example, "ñ"
+        # can be decomposed into 006E 0303, one code point for the "n" and the "˜" respectively.
+        # Composition reverses this process, turning 006E 0303 into a single 00F1 code point.
+        #
+        def compose_normal(code_points)
+          index = 1
+
+          while index < code_points.size
+            code_point = code_points[index]
+            combining_class = combining_class_for(code_point)
+            starter_index = find_starter_index(index, code_points)
+
+            # is this character blocked from combining with the last starter?
+            if starter_index < index - 1
+              previous_combining_class = combining_class_for(code_points[index - 1])
+              blocked = (previous_combining_class == 0) || (previous_combining_class >= combining_class)
+            else
+              blocked = false
+            end
+
+            unless blocked
+              # do a reverse-lookup for the decomposed code points
+              decomp_data = TwitterCldr::Shared::CodePoint.for_decomposition([code_points[starter_index], code_point])
+
+              # check if two code points are canonically equivalent
+              if decomp_data && !decomp_data.excluded_from_composition?
+                # combine the characters
+                code_points[starter_index] = decomp_data.code_point
+                code_points.delete_at(index)
+                index -= 1
+              end
+            end
+
+            index += 1
+          end
+        end
+
+        def find_starter_index(start_pos, code_points)
+          start_pos.times do |i|
+            return start_pos - i - 1 if combining_class_for(code_points[start_pos - i - 1]) == 0
+          end
+        end
+
+      end
+
+    end
+  end
+end
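Reviewer's sketch: the blocking rule in `compose_normal` above can be tried in isolation. The snippet is a simplified single-pass version of canonical composition, not the gem's code. `COMBINING_CLASSES` and `COMPOSITIONS` are tiny hand-written tables that stand in for the real Unicode data, and `combining_class` and `compose_pairs` are hypothetical names.

```ruby
# Hand-written stand-ins for Unicode data; a real normalizer loads full tables.
COMBINING_CLASSES = {
  "0303" => 230, # combining tilde
  "0308" => 230, # combining diaeresis
  "0328" => 202  # combining ogonek
}
COMPOSITIONS = {
  ["006E", "0303"] => "00F1", # n + tilde -> ñ
  ["006F", "0303"] => "00F5"  # o + tilde -> õ
}

# Code points missing from the table are treated as starters (class 0).
def combining_class(code_point)
  COMBINING_CLASSES.fetch(code_point, 0)
end

# Combines each mark with the most recent starter unless it is blocked, that is,
# unless an earlier mark with an equal or higher class sits between the two.
def compose_pairs(code_points)
  return [] if code_points.empty?

  result = [code_points.first]
  starter_index = 0
  last_class = combining_class(code_points.first)
  last_class = 256 if last_class != 0 # a leading mark has no starter to join

  code_points.drop(1).each do |code_point|
    ccc = combining_class(code_point)
    composite = COMPOSITIONS[[result[starter_index], code_point]]

    if composite && (last_class < ccc || last_class == 0)
      result[starter_index] = composite
    else
      starter_index = result.size if ccc == 0
      last_class = ccc
      result << code_point
    end
  end

  result
end

p compose_pairs(%w[006E 0303])      # ["00F1"]
p compose_pairs(%w[006E 0328 0303]) # ["00F1", "0328"]
p compose_pairs(%w[006E 0308 0303]) # ["006E", "0308", "0303"]
```

In the second call the ogonek (class 202) does not block the tilde (class 230), so the tilde still joins the "n". In the third call the diaeresis has the same class as the tilde, so the tilde is blocked and the sequence is left unchanged.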
data/lib/twitter_cldr/normalizers/nfkd.rb
CHANGED
@@ -10,7 +10,11 @@ module TwitterCldr
   # latest version at the moment (for Unicode 6.1) is available at http://www.unicode.org/versions/Unicode6.1.0/ch03.pdf.
   #
   module Normalizers
-
+
+    # Implements normalization of a Unicode string to Normalization Form KD (NFKD).
+    # This normalization form includes only compatibility decomposition.
+    #
+    class NFKD < Base

       class << self

@@ -27,16 +31,16 @@ module TwitterCldr
         protected

         def decomposition(code_points)
-          code_points.map{ |code_point| decompose_recursively(code_point) }.flatten
+          code_points.map { |code_point| decompose_recursively(code_point) }.flatten
         end

         # Recursively decomposes a given code point with the values in its Decomposition Mapping property.
         #
         def decompose_recursively(code_point)
-          unicode_data = TwitterCldr::Shared::
+          unicode_data = TwitterCldr::Shared::CodePoint.for_hex(code_point)
           return code_point unless unicode_data

-          if unicode_data.
+          if unicode_data.hangul_type == :compositions
             decompose_hangul(code_point)
           else
             decompose_regular(code_point, decomposition_mapping(unicode_data))
@@ -139,7 +143,7 @@ module TwitterCldr
         end

         def combining_class_for(code_point)
-          TwitterCldr::Shared::
+          TwitterCldr::Shared::CodePoint.for_hex(code_point).combining_class.to_i
         rescue NoMethodError
           0
         end
@@ -148,18 +152,6 @@ module TwitterCldr

         COMPATIBILITY_FORMATTING_TAG_REGEXP = /^<.*>$/

-        HANGUL_DECOMPOSITION_CONSTANTS = {
-          :SBase => 0xAC00,
-          :LBase => 0x1100,
-          :VBase => 0x1161,
-          :TBase => 0x11A7,
-          :LCount => 19,
-          :VCount => 21,
-          :TCount => 28,
-          :NCount => 588, # VCount * TCount
-          :Scount => 11172 # LCount * NCount
-        }
-
       end
     end
   end
data/lib/twitter_cldr/shared.rb
CHANGED
@@ -10,6 +10,6 @@ module TwitterCldr
     autoload :Languages, 'twitter_cldr/shared/languages'
     autoload :Numbers, 'twitter_cldr/shared/numbers'
     autoload :Resources, 'twitter_cldr/shared/resources'
-    autoload :
+    autoload :CodePoint, 'twitter_cldr/shared/code_point'
   end
 end
data/lib/twitter_cldr/shared/code_point.rb
ADDED
@@ -0,0 +1,116 @@
+# encoding: UTF-8
+
+# Copyright 2012 Twitter, Inc
+# http://www.apache.org/licenses/LICENSE-2.0
+
+module TwitterCldr
+  module Shared
+
+    CODE_POINT_FIELDS = [
+      :code_point,
+      :name,
+      :category,
+      :combining_class,
+      :bidi_class,
+      :decomposition,
+      :digit_value,
+      :non_decimal_digit_value,
+      :numeric_value,
+      :bidi_mirrored,
+      :unicode1_name,
+      :iso_comment,
+      :simple_uppercase_map,
+      :simple_lowercase_map,
+      :simple_titlecase_map
+    ]
+
+    CodePoint = Struct.new(*CODE_POINT_FIELDS) do
+      DECOMPOSITION_DATA_INDEX = 5
+
+      def hangul_type
+        CodePoint.hangul_type(code_point)
+      end
+
+      def excluded_from_composition?
+        CodePoint.excluded_from_composition?(code_point)
+      end
+
+      class << self
+
+        def for_hex(code_point)
+          target = get_block(code_point.rjust(4, "0").upcase)
+
+          if target && target.first
+            block_data = TwitterCldr.get_resource(:unicode_data, target.first)
+            code_point_data = block_data.fetch(code_point.to_sym) { |code_point_sym| get_range_start(code_point_sym, block_data) }
+            CodePoint.new(*code_point_data) if code_point_data
+          else
+            nil
+          end
+        end
+
+        def for_decomposition(code_points)
+          @decomposition_map ||= TwitterCldr.get_resource(:unicode_data, :decomposition_map)
+          key = code_points.join(" ").to_sym
+
+          if @decomposition_map.include?(key)
+            for_hex(@decomposition_map[key])
+          else
+            nil
+          end
+        end
+
+        def hangul_type(code_point)
+          if code_point
+            code_point_int = code_point.hex
+            [:lparts, :vparts, :tparts, :compositions, :decompositions].each do |type|
+              hangul_blocks[type].each do |range|
+                return type if range.include?(code_point_int)
+              end
+            end
+          end
+          nil
+        end
+
+        def excluded_from_composition?(code_point)
+          code_point_int = code_point.hex
+          composition_exclusions.any? { |excl| excl.include?(code_point_int) }
+        end
+
+        protected
+
+        def hangul_blocks
+          @hangul_blocks ||= TwitterCldr.get_resource(:unicode_data, :blocks_hangul)
+        end
+
+        def composition_exclusions
+          @composition_exclusions ||= TwitterCldr.get_resource(:unicode_data, :composition_exclusions)
+        end
+
+        def get_block(code_point)
+          blocks = TwitterCldr.get_resource(:unicode_data, :blocks)
+          code_point_int = code_point.hex
+
+          # Find the target block
+          blocks.find do |block_name, range|
+            range.include?(code_point_int)
+          end
+        end
+
+        # Check if block constitutes a range. The code point beginning a range will have a name enclosed in <>, ending with 'First'
+        # eg: <CJK Ideograph Extension A, First>
+        # http://unicode.org/reports/tr44/#Code_Point_Ranges
+        def get_range_start(code_point, block_data)
+          start_code_point = block_data.keys.sort_by { |key| key.to_s.hex }.first
+          start_data = block_data[start_code_point].clone
+          if start_data[1] =~ /<.*, First>/
+            start_data[0] = code_point.to_s
+            start_data[1] = start_data[1].sub(', First', '')
+            start_data
+          end
+        end
+
+      end
+    end
+  end
+end
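Reviewer's sketch: `get_block` above resolves a hex code point to its block by scanning a table of ranges. The snippet shows that lookup pattern with a handful of well-known Unicode block ranges. The `BLOCKS` table, its symbol names, and the `block_for` helper are hypothetical; the gem loads its own block data from `resources/unicode_data`.

```ruby
# A few real Unicode block ranges; the symbol names are illustrative only.
BLOCKS = {
  :basic_latin        => 0x0000..0x007F,
  :latin_1_supplement => 0x0080..0x00FF,
  :hangul_jamo        => 0x1100..0x11FF,
  :hangul_syllables   => 0xAC00..0xD7AF
}

# Returns the [name, range] pair whose range contains the hex code point,
# or nil when no listed block matches.
def block_for(hex_code_point)
  code_point_int = hex_code_point.hex
  BLOCKS.find { |_name, range| range.include?(code_point_int) }
end

p block_for("00F1")       # [:latin_1_supplement, 128..255]
p block_for("D55C").first # :hangul_syllables
p block_for("0400")       # nil
```

`Hash#find` returns the matching key and value as a two-element array, which is why `for_hex` in the diff reads the block name with `target.first`.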