twitter_cldr 1.3.6 → 1.4.0

data/README.md CHANGED
@@ -109,6 +109,51 @@ dt = TwitterCldr::LocalizedDateTime.new(DateTime.now, :es)
  dt.to_short_s # ...etc
  ```
 
+ #### Relative Dates and Times
+
+ In addition to formatting full dates and times, TwitterCLDR supports relative time spans via several convenience methods and the `LocalizedTimespan` class. TwitterCLDR tries to guess the best time unit (e.g. days, hours, or minutes) based on the length of the time span. Unless otherwise specified, TwitterCLDR will use the current date and time as the reference point for the calculation.
+
+ ```ruby
+ (DateTime.now - 1).localize.ago # 1 day ago
+ (DateTime.now - 0.5).localize.ago # 12 hours ago (i.e. half a day)
+
+ (DateTime.now + 1).localize.until # In 1 day
+ (DateTime.now + 0.5).localize.until # In 12 hours
+ ```
+
+ Specify other locales:
+
+ ```ruby
+ (DateTime.now - 1).localize(:de).ago # Vor 1 Tag
+ (DateTime.now + 1).localize(:de).until # In 1 Tag
+ ```
+
+ Force TwitterCLDR to use a specific time unit by including the `:unit` option:
+
+ ```ruby
+ (DateTime.now - 1).localize(:de).ago(:unit => :hour) # Vor 24 Stunden
+ (DateTime.now + 1).localize(:de).until(:unit => :hour) # In 24 Stunden
+ ```
+
+ Specify a different reference point for the time span calculation:
+
+ ```ruby
+ # 86400 = 1 day in seconds, 259200 = 3 days in seconds
+ (Time.now + 86400).localize(:de).ago(:unit => :hour, :base_time => (Time.now + 259200)) # Vor 48 Stunden
+ ```
+
+ Behind the scenes, these convenience methods create instances of `LocalizedTimespan`, whose constructor accepts a number of seconds as the first argument. You can do the same thing if you're feeling adventurous:
+
+ ```ruby
+ ts = TwitterCldr::LocalizedTimespan.new(86400, :de)
+ ts.to_s # In 1 Tag
+ ts.to_s(:hour) # In 24 Stunden
+
+ ts = TwitterCldr::LocalizedTimespan.new(-86400, :de)
+ ts.to_s # Vor 1 Tag
+ ts.to_s(:hour) # Vor 24 Stunden
+ ```
+
  ### Plural Rules
 
  Some languages, like English, have "countable" nouns. You probably know this concept better as "plural" and "singular", i.e. the difference between "strawberry" and "strawberries". Other languages, like Russian, have three plural forms: one (numbers ending in 1), few (numbers ending in 2, 3, or 4), and many (everything else). Still other languages like Japanese don't use countable nouns at all.
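The three Russian categories described in the README paragraph above can be sketched in plain Ruby. This is a hypothetical helper, not TwitterCLDR's API. It assumes the CLDR integer rules for Russian, which refine "ends in 1" and "ends in 2, 3, or 4" by excluding 11 and 12 through 14; fractional numbers are out of scope here.

```ruby
# Hypothetical helper (not part of TwitterCLDR): returns the plural category
# for a non-negative integer in Russian, following the CLDR integer rules.
def russian_plural_category(n)
  mod10  = n % 10
  mod100 = n % 100

  if mod10 == 1 && mod100 != 11
    :one
  elsif (2..4).include?(mod10) && !(12..14).include?(mod100)
    :few
  else
    :many
  end
end

russian_plural_category(1)   # => :one
russian_plural_category(3)   # => :few
russian_plural_category(11)  # => :many
russian_plural_category(22)  # => :few
```

The teens are the reason a plain "last digit" check is not quite enough: 11 through 14 all fall into the `:many` category.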
@@ -233,7 +278,7 @@ TwitterCLDR provides ways to retrieve individual code points as well as normaliz
  Retrieve data for code points:
 
  ```ruby
- code_point = TwitterCldr::Shared::UnicodeData.for_code_point("1F3E9")
+ code_point = TwitterCldr::Shared::CodePoint.for_hex("1F3E9")
  code_point.name # "LOVE HOTEL"
  code_point.bidi_mirrored # "N"
  code_point.category # "So"
@@ -252,7 +297,7 @@ Convert code points to characters:
  TwitterCldr::Utils::CodePoints.to_string(["00BF"]) # "¿"
  ```
 
- Normalize/decompose a Unicode string (NFD, NFKD implementations available). Note that the normalized string will almost always look the same as the original string because most character display systems automatically combine decomposed characters.
+ Normalize/decompose a Unicode string (NFD, NFKD, NFC, and NFKC implementations available). Note that the normalized string will almost always look the same as the original string because most character display systems automatically combine decomposed characters.
 
  ```ruby
  TwitterCldr::Normalizers::NFD.normalize("français") # "français"
@@ -28,14 +28,14 @@ module TwitterCldr
    base_time = options[:base_time] || Time.now
    seconds = self.to_time.base_obj.to_i - base_time.to_i
    raise ArgumentError.new('Start date is after end date. Consider using "until" function.') if seconds > 0
-   TwitterCldr::Shared::LocalizedTimespan.new(seconds, @locale).to_s(options[:unit])
+   TwitterCldr::LocalizedTimespan.new(seconds, @locale).to_s(options[:unit])
  end
 
  def until(options = {})
    base_time = options[:base_time] || Time.now
    seconds = self.to_time.base_obj.to_i - base_time.to_i
    raise ArgumentError.new('End date is before start date. Consider using "ago" function.') if seconds < 0
-   TwitterCldr::Shared::LocalizedTimespan.new(seconds, @locale).to_s(options[:unit])
+   TwitterCldr::LocalizedTimespan.new(seconds, @locale).to_s(options[:unit])
  end
 
  def to_s
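The `ago` and `until` methods in the hunk above both compute a signed number of seconds relative to a base time and reject the wrong sign. A minimal stand-alone sketch of that flow follows. All method names here are hypothetical, and the unit-guessing heuristic (largest unit that fits at least once) is an assumption chosen to match the README examples, not TwitterCLDR's actual formatter logic.

```ruby
# Hypothetical stand-ins; none of these names exist in TwitterCLDR.

# Signed distance in seconds: negative means the target lies before base_time.
def seconds_between(target, base_time)
  target.to_i - base_time.to_i
end

# Mirrors the guard in `ago`: a positive span belongs to `until`.
def check_ago!(seconds)
  raise ArgumentError, 'Start date is after end date. Consider using "until" function.' if seconds > 0
  seconds
end

# Mirrors the guard in `until`: a negative span belongs to `ago`.
def check_until!(seconds)
  raise ArgumentError, 'End date is before start date. Consider using "ago" function.' if seconds < 0
  seconds
end

# Assumed heuristic: pick the largest unit that fits at least once.
UNIT_SECONDS = [[:day, 86_400], [:hour, 3_600], [:minute, 60], [:second, 1]]

def guess_unit(seconds)
  UNIT_SECONDS.each { |unit, size| return unit if seconds.abs >= size }
  :second
end

base = Time.at(1_000_000)
span = check_ago!(seconds_between(base - 43_200, base)) # => -43200
guess_unit(span)                                        # => :hour
```

Passing fixed `Time` values instead of `Time.now` keeps the sketch deterministic, which is also what the `:base_time` option enables in the real methods.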
@@ -4,23 +4,21 @@
  # http://www.apache.org/licenses/LICENSE-2.0
 
  module TwitterCldr
-   module Shared
-     class LocalizedTimespan < LocalizedObject
+   class LocalizedTimespan < LocalizedObject
 
-       def initialize(seconds, locale)
-         @formatter = TwitterCldr::Formatters::TimespanFormatter.new(:locale => locale)
-         @seconds = seconds
-       end
+     def initialize(seconds, locale)
+       @formatter = TwitterCldr::Formatters::TimespanFormatter.new(:locale => locale)
+       @seconds = seconds
+     end
 
-       def to_s(unit = :default)
-         @formatter.format(@seconds, unit)
-       end
+     def to_s(unit = :default)
+       @formatter.format(@seconds, unit)
+     end
 
-       protected
+     protected
 
-       def formatter_const
-         TwitterCldr::Formatters::TimespanFormatter
-       end
+     def formatter_const
+       TwitterCldr::Formatters::TimespanFormatter
      end
    end
  end
@@ -5,7 +5,10 @@
 
  module TwitterCldr
    module Normalizers
+     autoload :Base, 'twitter_cldr/normalizers/base'
      autoload :NFD, 'twitter_cldr/normalizers/nfd'
      autoload :NFKD, 'twitter_cldr/normalizers/nfkd'
+     autoload :NFC, 'twitter_cldr/normalizers/nfc'
+     autoload :NFKC, 'twitter_cldr/normalizers/nfkc'
    end
  end
@@ -0,0 +1,34 @@
+ # encoding: UTF-8
+
+ # Copyright 2012 Twitter, Inc
+ # http://www.apache.org/licenses/LICENSE-2.0
+
+ module TwitterCldr
+   module Normalizers
+     class Base
+
+       class << self
+
+         HANGUL_DECOMPOSITION_CONSTANTS = {
+           :SBase => 0xAC00,
+           :LBase => 0x1100,
+           :VBase => 0x1161,
+           :TBase => 0x11A7,
+           :LCount => 19,
+           :VCount => 21,
+           :TCount => 28,
+           :NCount => 588, # VCount * TCount
+           :SCount => 11172 # LCount * NCount
+         }
+
+         def combining_class_for(code_point)
+           TwitterCldr::Shared::CodePoint.for_hex(code_point).combining_class.to_i
+         rescue NoMethodError
+           0
+         end
+
+       end
+
+     end
+   end
+ end
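The constants in `HANGUL_DECOMPOSITION_CONSTANTS` drive purely arithmetic conversion between a precomposed Hangul syllable and its jamo, as described in Section 3.12 of the Unicode Standard. The sketch below works on integers rather than the hex strings the gem uses, and the two method names are hypothetical; it only illustrates how the base and count values fit together.

```ruby
# Same values as HANGUL_DECOMPOSITION_CONSTANTS, as plain top-level constants.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT # 588
S_COUNT = L_COUNT * N_COUNT # 11172

# Hypothetical helper: splits a precomposed syllable into [L, V] or [L, V, T].
def decompose_hangul_syllable(syllable)
  s_index = syllable - S_BASE
  raise ArgumentError, "not a precomposed Hangul syllable" unless (0...S_COUNT).cover?(s_index)

  l = L_BASE + s_index / N_COUNT
  v = V_BASE + (s_index % N_COUNT) / T_COUNT
  t = T_BASE + s_index % T_COUNT
  t == T_BASE ? [l, v] : [l, v, t] # T_BASE itself means "no trailing consonant"
end

# Hypothetical helper: the inverse mapping from jamo back to a syllable.
def compose_hangul_jamo(l, v, t = T_BASE)
  S_BASE + (l - L_BASE) * N_COUNT + (v - V_BASE) * T_COUNT + (t - T_BASE)
end

decompose_hangul_syllable(0xD55C).map { |cp| cp.to_s(16).upcase } # => ["1112", "1161", "11AB"]
compose_hangul_jamo(0x1112, 0x1161, 0x11AB).to_s(16).upcase       # => "D55C"
```

`TBase` is one below the first trailing consonant (U+11A8), which is why a trailing index of zero stands for "no trailing consonant".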
@@ -0,0 +1,24 @@
+ # encoding: UTF-8
+
+ # Copyright 2012 Twitter, Inc
+ # http://www.apache.org/licenses/LICENSE-2.0
+
+ module TwitterCldr
+   module Normalizers
+
+     # Implements normalization of a Unicode string to Normalization Form C (NFC).
+     # This normalization includes canonical decomposition followed by canonical composition.
+     #
+     class NFC < NFKC
+
+       class << self
+
+         def normalize_code_points(code_points)
+           compose(TwitterCldr::Normalizers::NFD.normalize_code_points(code_points))
+         end
+
+       end
+
+     end
+   end
+ end
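The new `NFC` class composes the output of `NFD`, while `NFKC` composes the output of `NFKD`. The observable difference between the two forms can be checked without the gem by using Ruby's built-in `String#unicode_normalize` (available since Ruby 2.2). This is a stand-in for illustration, not TwitterCLDR's implementation. For reference, Unicode Standard Annex #15 defines NFKC as compatibility decomposition followed by canonical composition.

```ruby
ligature     = "\uFB01"  # LATIN SMALL LIGATURE FI, a compatibility character
decomposed_e = "e\u0301" # "e" followed by COMBINING ACUTE ACCENT

# NFC leaves compatibility characters alone; NFKC replaces them.
nfc_ligature  = ligature.unicode_normalize(:nfc)  # => "\uFB01"
nfkc_ligature = ligature.unicode_normalize(:nfkc) # => "fi"

# Both forms recompose canonical sequences into a single code point.
nfc_e  = decomposed_e.unicode_normalize(:nfc)  # => "\u00E9"
nfkc_e = decomposed_e.unicode_normalize(:nfkc) # => "\u00E9"
```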
@@ -7,7 +7,7 @@ module TwitterCldr
    module Normalizers
 
      # Implements normalization of a Unicode string to Normalization Form D (NFD).
-     # This normalization includes only Canonical Decomposition.
+     # This normalization includes only canonical decomposition.
      #
      class NFD < NFKD
 
@@ -0,0 +1,126 @@
+ # encoding: UTF-8
+
+ # Copyright 2012 Twitter, Inc
+ # http://www.apache.org/licenses/LICENSE-2.0
+
+ module TwitterCldr
+   module Normalizers
+
+     # Implements normalization of a Unicode string to Normalization Form KC (NFKC).
+     # This normalization form includes compatibility decomposition followed by compatibility composition.
+     #
+     class NFKC < Base
+
+       class << self
+
+         def normalize(string)
+           code_points = TwitterCldr::Utils::CodePoints.from_string(string)
+           normalized_code_points = normalize_code_points(code_points)
+           TwitterCldr::Utils::CodePoints.to_string(normalized_code_points)
+         end
+
+         def normalize_code_points(code_points)
+           compose(TwitterCldr::Normalizers::NFKD.normalize_code_points(code_points))
+         end
+
+         protected
+
+         def compose(code_points)
+           final = []
+           hangul_code_points = []
+
+           code_points.each_with_index do |code_point, index|
+             final << code_point
+             hangul_type = TwitterCldr::Shared::CodePoint.hangul_type(code_point)
+             next_hangul_type = TwitterCldr::Shared::CodePoint.hangul_type(code_points[index + 1])
+
+             if valid_hangul_sequence?(hangul_code_points.size, hangul_type)
+               hangul_code_points << code_point
+               unless valid_hangul_sequence?(hangul_code_points.size, next_hangul_type)
+                 next_hangul_type = nil
+               end
+             else
+               hangul_code_points.clear
+             end
+
+             if hangul_code_points.size > 1 && !next_hangul_type
+               hangul_code_points.size.times { final.pop }
+               final << compose_hangul(hangul_code_points)
+               hangul_code_points.clear
+             end
+           end
+
+           compose_normal(final)
+           final
+         end
+
+         def valid_hangul_sequence?(buffer_size, hangul_type)
+           case [buffer_size, hangul_type]
+           when [0, :lparts], [1, :vparts], [2, :tparts]
+             true
+           else
+             false
+           end
+         end
+
+         # Special composition for Hangul syllables. Documented in Section 3.12 at
+         # http://www.unicode.org/versions/Unicode6.1.0/ch03.pdf
+         #
+         def compose_hangul(code_points)
+           l_index = code_points.first.hex - HANGUL_DECOMPOSITION_CONSTANTS[:LBase]
+           v_index = code_points[1].hex - HANGUL_DECOMPOSITION_CONSTANTS[:VBase]
+           t_index = code_points[2] ? code_points[2].hex - HANGUL_DECOMPOSITION_CONSTANTS[:TBase] : 0 # tpart may be missing, that's ok
+           lv_index = (l_index * HANGUL_DECOMPOSITION_CONSTANTS[:NCount]) + (v_index * HANGUL_DECOMPOSITION_CONSTANTS[:TCount])
+           (HANGUL_DECOMPOSITION_CONSTANTS[:SBase] + lv_index + t_index).to_s(16).upcase.rjust(4, "0")
+         end
+
+         # Implements composition of Unicode code points following the guidelines here:
+         # http://www.unicode.org/versions/Unicode6.1.0/ch03.pdf - Section 3.12
+         # Combining code points are combined with their base characters. For example, "ñ"
+         # can be decomposed into 006E 0303, one code point for the "n" and the "˜" respectively.
+         # Composition reverses this process, turning 006E 0303 into a single 00F1 code point.
+         #
+         def compose_normal(code_points)
+           index = 1
+
+           while index < code_points.size
+             code_point = code_points[index]
+             combining_class = combining_class_for(code_point)
+             starter_index = find_starter_index(index, code_points)
+
+             # is this character blocked from combining with the last starter?
+             if starter_index < index - 1
+               previous_combining_class = combining_class_for(code_points[index - 1])
+               blocked = (previous_combining_class == 0) || (previous_combining_class >= combining_class)
+             else
+               blocked = false
+             end
+
+             unless blocked
+               # do a reverse-lookup for the decomposed code points
+               decomp_data = TwitterCldr::Shared::CodePoint.for_decomposition([code_points[starter_index], code_point])
+
+               # check if two code points are canonically equivalent
+               if decomp_data && !decomp_data.excluded_from_composition?
+                 # combine the characters
+                 code_points[starter_index] = decomp_data.code_point
+                 code_points.delete_at(index)
+                 index -= 1
+               end
+             end
+
+             index += 1
+           end
+         end
+
+         def find_starter_index(start_pos, code_points)
+           start_pos.times do |i|
+             return start_pos - i - 1 if combining_class_for(code_points[start_pos - i - 1]) == 0
+           end
+         end
+
+       end
+
+     end
+   end
+ end
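The "blocked" test in `compose_normal` follows the Unicode rule that a combining mark may only merge with the last starter if no intervening character has a combining class of zero or a class greater than or equal to its own. The stand-alone sketch below illustrates that rule on integer code points. It is not the gem's implementation: the two lookup tables are tiny hand-picked assumptions standing in for the Unicode Character Database, the method name is hypothetical, and the input is assumed to be canonically ordered already.

```ruby
# Illustrative tables only; real data comes from the Unicode Character Database.
COMBINING_CLASS = { 0x0301 => 230, 0x0313 => 230, 0x0323 => 220 }
COMPOSITES = {
  [0x0061, 0x0301] => 0x00E1, # a + acute     -> á
  [0x0061, 0x0323] => 0x1EA1  # a + dot below -> ạ
}

# Hypothetical sketch of canonical composition over canonically ordered input.
def compose_sketch(code_points)
  result = []
  starter_pos = nil # index in result of the most recent starter
  last_ccc = 0      # combining class of the last character kept in result

  code_points.each do |cp|
    ccc = COMBINING_CLASS.fetch(cp, 0)

    if starter_pos
      has_gap = result.size - 1 > starter_pos
      blocked = has_gap && (last_ccc == 0 || last_ccc >= ccc)
      composite = COMPOSITES[[result[starter_pos], cp]]
      if composite && !blocked
        result[starter_pos] = composite # merge into the starter, drop cp
        next
      end
    end

    starter_pos = result.size if ccc == 0
    last_ccc = ccc
    result << cp
  end

  result
end

compose_sketch([0x0061, 0x0301])         # => [0x00E1]
compose_sketch([0x0061, 0x0313, 0x0301]) # => [0x0061, 0x0313, 0x0301] (acute is blocked)
compose_sketch([0x0061, 0x0323, 0x0301]) # => [0x1EA1, 0x0301]
```

In the second call the acute accent cannot reach the "a" because U+0313 sits in between with the same combining class, so the sequence is left unchanged.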
@@ -10,7 +10,11 @@ module TwitterCldr
    # latest version at the moment (for Unicode 6.1) is available at http://www.unicode.org/versions/Unicode6.1.0/ch03.pdf.
    #
    module Normalizers
-     class NFKD
+
+     # Implements normalization of a Unicode string to Normalization Form KD (NFKD).
+     # This normalization form includes only compatibility decomposition.
+     #
+     class NFKD < Base
 
        class << self
 
@@ -27,16 +31,16 @@ module TwitterCldr
          protected
 
          def decomposition(code_points)
-           code_points.map{ |code_point| decompose_recursively(code_point) }.flatten
+           code_points.map { |code_point| decompose_recursively(code_point) }.flatten
          end
 
          # Recursively decomposes a given code point with the values in its Decomposition Mapping property.
          #
          def decompose_recursively(code_point)
-           unicode_data = TwitterCldr::Shared::UnicodeData.for_code_point(code_point)
+           unicode_data = TwitterCldr::Shared::CodePoint.for_hex(code_point)
            return code_point unless unicode_data
 
-           if unicode_data.name.include?('Hangul')
+           if unicode_data.hangul_type == :compositions
              decompose_hangul(code_point)
            else
              decompose_regular(code_point, decomposition_mapping(unicode_data))
@@ -139,7 +143,7 @@ module TwitterCldr
          end
 
          def combining_class_for(code_point)
-           TwitterCldr::Shared::UnicodeData.for_code_point(code_point).combining_class.to_i
+           TwitterCldr::Shared::CodePoint.for_hex(code_point).combining_class.to_i
          rescue NoMethodError
            0
          end
@@ -148,18 +152,6 @@ module TwitterCldr
 
      COMPATIBILITY_FORMATTING_TAG_REGEXP = /^<.*>$/
 
-     HANGUL_DECOMPOSITION_CONSTANTS = {
-       :SBase => 0xAC00,
-       :LBase => 0x1100,
-       :VBase => 0x1161,
-       :TBase => 0x11A7,
-       :LCount => 19,
-       :VCount => 21,
-       :TCount => 28,
-       :NCount => 588, # VCount * TCount
-       :Scount => 11172 # LCount * NCount
-     }
-
      end
    end
  end
@@ -10,6 +10,6 @@ module TwitterCldr
      autoload :Languages, 'twitter_cldr/shared/languages'
      autoload :Numbers, 'twitter_cldr/shared/numbers'
      autoload :Resources, 'twitter_cldr/shared/resources'
-     autoload :UnicodeData, 'twitter_cldr/shared/unicode_data'
+     autoload :CodePoint, 'twitter_cldr/shared/code_point'
    end
  end
@@ -0,0 +1,116 @@
+ # encoding: UTF-8
+
+ # Copyright 2012 Twitter, Inc
+ # http://www.apache.org/licenses/LICENSE-2.0
+
+ module TwitterCldr
+   module Shared
+
+     CODE_POINT_FIELDS = [
+       :code_point,
+       :name,
+       :category,
+       :combining_class,
+       :bidi_class,
+       :decomposition,
+       :digit_value,
+       :non_decimal_digit_value,
+       :numeric_value,
+       :bidi_mirrored,
+       :unicode1_name,
+       :iso_comment,
+       :simple_uppercase_map,
+       :simple_lowercase_map,
+       :simple_titlecase_map
+     ]
+
+     CodePoint = Struct.new(*CODE_POINT_FIELDS) do
+       DECOMPOSITION_DATA_INDEX = 5
+
+       def hangul_type
+         CodePoint.hangul_type(code_point)
+       end
+
+       def excluded_from_composition?
+         CodePoint.excluded_from_composition?(code_point)
+       end
+
+       class << self
+
+         def for_hex(code_point)
+           target = get_block(code_point.rjust(4, "0").upcase)
+
+           if target && target.first
+             block_data = TwitterCldr.get_resource(:unicode_data, target.first)
+             code_point_data = block_data.fetch(code_point.to_sym) { |code_point_sym| get_range_start(code_point_sym, block_data) }
+             CodePoint.new(*code_point_data) if code_point_data
+           else
+             nil
+           end
+         end
+
+         def for_decomposition(code_points)
+           @decomposition_map ||= TwitterCldr.get_resource(:unicode_data, :decomposition_map)
+           key = code_points.join(" ").to_sym
+
+           if @decomposition_map.include?(key)
+             for_hex(@decomposition_map[key])
+           else
+             nil
+           end
+         end
+
+         def hangul_type(code_point)
+           if code_point
+             code_point_int = code_point.hex
+             [:lparts, :vparts, :tparts, :compositions, :decompositions].each do |type|
+               hangul_blocks[type].each do |range|
+                 return type if range.include?(code_point_int)
+               end
+             end
+           end
+           nil
+         end
+
+         def excluded_from_composition?(code_point)
+           code_point_int = code_point.hex
+           composition_exclusions.any? { |excl| excl.include?(code_point_int) }
+         end
+
+         protected
+
+         def hangul_blocks
+           @hangul_blocks ||= TwitterCldr.get_resource(:unicode_data, :blocks_hangul)
+         end
+
+         def composition_exclusions
+           @composition_exclusions ||= TwitterCldr.get_resource(:unicode_data, :composition_exclusions)
+         end
+
+         def get_block(code_point)
+           blocks = TwitterCldr.get_resource(:unicode_data, :blocks)
+           code_point_int = code_point.hex
+
+           # Find the target block
+           blocks.find do |block_name, range|
+             range.include?(code_point_int)
+           end
+         end
+
+         # Check if block constitutes a range. The code point beginning a range will have a name enclosed in <>, ending with 'First'
+         # eg: <CJK Ideograph Extension A, First>
+         # http://unicode.org/reports/tr44/#Code_Point_Ranges
+         def get_range_start(code_point, block_data)
+           start_code_point = block_data.keys.sort_by { |key| key.to_s.hex }.first
+           start_data = block_data[start_code_point].clone
+           if start_data[1] =~ /<.*, First>/
+             start_data[0] = code_point.to_s
+             start_data[1] = start_data[1].sub(', First', '')
+             start_data
+           end
+         end
+
+       end
+     end
+   end
+ end
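`CodePoint.for_hex` first locates the Unicode block that contains a code point and, for code points inside a `<..., First>` range, derives the entry from the range's first record. The sketch below illustrates those two steps in isolation. The `BLOCKS` table is a tiny hypothetical stand-in for the gem's `:blocks` resource, and both method names are made up for this example.

```ruby
# Hypothetical, hand-written stand-in for the :blocks resource.
BLOCKS = {
  :basic_latin                        => 0x0000..0x007F,
  :hangul_jamo                        => 0x1100..0x11FF,
  :cjk_unified_ideographs_extension_a => 0x3400..0x4DBF
}

# Returns the name of the block containing the given hex code point, or nil.
def find_block(hex_code_point)
  code_point_int = hex_code_point.rjust(4, "0").upcase.hex
  block = BLOCKS.find { |_name, range| range.include?(code_point_int) }
  block && block.first
end

# Derives the shared name for a member of a code point range from the name
# of the range's first entry; returns nil when the name is not a range marker.
def range_member_name(first_entry_name)
  first_entry_name.sub(", First", "") if first_entry_name =~ /<.*, First>/
end

find_block("41")    # => :basic_latin
find_block("3402")  # => :cjk_unified_ideographs_extension_a
range_member_name("<CJK Ideograph Extension A, First>") # => "<CJK Ideograph Extension A>"
```

Looking up the block first keeps each resource file small, since only the data for one block has to be loaded to answer a query.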