icu_name 0.1.4 → 1.0.0

data/README.rdoc CHANGED
@@ -23,50 +23,56 @@ To create a name object, supply both the first and second names separately to th
23
23
 
24
24
  Capitalisation, white space and punctuation will all be automatically corrected:
25
25
 
26
- robert.name # => 'Robert J. Fischer'
27
- robert.rname # => 'Fischer, Robert J.' (reversed name)
26
+ robert.name # => 'Robert J. Fischer'
27
+ robert.rname # => 'Fischer, Robert J.' (reversed name)
28
28
 
29
29
  The input text, without any changes apart from white-space cleanup and the insertion of a comma
30
- (to separate the two names), is returned by the _original_ method:
30
+ (to separate the two names), is returned by the <tt>original</tt> method:
31
31
 
32
- robert.original # => 'FISCHER, robert j'
32
+ robert.original # => 'FISCHER, robert j'
33
33
 
34
34
  To avoid ambiguity when either the first or second names consist of multiple words, it is better to
35
- supply the two separately, if known. However, the full name can be supplied alone to the constructor
36
- and a guess will be made as to the first and last names (the last distinct word becomes the last name).
35
+ supply the two separately. If the full name is supplied alone to the constructor, without any indication
36
+ of where the first names end, then the last distinct name is assumed to be the last name.
37
37
 
38
38
  bobby = ICU::Name.new(' bobby fischer ')
39
39
 
40
- bobby.first # => 'Bobby'
41
- bobby.last # => 'Fischer'
40
+ bobby.first # => 'Bobby'
41
+ bobby.last # => 'Fischer'
42
42
 
43
- But in this case, since the names were not supplied separately, the _original_ text will not contain a comma:
43
+ In this case, since the names were not supplied separately, the <tt>original</tt> text will not contain a comma:
44
44
 
45
- bobby.original # => 'bobby fischer'
45
+ bobby.original # => 'bobby fischer'
46
46
 
47
47
  Names will match even if one is missing middle initials or if a nickname is used for one of the first names.
48
48
 
49
- bobby.match('Robert J.', 'Fischer') # => true
49
+ bobby.match('Robert J.', 'Fischer') # => true
50
50
 
51
- Note that the class is aware of only common nicknames (e.g. _Bobby_ and _Robert_, _Bill_ and _William_, etc)
52
- and not all possibilities.
51
+ The method <tt>alternatives</tt> can be used to list alternatives to a given first or last name:
53
52
 
54
- Supplying the _match_ method with strings is equivalent to instantiating a Name instance with the same
53
+ Name.new('Stephen', 'Orr').alternatives(:first) # => ["Steve"]
54
+ Name.new('Michael Stephen', 'Orr').alternatives(:first) # => ["Steve", "Mike", "Mick", "Mikey"]
55
+ Name.new('Mark', 'Orr').alternatives(:first) # => []
56
+
57
+ By default the class is only aware of a few common alternatives for first names (e.g. _Bobby_ and _Robert_,
58
+ _Bill_ and _William_, etc). However, this can be customized (see below).
59
+
60
+ Supplying the <tt>match</tt> method with strings is equivalent to instantiating an instance with the same
55
61
  strings and then matching it. So, for example the following are equivalent:
56
62
 
57
- robert.match('R.', 'Fischer') # => true
58
- robert.match(ICU::Name.new('R.', 'Fischer')) # => true
63
+ robert.match('R.', 'Fischer') # => true
64
+ robert.match(ICU::Name.new('R.', 'Fischer')) # => true
59
65
 
60
- The initial _R_, for example, matches the first letter of _Robert_. However, nickname matches will not
61
- always work with initials. In the next example, the initial _R_ does not match the first letter _B_ of the
62
- nickname _Bobby_.
66
+ Here the initial _R_ matches the first letter of _Robert_. However, nickname matches will not
67
+ always work with initials. In the next example, the initial _R_ does not match the first letter
68
+ _B_ of the nickname _Bobby_.
63
69
 
64
- bobby.match('R. J.', 'Fischer') # => false
70
+ bobby.match('R. J.', 'Fischer') # => false
65
71
 
66
- Some of the ways last names are canonicalised are illustrated below:
72
+ Some other ways last names are canonicalised are illustrated below:
67
73
 
68
- ICU::Name.new('John', 'O Reilly').last # => "O'Reilly"
69
- ICU::Name.new('dave', 'mcmanus').last # => "McManus"
74
+ ICU::Name.new('John', 'O Reilly').last # => "O'Reilly"
75
+ ICU::Name.new('dave', 'mcmanus').last # => "McManus"
70
76
 
71
77
  == Characters and Encoding
72
78
 
@@ -76,40 +82,127 @@ Along with hyphens and single quotes (which represent apostrophes) letters in ISO
76
82
  character plus one or more diacritics (e.g. "ł" or "Ś") are preserved, while everything
77
83
  else is removed.
78
84
 
79
- ICU::Name.new('éric', 'PRIÉ').name # => "Éric Prié"
80
- ICU::Name.new('BARTŁOMIEJ', 'śliwa').name # => "Bartłomiej Śliwa"
81
- ICU::Name.new(' 渡井美代子').name # => ""
85
+ ICU::Name.new('éric', 'PRIÉ').name # => "Éric Prié"
86
+ ICU::Name.new('BARTŁOMIEJ', 'śliwa').name # => "Bartłomiej Śliwa"
87
+ ICU::Name.new('Սմբատ', 'Լպուտյան').name # => ""
82
88
 
83
- The various accessors (_first_, _last_, _name_, _rname_, _to_s_, _original_) always return
89
+ The various accessors (<tt>first</tt>, <tt>last</tt>, <tt>name</tt>, <tt>rname</tt>, <tt>to_s</tt>, <tt>original</tt>) always return
84
90
  strings encoded in UTF-8, no matter what the input encoding.
85
91
 
86
92
  eric = ICU::Name.new('éric'.encode("ISO-8859-1"), 'PRIÉ'.force_encoding("ASCII-8BIT"))
87
- eric.rname # => "Prié, Éric"
88
- eric.rname.encoding.name # => "UTF-8"
89
- eric.original # => "PRIÉ, éric"
90
- eric.original.encoding.name # => "UTF-8"
93
+ eric.rname # => "Prié, Éric"
94
+ eric.rname.encoding.name # => "UTF-8"
95
+ eric.original # => "PRIÉ, éric"
96
+ eric.original.encoding.name # => "UTF-8"
91
97
 
92
98
  Accented letters can be transliterated into their US-ASCII counterparts by setting the
93
- _chars_ option, which is available in all accessors. For example:
99
+ <tt>:chars</tt> option, which is available in all accessors. For example:
94
100
 
95
- eric.rname(:chars => "US-ASCII") # => "Prie, Eric"
96
- eric.original(:chars => "US-ASCII") # => "PRIE, eric"
101
+ eric.rname(:chars => "US-ASCII") # => "Prie, Eric"
102
+ eric.original(:chars => "US-ASCII") # => "PRIE, eric"
97
103
 
98
104
  Also possible is the preservation of ISO-8859-1 characters, but the transliteration of
99
105
  all other accented characters:
100
106
 
101
107
  joe = Name.new('Józef', 'Żabiński')
102
- joe.rname # => "Żabiński, Józef"
103
- joe.rname(:chars => "ISO-8859-1") # => "Zabinski, Józef"
104
- joe.rname(:chars => "US-ASCII") # => "Zabinski, Jozef"
108
+ joe.rname # => "Żabiński, Józef"
109
+ joe.rname(:chars => "ISO-8859-1") # => "Zabinski, Józef"
110
+ joe.rname(:chars => "US-ASCII") # => "Zabinski, Jozef"
105
111
 
106
112
  Note that the character encoding of the strings returned is still UTF-8 in all cases.
107
113
  The same option also relaxes the need for accented characters to match exactly:
108
114
 
109
- eric.match('Eric', 'Prie') # => false
110
- eric.match('Eric', 'Prie', :chars => "US-ASCII") # => true
111
- joe.match('Józef', 'Zabinski') # => false
112
- joe.match('Józef', 'Zabinski', :chars => "ISO-8859-1") # => true
115
+ eric.match('Eric', 'Prie') # => false
116
+ eric.match('Eric', 'Prie', :chars => "US-ASCII") # => true
117
+ joe.match('Józef', 'Zabinski') # => false
118
+ joe.match('Józef', 'Zabinski', :chars => "ISO-8859-1") # => true
119
+
120
+ == Customization of Alternative Names
121
+
122
+ We saw above how _Bobby_ and _Robert_ were able to match because, by default, the
123
+ matcher is aware of some common English nicknames. These name alternatives can be
124
+ customised to handle additional nicknames and other types of alternative names
125
+ such as common spelling mistakes and name changes.
126
+
127
+ The alternative names are specified in two YAML files, one for first names and
128
+ one for last names. Each YAML file represents an array and each element in the
129
+ array is an array representing a set of alternative names. Here, for example,
130
+ are some of the default first name alternatives:
131
+
132
+ [Anthony, Tony]
133
+ [James, Jim, Jimmy]
134
+ [Michael, Mike, Mick, Mikey]
135
+ [Robert, Bob, Bobby]
136
+ [Stephen, Steve]
137
+ [Steven, Steve]
138
+ [Thomas, Tom, Tommy]
139
+ [William, Will, Willy, Willie, Bill]
140
+
141
+ The first of these means that _Anthony_ and _Tony_ are considered equivalent and can match.
142
+
143
+ Name.new("Tony", "Miles").match("Anthony", "Miles") # => true
144
+
145
+ Note that both _Steven_ and _Stephen_ match _Steve_ but, because they don't occur in the
146
+ same group, they don't match each other.
147
+
148
+ Name.new("Steven", "Hanly").match("Steve", "Hanly") # => true
149
+ Name.new("Stephen", "Hanly").match("Steve", "Hanly") # => true
150
+ Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => false
151
+
152
+ To customize alternative name behaviour, prepare YAML files with your chosen alternatives
153
+ and then replace the default alternatives like this:
154
+
155
+ Name.load_alternatives(:first, "my_first_name_alternatives.yaml")
156
+ Name.load_alternatives(:last, "my_last_name_alternatives.yaml")
157
+
158
+ An example of one way in which you might want to customize the alternatives is to
159
+ cater for common spelling mistakes such as _Steven_ and _Stephen_. These two names
160
+ don't match by default, but you can make them match by replacing the two default rules:
161
+
162
+ [Stephen, Steve]
163
+ [Steven, Steve]
164
+
165
+ with the following single rule:
166
+
167
+ [Stephen, Steven, Steve]
168
+
169
+ so that now:
170
+
171
+ Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => true
172
+
173
+ Another use is to cater for English and Irish versions of the same name. For example,
174
+ for last names:
175
+
176
+ [Murphy, Murchadha]
177
+
178
+ or for first names, including spelling variations:
179
+
180
+ [Patrick, Pat, Paddy, Padraig, Padraic, Padhraig, Padhraic]
181
+
182
+ == Conditional Alternatives
183
+
184
+ Normally, entries in the two YAML files are just lists of alternative names. There is one
185
+ exception to this, however: when one of the entries (it doesn't matter which one but,
186
+ by convention, the last one) is a regular expression. Here is an example that might
187
+ be added to the last name alternatives:
188
+
189
+ [Quinn, Benjamin, !ruby/regexp /^(Debbie|Deborah)$/]
190
+
191
+ What this means is that the last names _Quinn_ and _Benjamin_ match but only when the
192
+ first name matches the regular expression.
193
+
194
+ Name.new("Debbie", "Quinn").match("Debbie", "Benjamin") # => true
195
+ Name.new("Mark", "Quinn").match("Mark", "Benjamin") # => false
196
+
197
+ Another example, this time for first names, is:
198
+
199
+ [Sean, John, !ruby/regexp /^Bradley$/]
200
+
201
+ This caters for an individual who is known by two normally unrelated first names.
202
+ We only want these two names to match for that individual and no others.
203
+
204
+ Name.new("John", "Bradley").match("Sean", "Bradley") # => true
205
+ Name.new("John", "Alfred").match("Sean", "Alfred") # => false
113
206
 
114
207
  == Author
115
208
 
@@ -0,0 +1,36 @@
1
+ ---
2
+ - [Alexander, Alex]
3
+ - [Andrew, Andy]
4
+ - [Anthony, Tony]
5
+ - [Benjamin, Ben]
6
+ - [Catherine, Cathy, Cath]
7
+ - [Daniel, Danny, Dan]
8
+ - [David, Dave]
9
+ - [Deborah, Debbie]
10
+ - [Des, Desmond]
11
+ - [Edward, Eddie, Eddy, Ed]
12
+ - [Frederick, Fred]
13
+ - [Frederic, Fred]
14
+ - [Gerald, Gerry]
15
+ - [Gerard, Gerry]
16
+ - [James, Jim, Jimmy]
17
+ - [John, Johnny]
18
+ - [Jonathan, Jon]
19
+ - [Kenneth, Ken, Kenny]
20
+ - [Michael, Mike, Mick, Mikey]
21
+ - [Nic, Nick, Nicolas]
22
+ - [Nicola, Nickie, Nicky]
23
+ - [Patrick, Pat]
24
+ - [Patricia, Patty, Pat]
25
+ - [Peter, Pete]
26
+ - [Philip, Phil]
27
+ - [Phillip, Phil]
28
+ - [Rick, Ricky]
29
+ - [Robert, Bob, Bobby]
30
+ - [Samual, Sam]
31
+ - [Samuel, Sam]
32
+ - [Stephen, Steve]
33
+ - [Steven, Steve]
34
+ - [Terence, Terry]
35
+ - [Thomas, Tom, Tommy]
36
+ - [William, Will, Willy, Willie, Bill]
@@ -0,0 +1 @@
1
+ --- []
@@ -0,0 +1,41 @@
1
+ ---
2
+ - [Abdul, Abul]
3
+ - [Alexander, Alex]
4
+ - [Anandagopal, Ananda]
5
+ - [Andrew, Andy]
6
+ - [Anne, Ann]
7
+ - [Anthony, Tony]
8
+ - [Benjamin, Ben]
9
+ - [Catherine, Cathy, Cath]
10
+ - [Daniel, Danial, Danny, Dan]
11
+ - [David, Dave]
12
+ - [Deborah, Debbie]
13
+ - [Des, Desmond]
14
+ - [Eamonn, Eamon]
15
+ - [Edward, Eddie, Eddy, Ed]
16
+ - [Eric, Erick, Erik]
17
+ - [Frederick, Frederic, Fred]
18
+ - [Gerald, Gerry]
19
+ - [Gerhard, Gerard, Ger, Gerry]
20
+ - [James, Jim, Jimmy]
21
+ - [Joanna, Joan, Joanne]
22
+ - [John, Johnny]
23
+ - [Jonathan, Jon]
24
+ - [Kenneth, Ken, Kenny]
25
+ - [Michael, Mike, Mick, Micky, Mickie, Mikey]
26
+ - [Nicholas, Nick, Nicolas]
27
+ - [Nicola, Nickie, Nicky]
28
+ - [Patrick, Pat, Paddy, Padraig, Padraic, Padhraig, Padhraic]
29
+ - [Patricia, Paddy, Patty, Pat]
30
+ - [Peter, Pete]
31
+ - [Philippe, Philip, Phillippe, Phillip]
32
+ - [Rick, Ricky]
33
+ - [Robert, Bob, Bobby]
34
+ - [Samual, Sam, Samuel]
35
+ - [Stef, Stefan, Stephan, Stefen, Stephen]
36
+ - [Steffy, Stefanie, Stephanie, Stefenie, Stephenie]
37
+ - [Stephen, Steve, Steven]
38
+ - [Terence, Terry]
39
+ - [Thomas, Tom, Tommy]
40
+ - [William, Will, Willy, Willie, Bill]
41
+ - [Sean, John, !ruby/regexp /^Bradley$/]
@@ -0,0 +1,5 @@
1
+ ---
2
+ - [Ffrench, French]
3
+ - [Murchadha, Murphy]
4
+ - [Quinn, Benjamin, !ruby/regexp /^(Debbie|Deborah)$/]
5
+ - ["O'Siochru", King, !ruby/regexp /^Mairead$/]
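The conditional entries in the YAML files above mix plain names with a `!ruby/regexp` item. As a minimal sketch (not the gem's own loader), the snippet below parses a small sample and separates each entry into its names and its optional condition. `SAMPLE_ALTERNATIVES` and `split_entries` are hypothetical names introduced for illustration, and the use of `YAML.safe_load` with `permitted_classes` assumes a reasonably recent Ruby and Psych; the gem itself calls `YAML.load`.

```ruby
require 'yaml'

# Hypothetical sample mirroring two of the last-name entries shown above.
SAMPLE_ALTERNATIVES = <<~YAML
  ---
  - [Ffrench, French]
  - [Quinn, Benjamin, !ruby/regexp /^(Debbie|Deborah)$/]
YAML

# Split each entry into its plain names and its optional Regexp condition.
# Returns an array of [names, condition] pairs; condition is nil when absent.
def split_entries(yaml_text)
  # Regexp has to be permitted explicitly so that !ruby/regexp tags load.
  entries = YAML.safe_load(yaml_text, permitted_classes: [Regexp])
  entries.map do |entry|
    condition = entry.find { |item| item.is_a?(Regexp) }
    names = entry.reject { |item| item.is_a?(Regexp) }
    [names, condition]
  end
end

split_entries(SAMPLE_ALTERNATIVES).each do |names, condition|
  note = condition ? "only when #{condition.inspect} matches" : "unconditional"
  puts "#{names.join(' / ')}: #{note}"
end
```

The position of the regular expression within an entry does not matter to this sketch, which agrees with the README's remark that placing it last is only a convention.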
data/lib/icu_name/name.rb CHANGED
@@ -5,31 +5,42 @@ require 'active_support/core_ext/string/multibyte'
5
5
 
6
6
  module ICU
7
7
  class Name
8
+ # Revert to the default sets of alternative names.
9
+ def self.reset_alternatives
10
+ @@alts = Hash.new
11
+ @@cmps = Hash.new
12
+ end
13
+
14
+ # Perform a reset when the class is first loaded.
15
+ self.reset_alternatives
8
16
 
9
- # Construct from one or two strings or any objects that have a to_s method.
17
+ # Construct a new name from one or two strings or any objects that have a to_s method.
10
18
  def initialize(name1='', name2='')
11
19
  @name1 = Util.to_utf8(name1.to_s)
12
20
  @name2 = Util.to_utf8(name2.to_s)
13
21
  originalize
14
22
  canonicalize
23
+ @first.freeze
24
+ @last.freeze
25
+ @original.freeze
15
26
  end
16
-
27
+
17
28
  # Original text getter.
18
29
  def original(opts={})
19
30
  return transliterate(@original, opts[:chars]) if opts[:chars]
20
- @original
31
+ @original.dup
21
32
  end
22
33
 
23
34
  # First name getter.
24
35
  def first(opts={})
25
36
  return transliterate(@first, opts[:chars]) if opts[:chars]
26
- @first
37
+ @first.dup
27
38
  end
28
39
 
29
40
  # Last name getter.
30
41
  def last(opts={})
31
42
  return transliterate(@last, opts[:chars]) if opts[:chars]
32
- @last
43
+ @last.dup
33
44
  end
34
45
 
35
46
  # Return a complete name, first name first, no comma.
@@ -50,7 +61,7 @@ module ICU
50
61
  name
51
62
  end
52
63
 
53
- # Convert object to a string.
64
+ # Convert to a string (same as rname).
54
65
  def to_s(opts={})
55
66
  rname(opts)
56
67
  end
@@ -61,6 +72,17 @@ module ICU
61
72
  match_first(first(opts), other.first(opts)) && match_last(last(opts), other.last(opts))
62
73
  end
63
74
 
75
+ # Load a set of first or last name alternatives. If the YAML file name is absent,
76
+ # the default set is loaded. <tt>type</tt> should be <tt>:first</tt> or <tt>:last</tt>.
77
+ def self.load_alternatives(type, file=nil)
78
+ compile_alts(check_type(type), file, true)
79
+ end
80
+
81
+ # Show first name or last name alternatives.
82
+ def alternatives(type)
83
+ get_alts(check_type(type))
84
+ end
85
+
64
86
  # :stopdoc:
65
87
  private
66
88
 
@@ -70,7 +92,7 @@ module ICU
70
92
  @original.strip!
71
93
  @original.gsub!(/\s+/, ' ')
72
94
  end
73
-
95
+
74
96
  # Transliterate characters to ASCII or Latin1.
75
97
  def transliterate(str, chars='US-ASCII')
76
98
  case chars
@@ -154,6 +176,10 @@ module ICU
154
176
  names
155
177
  end
156
178
 
179
+ # Check the type argument to the public methods.
180
+ def check_type(type) self.class.instance_eval { check_type(type) }; end
181
+ def self.check_type(type) type = type.to_s == "last" ? :last : :first; end
182
+
157
183
  # Match a complete first name.
158
184
  def match_first(first1, first2)
159
185
  # Is this one a walk in the park?
@@ -166,8 +192,9 @@ module ICU
166
192
  # Get the long list and the short list.
167
193
  long, short = first1.size >= first2.size ? [first1, first2] : [first2, first1]
168
194
 
169
- # The short one must be a "subset" of the long one.
170
- # An extra condition must also be satisfied.
195
+ # The short one must be a "subset" of the long one. An extra condition must also be satisfied:
196
+ # either there has to be at least one match not involving initials or the first names must match.
197
+ # For example "M. J." matches "Mark" but not "John".
171
198
  extra = false
172
199
  (0..long.size-1).each do |i|
173
200
  lword = long.shift
@@ -186,6 +213,7 @@ module ICU
186
213
  # Match a complete last name.
187
214
  def match_last(last1, last2)
188
215
  return true if last1 == last2
216
+ return true if match_alt(:last, last1, last2)
189
217
  [last1, last2].each do |last|
190
218
  last.downcase! # case insensitive
191
219
  last.gsub!(/\bmac/, 'mc') # MacDonaugh and McDonaugh
@@ -211,74 +239,73 @@ module ICU
211
239
  initials = 0
212
240
  initials+= 1 if first1.match(/^[A-Z\u{c0}-\u{de}]\.?$/)
213
241
  initials+= 1 if first2.match(/^[A-Z\u{c0}-\u{de}]\.?$/)
214
- return initials if first1 == first2
215
- return 0 if initials == 0 && match_nick_name(first1, first2)
216
- return -1 unless initials > 0
217
- return initials if first1[0] == first2[0]
242
+ return initials if first1 == first2 # "W." and "W." or "William" and "William"
243
+ return 0 if initials == 0 && match_alt(:first, first1, first2) # "William" and "Bill"
244
+ return -1 unless initials > 0 # "William" and "Patricia"
245
+ return initials if first1[0] == first2[0] # "W." and "William" or "W." and "W"
218
246
  -1
219
247
  end
220
248
 
221
- # Match two first names that might be equivalent nicknames.
222
- def match_nick_name(nick1, nick2)
223
- compile_nick_names unless @@nc
224
- code1 = @@nc[nick1]
225
- return false unless code1
226
- code1 == @@nc[nick2]
249
+ # Match two names that might be equivalent due to nicknames, misspellings, changed married names etc.
250
+ def match_alt(type, nam1, nam2)
251
+ self.class.compile_alts(type)
252
+ return false unless nams = @@alts[type][nam1]
253
+ return false unless cond = nams[nam2]
254
+ return true if cond == true
255
+ cond.match(type == :first ? @last : @first)
227
256
  end
228
257
 
229
- # Compile the nick names code hash when matching nick names is first attempted.
230
- def compile_nick_names
231
- @@nc = Hash.new
258
+ # Return an array of alternative first or second names (not including the original name).
259
+ # Allow for double barrelled last names or multiple first names.
260
+ def get_alts(type)
261
+ self.class.compile_alts(type)
262
+ name = self.send(type)
263
+ names = name.split(/[- ]/)
264
+ names.push(name) if names.length > 1
265
+ target = type == :first ? @last : @first
266
+ alts = Array.new
267
+ names.each do |n|
268
+ next unless @@alts[type][n]
269
+ @@alts[type][n].each_pair do |k, v|
270
+ alts.push k if v == true || v.match(target)
271
+ end
272
+ end
273
+ alts
274
+ end
275
+
276
+ # Compile an alternative names hash (for either first names or last names) before matching is first attempted.
277
+ def self.compile_alts(type, file=nil, force=false)
278
+ return if @@alts[type] && !force
279
+ file ||= File.expand_path(File.dirname(__FILE__) + "/../../config/#{type}_alternatives.yaml")
280
+ data = YAML.load_file(file)
281
+ @@cmps[type] ||= 0
282
+ @@alts[type] = Hash.new
232
283
  code = 1
233
- @@nl.each do |nicks|
234
- nicks.each do |n|
235
- throw "duplicate name #{n}" if @@nc[n]
236
- @@nc[n] = code
284
+ data.each do |alts|
285
+ cond = true
286
+ alts.reject! do |a|
287
+ if a.instance_of?(Regexp)
288
+ cond = a
289
+ else
290
+ false
291
+ end
292
+ end
293
+ alts.each do |name|
294
+ alts.each do |other|
295
+ unless other == name
296
+ @@alts[type][name] ||= Hash.new
297
+ @@alts[type][name][other] = cond
298
+ end
299
+ end
237
300
  end
238
301
  code+= 1
239
302
  end
303
+ @@cmps[type] += 1
240
304
  end
241
305
 
242
- # An array of data for matching nicknames and also a few common misspellings.
243
- @@nc = nil
244
- @@nl = <<EOF.split(/\n/).reject{|x| x.length == 0 }.map{|x| x.split(' ')}
245
- Abdul Abul
246
- Alexander Alex
247
- Anandagopal Ananda
248
- Andrew Andy
249
- Anne Ann
250
- Anthony Tony
251
- Benjamin Ben
252
- Catherine Cathy Cath
253
- Daniel Danial Danny Dan
254
- David Dave
255
- Deborah Debbie
256
- Des Desmond
257
- Eamonn Eamon
258
- Edward Eddie Ed
259
- Eric Erick Erik
260
- Frederick Frederic Fred
261
- Gerald Gerry
262
- Gerhard Gerard Ger
263
- James Jim
264
- Joanna Joan Joanne
265
- John Johnny
266
- Jonathan Jon
267
- Kenneth Ken Kenny
268
- Michael Mike Mick Micky
269
- Nicholas Nick Nicolas
270
- Nicola Nickie Nicky
271
- Patrick Pat Paddy
272
- Peter Pete
273
- Philippe Philip Phillippe Phillip
274
- Rick Ricky
275
- Robert Bob Bobby
276
- Samual Sam Samuel
277
- Stefanie Stef
278
- Stephen Steven Steve
279
- Terence Terry
280
- Thomas Tom Tommy
281
- William Will Willy Willie Bill
282
- EOF
306
+ # Return the number of YAML file compilations (for testing).
307
+ def self.alt_compilations(type)
308
+ @@cmps[check_type(type)] || 0
309
+ end
283
310
  end
284
311
  end
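The lookup structure built by `compile_alts` and consulted by `match_alt` can be restated in a standalone form. The following is a simplified sketch under stated assumptions, not the gem's code: `compile_alternatives`, `alternative?`, `FIRST_NAME_GROUPS` and `TABLE` are hypothetical names, the groups are given as Ruby arrays rather than loaded from YAML, and the companion name (the last name when comparing first names) is passed in explicitly instead of being read from instance variables.

```ruby
# Build a lookup of the form: name => { other_name => true or Regexp }.
# A group containing a Regexp makes all of its pairings conditional.
def compile_alternatives(groups)
  table = Hash.new { |hash, key| hash[key] = {} }
  groups.each do |group|
    condition = group.find { |item| item.is_a?(Regexp) } || true
    names = group.reject { |item| item.is_a?(Regexp) }
    # Record every ordered pair of distinct names within the group.
    names.permutation(2) { |name, other| table[name][other] = condition }
  end
  table
end

# Two names are alternatives if they share a group and, for conditional
# groups, the companion name matches the group's regular expression.
def alternative?(table, name1, name2, companion)
  return false unless table.key?(name1)
  condition = table[name1][name2]
  return false unless condition
  condition == true || !!(condition =~ companion)
end

FIRST_NAME_GROUPS = [
  %w[Stephen Steve],
  %w[Steven Steve],
  ['Sean', 'John', /^Bradley$/]
]
TABLE = compile_alternatives(FIRST_NAME_GROUPS)

p alternative?(TABLE, 'Steven', 'Steve', 'Hanly')    # => true
p alternative?(TABLE, 'Stephen', 'Steven', 'Hanly')  # => false
p alternative?(TABLE, 'John', 'Sean', 'Bradley')     # => true
p alternative?(TABLE, 'John', 'Sean', 'Alfred')      # => false
```

Because pairings are recorded per group, _Stephen_ and _Steven_ each match _Steve_ without matching each other, which is the behaviour the README describes for the default configuration.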
@@ -2,6 +2,6 @@
2
2
 
3
3
  module ICU
4
4
  class Name
5
- VERSION = "0.1.4"
5
+ VERSION = "1.0.0"
6
6
  end
7
7
  end
data/spec/name_spec.rb CHANGED
@@ -3,6 +3,17 @@ require File.expand_path(File.dirname(__FILE__) + '/spec_helper')
3
3
 
4
4
  module ICU
5
5
  describe Name do
6
+ def load_alt_test(*types)
7
+ types.each do |type|
8
+ file = File.expand_path(File.dirname(__FILE__) + "/../config/test_#{type}_alts.yaml")
9
+ Name.load_alternatives(type, file)
10
+ end
11
+ end
12
+
13
+ def alt_compilations(type)
14
+ Name.alt_compilations(type)
15
+ end
16
+
6
17
  context "public methods" do
7
18
  before(:each) do
8
19
  @simple = Name.new('mark j l', 'ORR')
@@ -68,7 +79,7 @@ module ICU
68
79
  it "characters and encoding" do
69
80
  ICU::Name.new('éric', 'PRIÉ').name.should == "Éric Prié"
70
81
  ICU::Name.new('BARTŁOMIEJ', 'śliwa').name.should == "Bartłomiej Śliwa"
71
- ICU::Name.new(' 渡井美代子').name.should == ""
82
+ ICU::Name.new('Սմբատ', 'Լպուտյան').name.should == ""
72
83
  eric = Name.new('éric'.encode("ISO-8859-1"), 'PRIÉ'.force_encoding("ASCII-8BIT"))
73
84
  eric.rname.should == "Prié, Éric"
74
85
  eric.rname.encoding.name.should == "UTF-8"
@@ -244,7 +255,7 @@ module ICU
244
255
  @opt = { :chars => "US-ASCII" }
245
256
  end
246
257
 
247
- it "should be a no-op for names that already ASCII" do
258
+ it "should be a no-op for names that are already ASCII" do
248
259
  name = Name.new('Mark J. L.', 'Orr')
249
260
  name.first(@opt).should == 'Mark J. L.'
250
261
  name.last(@opt).should == 'Orr'
@@ -325,6 +336,21 @@ module ICU
325
336
  Name.new('Mick', 'Orr').match('Mike', 'Orr').should be_true
326
337
  end
327
338
 
339
+ it "should handle ambiguous nicknames" do
340
+ Name.new('Gerry', 'Orr').match('Gerald', 'Orr').should be_true
341
+ Name.new('Gerry', 'Orr').match('Gerard', 'Orr').should be_true
342
+ Name.new('Gerard', 'Orr').match('Gerald', 'Orr').should be_false
343
+ end
344
+
345
+ it "should by default be cautious about misspellings" do
346
+ Name.new('Steven', 'Brady').match('Stephen', 'Brady').should be_false
347
+ Name.new('Philip', 'Short').match('Phillip', 'Short').should be_false
348
+ end
349
+
350
+ it "should by default have no conditional matches" do
351
+ Name.new('Sean', 'Bradley').match('John', 'Bradley').should be_false
352
+ end
353
+
328
354
  it "should not mix up nick names" do
329
355
  Name.new('David', 'Orr').match('Bill', 'Orr').should be_false
330
356
  end
@@ -343,6 +369,11 @@ module ICU
343
369
  Name.new('Alan', 'McDonagh').match('Alan', 'MacDonagh').should be_true
344
370
  Name.new('Darko', 'Polimac').match('Darko', 'Polimc').should be_false
345
371
  end
372
+
373
+ it "should by default have no conditional matches" do
374
+ Name.new('Debbie', 'Quinn').match('Debbie', 'Benjamin').should be_false
375
+ Name.new('Mairead', "O'Siochru").match('Mairead', 'King').should be_false
376
+ end
346
377
  end
347
378
 
348
379
  context "matches involving accented characters" do
@@ -361,5 +392,173 @@ module ICU
361
392
  Name.new('Èric-K.', 'Cantona').match('E. K.', 'Cantona', :chars => "US-ASCII").should be_true
362
393
  end
363
394
  end
395
+
396
+ context "configuring new first name alternatives" do
397
+ before(:all) do
398
+ load_alt_test(:first)
399
+ end
400
+
401
+ it "should match some spelling errors" do
402
+ Name.new('Steven', 'Brady').match('Stephen', 'Brady').should be_true
403
+ Name.new('Philip', 'Short').match('Phillip', 'Short').should be_true
404
+ end
405
+
406
+ it "should handle conditional matches" do
407
+ Name.new('Sean', 'Collins').match('John', 'Collins').should be_false
408
+ Name.new('Sean', 'Bradley').match('John', 'Bradley').should be_true
409
+ end
410
+ end
411
+
412
+ context "configuring new last name alternatives" do
413
+ before(:all) do
414
+ load_alt_test(:last)
415
+ end
416
+
417
+ it "should match some spelling errors" do
418
+ Name.new('William', 'Ffrench').match('William', 'French').should be_true
419
+ end
420
+
421
+ it "should handle conditional matches" do
422
+ Name.new('Mark', 'Quinn').match('Mark', 'Benjamin').should be_false
423
+ Name.new('Debbie', 'Quinn').match('Debbie', 'Benjamin').should be_true
424
+ Name.new('Oisin', "O'Siochru").match('Oisin', 'King').should be_false
425
+ Name.new('Mairead', "O'Siochru").match('Mairead', 'King').should be_true
426
+ end
427
+
428
+ it "should allow some awesome matches" do
429
+ Name.new('debbie quinn').match('Deborah', 'Benjamin').should be_true
430
+ Name.new('french, william').match('Bill', 'Ffrench').should be_true
431
+ Name.new('Oissine', 'Murphy').match('Oissine', 'Murchadha').should be_true
432
+ end
433
+ end
434
+
435
+ context "configuring new first and new last name alternatives" do
436
+ before(:all) do
437
+ load_alt_test(:first, :last)
438
+ end
439
+
440
+ it "should allow some awesome matches" do
441
+ Name.new('french, steven').match('Stephen', 'Ffrench').should be_true
442
+ Name.new('Patrick', 'Murphy').match('Padraic', 'Murchadha').should be_true
443
+ end
444
+ end
445
+
446
+ context "reverting to the default configuration" do
447
+ before(:all) do
448
+ load_alt_test(:first, :last)
449
+ end
450
+
451
+ it "should not match so boldly after reverting" do
452
+ Name.new('french, steven').match('Stephen', 'Ffrench').should be_true
453
+ Name.load_alternatives(:first)
454
+ Name.new('Patrick', 'Murphy').match('Padraic', 'Murchadha').should be_false
455
+ Name.new('Patrick', 'Murphy').match('Patrick', 'Murchadha').should be_true
456
+ Name.load_alternatives(:last)
457
+ Name.new('Patrick', 'Murphy').match('Patrick', 'Murchadha').should be_false
458
+ end
459
+ end
460
+
461
+ context "name alternatives with default configuration" do
462
+ it "should show common nicknames" do
463
+ Name.new('William', 'Ffrench').alternatives(:first).should =~ %w{Bill Willy Willie Will}
464
+ Name.new('Bill', 'Ffrench').alternatives(:first).should =~ %w{William Willy Will Willie}
465
+ Name.new('Steven', 'Ffrench').alternatives(:first).should =~ %w{Steve}
466
+ Name.new('Stephen', 'Ffrench').alternatives(:first).should =~ %w{Steve}
467
+ Name.new('Michael Stephen', 'Ffrench').alternatives(:first).should =~ %w{Steve Mike Mick Mikey}
468
+ Name.new('Stephen M.', 'Ffrench').alternatives(:first).should =~ %w{Steve}
469
+ Name.new('S.', 'Ffrench').alternatives(:first).should =~ []
470
+ Name.new('Sean', 'Bradley').alternatives(:first).should =~ []
471
+ end
472
+
473
+ it "should not have any last name alternatives" do
474
+ Name.new('William', 'Ffrench').alternatives(:last).should =~ []
475
+ Name.new('Mairead', "O'Siochru").alternatives(:last).should =~ []
476
+ Name.new('Oissine', 'Murphy').alternatives(:last).should =~ []
477
+ Name.new('Debbie', 'Quinn').alternatives(:last).should =~ []
478
+ end
479
+ end
480
+
481
+ context "name alternatives with more adventurous configuration" do
482
+ before(:all) do
483
+ load_alt_test(:first, :last)
484
+ end
485
+
486
+ it "should show additional nicknames" do
487
+ Name.new('Steven', 'Ffrench').alternatives(:first).should =~ %w{Stephen Steve}
488
+ Name.new('Stephen', 'Ffrench').alternatives(:first).should =~ %w{Stef Stefan Stefen Stephan Steve Steven}
489
+ Name.new('Stephen Mike', 'Ffrench').alternatives(:first).should =~ %w{Michael Mick Mickie Micky Mikey Stef Stefan Stefen Stephan Steve Steven}
490
+ Name.new('Sean', 'Bradley').alternatives(:first).should =~ %w{John}
491
+ Name.new('Sean', 'McDonagh').alternatives(:first).should =~ []
492
+ Name.new('John', 'Bradley').alternatives(:first).should =~ %w{Sean Johnny}
493
+ end
494
+
495
+ it "should have some last name alternatives" do
496
+ Name.new('William', 'Ffrench').alternatives(:last).should =~ %w{French}
497
+ Name.new('Mairead', "O'Siochru").alternatives(:last).should =~ %w{King}
498
+ Name.new('Oissine', 'Murphy').alternatives(:last).should =~ %w{Murchadha}
499
+ Name.new('Debbie', 'Quinn').alternatives(:last).should =~ %w{Benjamin}
500
+ Name.new('Mark', 'Quinn').alternatives(:last).should =~ []
501
+ Name.new('Debbie', 'Quinn-French').alternatives(:last).should =~ %w{Benjamin Ffrench}
502
+ end
503
+ end
504
+
505
+ context "number of alternative compilations" do
506
+ before(:all) do
507
+ Name.reset_alternatives
508
+ end
509
+
510
+ it "should be no more than necessary" do
511
+ alt_compilations(:first).should == 0
512
+ alt_compilations(:last).should == 0
513
+ Name.new('William', 'Ffrench').match('Bill', 'French')
514
+ alt_compilations(:first).should == 1
515
+ alt_compilations(:last).should == 1
516
+ Name.new('Debbie', 'Quinn').match('Deborah', 'Benjamin')
517
+ alt_compilations(:first).should == 1
518
+ alt_compilations(:last).should == 1
519
+ load_alt_test(:first)
520
+ alt_compilations(:first).should == 2
521
+ alt_compilations(:last).should == 1
522
+ load_alt_test(:last)
523
+ alt_compilations(:first).should == 2
524
+ alt_compilations(:last).should == 2
525
+ Name.new('William', 'Ffrench').match('Bill', 'French')
526
+ Name.new('Debbie', 'Quinn').match('Deborah', 'Benjamin')
527
+ Name.new('Mark', 'Orr').alternatives(:first)
528
+ Name.new('Mark', 'Orr').alternatives(:last)
529
+ alt_compilations(:first).should == 2
530
+ alt_compilations(:last).should == 2
531
+ end
532
+ end
533
+
534
+ context "immutability" do
535
+ before(:each) do
536
+ @mark = ICU::Name.new('Màrk', 'Orr')
537
+ end
538
+
539
+ it "there are no setters" do
540
+ lambda { @mark.first = "Malcolm" }.should raise_error(/undefined/)
541
+ lambda { @mark.last = "Dickie" }.should raise_error(/undefined/)
542
+ lambda { @mark.original = "mark orr" }.should raise_error(/undefined/)
543
+ end
544
+
545
+ it "should prevent accidental access to the instance variables" do
546
+ @mark.first.downcase!
547
+ @mark.first.should == "Màrk"
548
+ @mark.last.downcase!
549
+ @mark.last.should == "Orr"
550
+ @mark.original.downcase!
551
+ @mark.original.should == "Orr, Màrk"
552
+ end
553
+
554
+ it "should prevent accidental access to the instance variables when transliterating" do
555
+ @mark.first(:chars => "US-ASCII").downcase!
556
+ @mark.first.should == "Màrk"
557
+ @mark.last(:chars => "US-ASCII").downcase!
558
+ @mark.last.should == "Orr"
559
+ @mark.original(:chars => "US-ASCII").downcase!
560
+ @mark.original.should == "Orr, Màrk"
561
+ end
562
+ end
364
563
  end
365
564
  end
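The immutability specs above rely on two changes in `name.rb`: the instance variables are frozen in the constructor and the getters return copies. A minimal sketch of that pattern, using a hypothetical `FrozenName` class rather than `ICU::Name` itself, might look like this:

```ruby
# Hypothetical class illustrating the freeze-and-dup pattern.
class FrozenName
  def initialize(first)
    # Keep a private frozen copy so internal state cannot be mutated.
    @first = first.dup.freeze
  end

  # Return an unfrozen copy; callers may mutate it without side effects.
  def first
    @first.dup
  end
end

mark = FrozenName.new('Mark')
copy = mark.first
copy.downcase!      # mutates only the returned copy
puts copy           # prints "mark"
puts mark.first     # prints "Mark"
```

Returning a copy costs an allocation per call, but it means destructive methods such as `downcase!` applied to a getter's result cannot alter the name object.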
metadata CHANGED
@@ -3,10 +3,10 @@ name: icu_name
3
3
  version: !ruby/object:Gem::Version
4
4
  prerelease: false
5
5
  segments:
6
- - 0
7
6
  - 1
8
- - 4
9
- version: 0.1.4
7
+ - 0
8
+ - 0
9
+ version: 1.0.0
10
10
  platform: ruby
11
11
  authors:
12
12
  - Mark Orr
@@ -14,7 +14,7 @@ autorequire:
14
14
  bindir: bin
15
15
  cert_chain: []
16
16
 
17
- date: 2011-03-23 00:00:00 +00:00
17
+ date: 2011-04-16 00:00:00 +01:00
18
18
  default_executable:
19
19
  dependencies:
20
20
  - !ruby/object:Gem::Dependency
@@ -138,6 +138,10 @@ files:
138
138
  - spec/name_spec.rb
139
139
  - spec/spec_helper.rb
140
140
  - spec/util_spec.rb
141
+ - config/first_alternatives.yaml
142
+ - config/last_alternatives.yaml
143
+ - config/test_first_alts.yaml
144
+ - config/test_last_alts.yaml
141
145
  - LICENCE
142
146
  - README.rdoc
143
147
  has_rdoc: true