icu_name 1.0.0 → 1.0.1
- data/README.rdoc +37 -30
- data/lib/icu_name/name.rb +5 -3
- data/lib/icu_name/version.rb +1 -1
- data/spec/name_spec.rb +2 -1
- metadata +5 -5
data/README.rdoc
CHANGED
@@ -121,22 +121,22 @@ The same option also relaxes the need for accented characters to match exactly:

 We saw above how _Bobby_ and _Robert_ were able to match because, by default, the
 matcher is aware of some common English nicknames. These name alternatives can be
-customised to handle additional
-such as common spelling
-
-The alternative names
-one for last names. Each
-
-
-
-[Anthony, Tony]
-[James, Jim, Jimmy]
-[Michael, Mike, Mick, Mikey]
-[Robert, Bob, Bobby]
-[Stephen, Steve]
-[Steven, Steve]
-[Thomas, Tom, Tommy]
-[William, Will, Willy, Willie, Bill]
+customised to handle additional nicknames and other types of alternative names
+such as common spelling errors and player name changes.
+
+The alternative names consist of two arrays, one for first names and
+one for last names. Each array element is itself an array of strings
+representing a set of equivalent names. Here, for example, are some
+of the default first name alternatives:
+
+["Anthony", "Tony"]
+["James", "Jim", "Jimmy"]
+["Michael", "Mike", "Mick", "Mikey"]
+["Robert", "Bob", "Bobby"]
+["Stephen", "Steve"]
+["Steven", "Steve"]
+["Thomas", "Tom", "Tommy"]
+["William", "Will", "Willy", "Willie", "Bill"]

 The first of these means that _Anthony_ and _Tony_ are considered equivalent and can match.

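The array-of-arrays layout described in this hunk can be sketched in plain Ruby. The constant `FIRST_NAME_GROUPS` and the helper `same_group?` are hypothetical names introduced only for illustration; this is not the gem's implementation, just a minimal model of the rule that two names count as alternatives when a single group lists both of them.

```ruby
# A few of the default first-name groups quoted in the README.
FIRST_NAME_GROUPS = [
  ["Anthony", "Tony"],
  ["Robert", "Bob", "Bobby"],
  ["Stephen", "Steve"],
  ["Steven", "Steve"],
]

# Hypothetical helper (an assumption, not part of icu_name):
# true when at least one group contains both names.
def same_group?(groups, a, b)
  groups.any? { |group| group.include?(a) && group.include?(b) }
end

puts same_group?(FIRST_NAME_GROUPS, "Anthony", "Tony")    # true
puts same_group?(FIRST_NAME_GROUPS, "Stephen", "Steve")   # true
puts same_group?(FIRST_NAME_GROUPS, "Stephen", "Steven")  # false: no group lists both
```

Under this model _Steve_ belongs to two groups, yet _Stephen_ and _Steven_ stay distinct because no single group contains both.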
@@ -149,29 +149,35 @@ same group, they don't match each other.
 Name.new("Stephen", "Hanly").match("Steve", "Hanly") # => true
 Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => false

-To
-
+To change alternative name behaviour, you can replace the default alternatives
+with a customized set perhaps stored in a database or a YAML file, as illustrated below:

-
-Name.load_alternatives(:
+data = YAML.load(File.open("my_last_name_alternatives.yaml"))
+Name.load_alternatives(:last, data)
+data = YAML.load(File.open("my_first_name_alternatives.yaml"))
+Name.load_alternatives(:first, data)

 An example of one way in which you might want to customize the alternatives is to
 cater for common spelling mistakes such as _Steven_ and _Stephen_. These two names
 don't match by default, but you can make them so by replacing the two default rules:

-[Stephen, Steve]
-[Steven, Steve]
+["Stephen", "Steve"]
+["Steven", "Steve"]

 with the following single rule:

-[Stephen, Steven, Steve]
+["Stephen", "Steven", "Steve"]

 so that now:

 Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => true

-
-
+This kind of rule risks producing false positives - you must judge
+carefully whether that risk is outweighed by the benefits of being
+able to overcome spelling mistakes in the context of your application.
+
+Another use is to cater for English and Irish versions of the same name.
+For example, for last names:

 [Murphy, Murchadha]

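As a rough illustration of what a custom alternatives file might hold, the sketch below parses a YAML sequence of sequences into the array-of-arrays form the README describes. The file contents and the constant names are assumptions made for this example; only `YAML.load` and the `Name.load_alternatives(type, data)` call shape come from the diff itself.

```ruby
require "yaml"

# Hypothetical contents of a custom first-name alternatives file:
# a YAML sequence whose items are sequences of equivalent names.
YAML_TEXT = <<~YAML
  - [Stephen, Steven, Steve]
  - [Robert, Bob, Bobby]
YAML

# Parsing yields a plain Ruby array of arrays of strings.
CUSTOM_FIRST_NAMES = YAML.load(YAML_TEXT)

# The merged rule lists both spellings in one group.
merged = CUSTOM_FIRST_NAMES.first
puts merged.include?("Stephen") && merged.include?("Steven")

# With the gem loaded, the README hands such an array over via
#   Name.load_alternatives(:first, CUSTOM_FIRST_NAMES)
```

Keeping the rules in a data file rather than in code means the trade-off between catching spelling mistakes and risking false positives can be tuned per application without touching the matcher.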
@@ -181,25 +187,26 @@ or for first names, including spelling variations:

 == Conditional Alternatives

-Normally, entries in the two
+Normally, entries in the two arrays are just lists of alternative names. There is one
 exception to this however, when one of the entries (it doesn't matter which one but,
 by convention, the last one) is a regular expression. Here is an example that might
 be added to the last name alternatives:

-[Quinn, Benjamin,
+["Quinn", "Benjamin", /^(Debbie|Deborah)$/]

 What this means is that the last names _Quinn_ and _Benjamin_ match but only when the
-first name matches the regular expression.
+first name matches the given regular expression. In this case it caters for a female
+whose last name changed after marriage.

 Name.new("Debbie", "Quinn").match("Debbie", "Benjamin") # => true
 Name.new("Mark", "Quinn").match("Mark", "Benjamin") # => false

 Another example, this time for first names, is:

-[Sean, John,
+["Sean", "John", /^Bradley$/]

 This caters for an individual who is known by two normally unrelated first names.
-
+The two first names only match when the last name is _Bradley_.

 Name.new("John", "Bradley").match("Sean", "Bradley") # => true
 Name.new("John", "Alfred").match("Sean", "Alfred") # => false
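The conditional rule described in the README can be modelled in a few lines of plain Ruby. The helper `conditional_match?` and the constant `QUINN_RULE` are hypothetical and are not how icu_name compiles or applies its alternatives; the sketch only shows the stated behaviour, namely that a group containing a regular expression applies only when the other part of the name matches that expression.

```ruby
# Hypothetical helper (an assumption, not the gem's API).
# group      - one alternatives entry, optionally containing a Regexp
# name_a/b   - the two names being compared (e.g. two last names)
# other_name - the other part of the name (e.g. the first name)
def conditional_match?(group, name_a, name_b, other_name)
  condition = group.find { |entry| entry.is_a?(Regexp) }
  names     = group.reject { |entry| entry.is_a?(Regexp) }
  return false unless names.include?(name_a) && names.include?(name_b)
  # An unconditional group always applies; otherwise the Regexp decides.
  condition.nil? || condition.match?(other_name)
end

QUINN_RULE = ["Quinn", "Benjamin", /^(Debbie|Deborah)$/]

puts conditional_match?(QUINN_RULE, "Quinn", "Benjamin", "Debbie")  # true
puts conditional_match?(QUINN_RULE, "Quinn", "Benjamin", "Mark")    # false
```

Because the helper searches for the `Regexp` rather than assuming a position, it reflects the README's note that the expression may appear anywhere in the entry, with last place being only a convention.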
data/lib/icu_name/name.rb
CHANGED
@@ -274,10 +274,12 @@ module ICU
 end

 # Compile an alternative names hash (for either first names or last names) before matching is first attempted.
-def self.compile_alts(type,
+def self.compile_alts(type, data=nil, force=false)
 return if @@alts[type] && !force
-
-
+unless data
+file = File.expand_path(File.dirname(__FILE__) + "/../../config/#{type}_alternatives.yaml")
+data = YAML.load(File.open file)
+end
 @@cmps[type] ||= 0
 @@alts[type] = Hash.new
 code = 1
data/lib/icu_name/version.rb
CHANGED
data/spec/name_spec.rb
CHANGED
@@ -6,7 +6,8 @@ module ICU
 def load_alt_test(*types)
 types.each do |type|
 file = File.expand_path(File.dirname(__FILE__) + "/../config/test_#{type}_alts.yaml")
-
+data = YAML.load(File.open file)
+Name.load_alternatives(type, data)
 end
 end

metadata
CHANGED
@@ -5,8 +5,8 @@ version: !ruby/object:Gem::Version
 segments:
 - 1
 - 0
--
-version: 1.0.
+- 1
+version: 1.0.1
 platform: ruby
 authors:
 - Mark Orr
@@ -14,7 +14,7 @@ autorequire:
 bindir: bin
 cert_chain: []

-date: 2011-04-
+date: 2011-04-17 00:00:00 +01:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -28,8 +28,8 @@ dependencies:
 segments:
 - 3
 - 0
--
-version: 3.0.
+- 6
+version: 3.0.6
 type: :runtime
 version_requirements: *id001
 - !ruby/object:Gem::Dependency