icu_name 1.0.0 → 1.0.1
- data/README.rdoc +37 -30
- data/lib/icu_name/name.rb +5 -3
- data/lib/icu_name/version.rb +1 -1
- data/spec/name_spec.rb +2 -1
- metadata +5 -5
data/README.rdoc
CHANGED
@@ -121,22 +121,22 @@ The same option also relaxes the need for accented characters to match exactly:
 
 We saw above how _Bobby_ and _Robert_ were able to match because, by default, the
 matcher is aware of some common English nicknames. These name alternatives can be
-customised to handle additional
-such as common spelling
-
-The alternative names
-one for last names. Each
-
-
-
-[Anthony, Tony]
-[James, Jim, Jimmy]
-[Michael, Mike, Mick, Mikey]
-[Robert, Bob, Bobby]
-[Stephen, Steve]
-[Steven, Steve]
-[Thomas, Tom, Tommy]
-[William, Will, Willy, Willie, Bill]
+customised to handle additional nicknames and other types of alternative names
+such as common spelling errors and player name changes.
+
+The alternative names consist of two arrays, one for first names and
+one for last names. Each array element is itself an array of strings
+representing a set of equivalent names. Here, for example, are some
+of the default first name alternatives:
+
+["Anthony", "Tony"]
+["James", "Jim", "Jimmy"]
+["Michael", "Mike", "Mick", "Mikey"]
+["Robert", "Bob", "Bobby"]
+["Stephen", "Steve"]
+["Steven", "Steve"]
+["Thomas", "Tom", "Tommy"]
+["William", "Will", "Willy", "Willie", "Bill"]
 
 The first of these means that _Anthony_ and _Tony_ are considered equivalent and can match.
 
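A minimal sketch of the group semantics described in this part of the README, assuming two names are treated as equivalent when they are identical or appear together in at least one group. This is not the gem's implementation; `FIRST_NAME_GROUPS` and `equivalent?` are hypothetical names used only for illustration.

```ruby
# Hypothetical illustration only; not the icu_name implementation.
# Each inner array is one set of equivalent first names.
FIRST_NAME_GROUPS = [
  ["Anthony", "Tony"],
  ["Robert", "Bob", "Bobby"],
  ["Stephen", "Steve"],
  ["Steven", "Steve"],
].freeze

# Two names are equivalent when identical or when a single group contains both.
def equivalent?(a, b, groups = FIRST_NAME_GROUPS)
  return true if a == b
  groups.any? { |group| group.include?(a) && group.include?(b) }
end

puts equivalent?("Bobby", "Robert")    # both names sit in the same group
puts equivalent?("Stephen", "Steve")   # both names sit in the same group
puts equivalent?("Stephen", "Steven")  # each shares a group with "Steve", but not with each other

# Merging the two "Steve" rules into one group links Stephen and Steven.
merged = [["Stephen", "Steven", "Steve"]]
puts equivalent?("Stephen", "Steven", merged)
```

Under this assumption, equivalence is not transitive across groups, which is why merging the two rules into one changes the result.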
@@ -149,29 +149,35 @@ same group, they don't match each other.
 Name.new("Stephen", "Hanly").match("Steve", "Hanly") # => true
 Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => false
 
-To
-
+To change alternative name behaviour, you can replace the default alternatives
+with a customized set, perhaps stored in a database or a YAML file, as illustrated below:
 
-
-Name.load_alternatives(:
+data = YAML.load(File.open "my_last_name_alternatives.yaml")
+Name.load_alternatives(:last, data)
+data = YAML.load(File.open "my_first_name_alternatives.yaml")
+Name.load_alternatives(:first, data)
 
 An example of one way in which you might want to customize the alternatives is to
 cater for common spelling mistakes such as _Steven_ and _Stephen_. These two names
 don't match by default, but you can make them so by replacing the two default rules:
 
-[Stephen, Steve]
-[Steven, Steve]
+["Stephen", "Steve"]
+["Steven", "Steve"]
 
 with the following single rule:
 
-[Stephen, Steven, Steve]
+["Stephen", "Steven", "Steve"]
 
 so that now:
 
 Name.new("Stephen", "Hanly").match("Steven", "Hanly") # => true
 
-
-
+This kind of rule risks producing false positives - you must judge
+carefully whether that risk is outweighed by the benefits of being
+able to overcome spelling mistakes in the context of your application.
+
+Another use is to cater for English and Irish versions of the same name.
+For example, for last names:
 
 [Murphy, Murchadha]
 
@@ -181,25 +187,26 @@ or for first names, including spelling variations:
 
 == Conditional Alternatives
 
-Normally, entries in the two
+Normally, entries in the two arrays are just lists of alternative names. There is one
 exception to this however, when one of the entries (it doesn't matter which one but,
 by convention, the last one) is a regular expression. Here is an example that might
 be added to the last name alternatives:
 
-[Quinn, Benjamin,
+["Quinn", "Benjamin", /^(Debbie|Deborah)$/]
 
 What this means is that the last names _Quinn_ and _Benjamin_ match but only when the
-first name matches the regular expression.
+first name matches the given regular expression. In this case it caters for a female
+whose last name changed after marriage.
 
 Name.new("Debbie", "Quinn").match("Debbie", "Benjamin") # => true
 Name.new("Mark", "Quinn").match("Mark", "Benjamin") # => false
 
 Another example, this time for first names, is:
 
-[Sean, John,
+["Sean", "John", /^Bradley$/]
 
 This caters for an individual who is known by two normally unrelated first names.
-
+The two first names only match when the last name is _Bradley_.
 
 Name.new("John", "Bradley").match("Sean", "Bradley") # => true
 Name.new("John", "Alfred").match("Sean", "Alfred") # => false
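The conditional rule described in the README can be sketched as follows. This is a standalone illustration under stated assumptions, not the gem's code: `conditional_match?` is a hypothetical helper, and it assumes a group's string entries match each other only when the other part of the name satisfies the group's regular expression, if one is present.

```ruby
# Hypothetical illustration only; not the icu_name implementation.
# a, b       : the two name parts being compared (e.g. two last names)
# other_part : the opposite name part (e.g. the first name)
# groups     : array of groups; a group may contain one Regexp condition
def conditional_match?(a, b, other_part, groups)
  return true if a == b
  groups.any? do |group|
    names = group.grep(String)              # the alternative names
    condition = group.grep(Regexp).first    # optional condition, nil if absent
    names.include?(a) && names.include?(b) &&
      (condition.nil? || condition.match?(other_part))
  end
end

last_names = [["Quinn", "Benjamin", /^(Debbie|Deborah)$/]]
puts conditional_match?("Quinn", "Benjamin", "Debbie", last_names)
puts conditional_match?("Quinn", "Benjamin", "Mark", last_names)

first_names = [["Sean", "John", /^Bradley$/]]
puts conditional_match?("John", "Sean", "Bradley", first_names)
puts conditional_match?("John", "Sean", "Alfred", first_names)
```

Because the regular expression is picked out by type rather than by position, it can sit anywhere in the group, which is consistent with the README's note that the last position is only a convention.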
data/lib/icu_name/name.rb
CHANGED
@@ -274,10 +274,12 @@ module ICU
 end
 
 # Compile an alternative names hash (for either first names or last names) before matching is first attempted.
-def self.compile_alts(type,
+def self.compile_alts(type, data=nil, force=false)
   return if @@alts[type] && !force
-
-
+  unless data
+    file = File.expand_path(File.dirname(__FILE__) + "/../../config/#{type}_alternatives.yaml")
+    data = YAML.load(File.open file)
+  end
   @@cmps[type] ||= 0
   @@alts[type] = Hash.new
   code = 1
data/lib/icu_name/version.rb
CHANGED
data/spec/name_spec.rb
CHANGED
@@ -6,7 +6,8 @@ module ICU
 def load_alt_test(*types)
   types.each do |type|
     file = File.expand_path(File.dirname(__FILE__) + "/../config/test_#{type}_alts.yaml")
-
+    data = YAML.load(File.open file)
+    Name.load_alternatives(type, data)
   end
 end
 
metadata
CHANGED
@@ -5,8 +5,8 @@ version: !ruby/object:Gem::Version
 segments:
 - 1
 - 0
--
-version: 1.0.
+- 1
+version: 1.0.1
 platform: ruby
 authors:
 - Mark Orr
@@ -14,7 +14,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2011-04-
+date: 2011-04-17 00:00:00 +01:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -28,8 +28,8 @@ dependencies:
 segments:
 - 3
 - 0
--
-version: 3.0.
+- 6
+version: 3.0.6
 type: :runtime
 version_requirements: *id001
 - !ruby/object:Gem::Dependency