name-tamer 0.0.3 → 0.0.4
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +29 -1
- data/lib/name-tamer.rb +66 -58
- data/lib/name-tamer/version.rb +1 -1
- data/spec/name_tamer_spec.rb +6 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9bb03b1eb2ecf3657424b2eb6d15009143783799
+  data.tar.gz: cb539043cf2bad1f2ce258355fb5b12995078642
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5c004130e0b5cd5f6a14de3e061cf448c0c9a40081d0f76544ae9b7e0f7311661fc767f74b742b0f76ffa28c74d067584261ea04f4e50b270e62b6c0df369fc1
+  data.tar.gz: 99b9a308fc495e1c8ee25452d62f264af02ba301243cd9cb80ff9cfeeb2af024dd3da38d1f4cf79e7e6c0cbf306d62208a84cad688a50df4c7ac6f688972f567
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -28,14 +28,42 @@ Examples:
 NameTamer['Mr. John Q. Smith III, MD'].simple_name # => John Smith
 ```
 
+Or you can create an instance if you need several versions of the name
+
 ```ruby
-name_tamer = NameTamer
+name_tamer = NameTamer.new 'Mr. John Q. Smith III, MD'
 name_tamer.slug # => john-smith
 name_tamer.nice_name # => John Q. Smith
+name_tamer.contact_type # => :person
+```
+
+NameTamer will make an intelligent guess at the type of the name but it's not infallible. NameTamer likes it if you tell it whether the name is a person or an organization:
+
+```ruby
+name_tamer = NameTamer.new 'Di Doo Doo d.o.o.', contact_type: :organization
+name_tamer.simple_name # => Di Doo Doo
 ```
 
 ## Contributing
 
+There must be lots of name suffixes and prefixes that I haven't catered for, so please get in touch if `name-tamer` doesn't recognise one that you've found.
+
+If there are any other common two-word family names that I've missed then please let me know. `name-tamer` tries to make sure Helena Bonham Carter gets slugified to `helena-bonham-carter` and not `helena-carter`, but I'm sure there are loads of two-word family names I don't know about.
+
+Please read all the following articles before contributing:
+
+* [Personal names around the world](http://www.w3.org/International/questions/qa-personal-names)
+* [Namae (名前)](https://github.com/berkmancenter/namae)
+* [Matts Name Parser](https://github.com/mericson/people)
+* [Types of business entity](http://en.wikipedia.org/wiki/Types_of_business_entity)
+* [List of professional designations in the United States](http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(USA))
+* [List of post-nominal letters (United Kingdom)](http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(United_Kingdom))
+* [Nobiliary particle](http://en.wikipedia.org/wiki/Nobiliary_particle)
+* [Spanish naming customs](http://en.wikipedia.org/wiki/Spanish_naming_customs)
+* [Unified style sheet for linguistics](http://linguistlist.org/pubs/tocs/JournalUnifiedStyleSheet2007.pdf) [PDF]
+
+### How to contribute
+
 1. Fork it
 1. Create your feature branch (`git checkout -b my-new-feature`)
 1. Commit your changes (`git commit -am 'Add some feature'`)
data/lib/name-tamer.rb
CHANGED
@@ -3,7 +3,7 @@
 # References:
 # http://www.w3.org/International/questions/qa-personal-names
 # https://github.com/berkmancenter/namae
-# https://github.com/mericson
+# https://github.com/mericson/people
 # http://en.wikipedia.org/wiki/Types_of_business_entity
 # http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(USA)
 # http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(United_Kingdom)
@@ -12,7 +12,7 @@
 # http://linguistlist.org/pubs/tocs/JournalUnifiedStyleSheet2007.pdf [PDF]
 
 class NameTamer
-  attr_reader :name
+  attr_reader :name
 
   class << self
     def [](name, args = {})
@@ -21,8 +21,8 @@ class NameTamer
   end
 
   def nice_name
-
-      @nice_name =
+    unless @nice_name
+      @nice_name = name.dup # Start with the name we've received
 
       tidy_spacing # " John Smith " -> "John Smith"
       consolidate_initials # "I. B. M." -> "I.B.M."
@@ -38,7 +38,7 @@ class NameTamer
   end
 
   def simple_name
-
+    unless @simple_name
       @simple_name = nice_name.dup # Start with nice name
 
       remove_initials # "John Q. Doe" -> "John Doe"
@@ -53,7 +53,7 @@ class NameTamer
   end
 
   def slug
-
+    unless @slug
       @slug = simple_name.dup # Start with search name
       slugify # "John Doe" -> "john-doe"
     end
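Note on the three accessor hunks above: the added `unless @nice_name`, `unless @simple_name` and `unless @slug` guards read like lazy memoization, where the value is computed on first access and reused afterwards. The removed lines are cut off in this view, so what the old guard looked like is not known. The following is only a minimal sketch of the pattern; `SlugSketch` and its `computations` counter are hypothetical and not part of the gem.

```ruby
# Hypothetical class illustrating compute-once accessors.
class SlugSketch
  attr_reader :computations

  def initialize(name)
    @name = name
    @slug = nil
    @computations = 0 # counts how often the slug is actually built
  end

  def slug
    unless @slug
      @computations += 1
      # Stand-in for the real slugify step, which is not shown in the diff
      @slug = @name.downcase.tr(' ', '-')
    end

    @slug
  end
end

sketch = SlugSketch.new('John Doe')
sketch.slug
sketch.slug
puts sketch.computations
```

The second call to `slug` returns the cached string without running the transformation again.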
@@ -66,25 +66,31 @@ class NameTamer
     contact_type_best_effort
   end
 
+  def contact_type= new_contact_type
+    ct_as_sym = new_contact_type.to_sym
+
+    unless @contact_type.nil? || @contact_type == ct_as_sym
+      puts "Changing contact type of #{@name} from #{@contact_type} to #{new_contact_type}"
+    end
+
+    @contact_type = ct_as_sym
+  end
+
 =begin These lines aren't used and aren't covered by specs
   def name=(new_name)
     initialize new_name, :contact_type => @contact_type
   end
 
-  def contact_type=(new_contact_type)
-    initialize @name, :contact_type => new_contact_type
-  end
-
   def to_hash
     {
-      name:
-      nice_name:
-      simple_name:
-      slug:
-      contact_type:
-      last_name:
-      remainder:
-      adfix_found:
+      name: name,
+      nice_name: nice_name,
+      simple_name: simple_name,
+      slug: slug,
+      contact_type: contact_type,
+      last_name: last_name,
+      remainder: remainder,
+      adfix_found: adfix_found
     }
   end
 =end
@@ -98,7 +104,6 @@ class NameTamer
   def tidy_spacing
     @nice_name.gsub!(/,\s*/, ', ') # Ensure commas have exactly one space after them
     @nice_name.strip! # remove leading & trailing whitespace
-
     @nice_name = ensure_whitespace_is_ascii_space @nice_name
   end
 
@@ -176,11 +181,7 @@ class NameTamer
   # Conjoin compound names with non-breaking spaces
   def use_nonbreaking_spaces_in_compound_names
     # Fix known last names that have spaces (not hyphens!)
-
-      'Lane Fox', 'Bonham Carter', 'Pitt Rivers', 'Lloyd Webber', 'Sebag Montefiore',
-      'Holmes à Court', 'Holmes a Court', 'Baron Cohen',
-      'Service Company', 'Corporation Company', 'Corporation System', 'Incorporations Limited'
-    ].each do |compound_name|
+    COMPOUND_NAMES.each do |compound_name|
       @nice_name.gsub!(compound_name, compound_name.tr(ASCII_SPACE, NONBREAKING_SPACE))
     end
 
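Note on the hunk above: the loop replaces the ordinary space inside each known compound name with a non-breaking space, so later steps that split on ASCII spaces see the compound as a single token. A standalone sketch of that substitution follows. The values of `ASCII_SPACE` and `NONBREAKING_SPACE` are not visible in this diff and are assumed here, the list is a subset of the gem's `COMPOUND_NAMES`, and `join_compound_names` is a hypothetical helper name.

```ruby
ASCII_SPACE = ' '                  # assumed value; the constant is not shown in the diff
NONBREAKING_SPACE = "\u00a0"       # assumed value; the constant is not shown in the diff
COMPOUND_NAMES_SUBSET = ['Lane Fox', 'Bonham Carter', 'Lloyd Webber'].freeze

# Hypothetical helper: returns a copy of the name with compound
# family names held together by non-breaking spaces.
def join_compound_names(name)
  result = name.dup

  COMPOUND_NAMES_SUBSET.each do |compound_name|
    result.gsub!(compound_name, compound_name.tr(ASCII_SPACE, NONBREAKING_SPACE))
  end

  result
end

joined = join_compound_names('Helena Bonham Carter')
puts joined.split(/ /).length
```

Splitting the result on a literal ASCII space yields two tokens, with the family name kept whole as the second one.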
@@ -197,10 +198,10 @@ class NameTamer
   # i.e. only remove initials if there's also a proper name there
   def remove_initials
     if @contact_type == :person
-
+      temp_name = @simple_name.gsub(/\b([a-z](?:\.*\s+|\.))/i, '')
 
       # If the name still has at least one space we're OK
-      @simple_name =
+      @simple_name = temp_name if temp_name.include?(ASCII_SPACE)
     end
   end
 
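Note on the hunk above: the regular expression strips single-letter initials, and the result is kept only if it still contains a space, so a name such as "J. Smith" is not reduced to a bare surname. A standalone sketch of that rule, using the regex from the added line; `strip_initials` is a hypothetical helper name and `ASCII_SPACE` is assumed to be a plain space. In the gem the step runs only when the contact type is `:person`, which this sketch leaves out.

```ruby
ASCII_SPACE = ' ' # assumed value; the constant is not shown in the diff

# Hypothetical helper: remove initials only when at least two
# name parts survive the removal.
def strip_initials(simple_name)
  temp_name = simple_name.gsub(/\b([a-z](?:\.*\s+|\.))/i, '')

  # If the stripped name has no space left, keep the original instead
  temp_name.include?(ASCII_SPACE) ? temp_name : simple_name
end

puts strip_initials('John Q. Doe')
puts strip_initials('J. Smith')
```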
@@ -237,9 +238,17 @@ class NameTamer
   # Initialization and utilities
   #--------------------------------------------------------
 
-  def initialize(
-    @name
-
+  def initialize(new_name, args = {})
+    @name = new_name || ''
+
+    if args[:contact_type]
+      ct = args[:contact_type]
+      ct = ct.to_s unless [String, Symbol].include? ct.class
+      ct.downcase! if ct.class == String
+      ct = ct.to_sym
+      ct = nil unless [:person, :organization].include? ct
+      @contact_type = ct
+    end
 
     @nice_name = nil
     @simple_name = nil
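Note on the hunk above: the new initializer coerces whatever is passed as `contact_type` into `:person`, `:organization` or `nil`. The same steps are sketched below as a standalone method so the behaviour exercised by the new spec rows (`'Organization'`, `'Person'`, `:nonsense`, `Kernel`) can be checked in isolation. `normalize_contact_type` is a hypothetical name, and the sketch uses the non-mutating `downcase` where the hunk uses `downcase!`, so the caller's string is left alone.

```ruby
VALID_CONTACT_TYPES = [:person, :organization].freeze

# Hypothetical helper mirroring the normalisation steps in the hunk.
def normalize_contact_type(raw)
  return nil unless raw

  ct = raw
  ct = ct.to_s unless ct.is_a?(String) || ct.is_a?(Symbol)
  ct = ct.downcase if ct.is_a?(String) # symbols are not downcased, as in the hunk
  ct = ct.to_sym

  VALID_CONTACT_TYPES.include?(ct) ? ct : nil
end

p normalize_contact_type('Organization')
p normalize_contact_type(Kernel)
```

Anything that does not reduce to one of the two known symbols becomes `nil`, which leaves the type to be guessed later.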
@@ -251,12 +260,6 @@ class NameTamer
     @adfix_found = false
   end
 
-  def set_contact_type contact_type
-    contact_type_sym = contact_type.to_sym
-    puts "Changing contact type of #{@name} from #{@contact_type} to #{contact_type}".red unless @contact_type.nil? || @contact_type == contact_type_sym
-    @contact_type = contact_type_sym
-  end
-
   # If we don't know the contact type, what's our best guess?
   def contact_type_best_effort
     if @contact_type
@@ -275,23 +278,23 @@ class NameTamer
   # We pass to this routine either prefixes or suffixes
   def remove_outermost_adfix adfix_type, name_part
     adfixes = ADFIX_PATTERNS[adfix_type]
-
-    parts = name_part.partition adfixes[
+    ct = contact_type_best_effort
+    parts = name_part.partition adfixes[ct]
     @adfix_found = !parts[1].empty?
 
     # If the contact type is indeterminate and we didn't find a diagnostic adfix
     # for a person then try again for an organization
     if @contact_type.nil?
       unless @adfix_found
-
-        parts = name_part.partition adfixes[
+        ct = :organization
+        parts = name_part.partition adfixes[ct]
         @adfix_found = !parts[1].empty?
       end
     end
 
     if @adfix_found
       # If we've found a diagnostic adfix then set the contact type
-
+      self.contact_type = ct
 
       # The remainder of the name will be in parts[0] or parts[2] depending
       # on whether this is a prefix or a suffix.
@@ -317,44 +320,44 @@ class NameTamer
   # Improved in several areas, also now adds non-breaking spaces for
   # compound names like "van der Pump"
   def name_case lowercase
-
-
-
+    n = lowercase # We assume the name is passed already downcased
+    n.gsub!(/\b\w/) { |first| first.upcase }
+    n.gsub!(/\'\w\b/) { |c| c.downcase } # Lowercase 's
 
     # Our list of terminal characters that indicate a non-celtic name used
     # to include o but we removed it because of MacMurdo.
-    if
-
+    if n =~ /\bMac[A-Za-z]{2,}[^acizj]\b/ or n =~ /\bMc/
+      n.gsub!(/\b(Ma?c)([A-Za-z]+)/) { |match| $1 + $2.capitalize }
 
       # Fix Mac exceptions
       [
         'MacEdo', 'MacEvicius', 'MacHado', 'MacHar', 'MacHin', 'MacHlin', 'MacIas', 'MacIulis', 'MacKie', 'MacKle',
         'MacKlin', 'MacKmin', 'MacKmurdo', 'MacQuarie', 'MacLise', 'MacKenzie'
-      ].each { |mac_name|
+      ].each { |mac_name| n.gsub!(/\b#{mac_name}/, mac_name.capitalize) }
     end
 
     # Fix ff wierdybonks
     [
       'Fforbes', 'Fforde', 'Ffinch', 'Ffrench', 'Ffoulkes'
-    ].each { |ff_name|
+    ].each { |ff_name| n.gsub!(ff_name,ff_name.downcase) }
 
     # Fixes for name modifiers followed by space
     # Also replaces spaces with non-breaking spaces
     NAME_MODIFIERS.each do |modifier|
-
+      n.gsub!(/((?:[[:space:]]|^)#{modifier})(\s+|-)/) { |match| "#{$1.rstrip.downcase}#{$2.tr(ASCII_SPACE, NONBREAKING_SPACE)}" }
     end
 
     # Fixes for name modifiers followed by an apostrophe, e.g. d'Artagnan, Commedia dell'Arte
     ['Dell', 'D'].each do |modifier|
-
+      n.gsub!(/(.#{modifier}')(\w)/) { |match| "#{$1.rstrip.downcase}#{$2}" }
     end
 
     # Upcase words with no vowels, e.g JPR Williams
-
+    n.gsub!(/\b([bcdfghjklmnpqrstvwxz]+)\b/i) { |match| $1.upcase }
     # Except Ng
-
+    n.gsub!(/\b(NG)\b/i) { |match| $1.capitalize } # http://en.wikipedia.org/wiki/Ng
 
-
+    n
   end
 
   def parameterize string, args = {}
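Note on the hunk above: `name_case` works as a chain of in-place substitutions on an already-downcased name. The sketch below is a trimmed version that keeps only the capitalisation, Mac/Mc, no-vowel and Ng steps, using the regular expressions from the added lines. It omits the Mac exception list, the ff names, `NAME_MODIFIERS` and the apostrophe handling, and it works on a copy of the argument where the hunk appears to modify it in place. `name_case_sketch` is a hypothetical name.

```ruby
# Hypothetical, trimmed-down version of the casing steps in the hunk.
def name_case_sketch(lowercase)
  n = lowercase.dup # work on a copy; the name is assumed to be downcased already

  n.gsub!(/\b\w/) { |first| first.upcase } # capitalise the start of every word
  n.gsub!(/'\w\b/) { |c| c.downcase }      # lowercase a trailing 's

  # Capitalise the part after Mac/Mc for names that look Celtic
  if n =~ /\bMac[A-Za-z]{2,}[^acizj]\b/ || n =~ /\bMc/
    n.gsub!(/\b(Ma?c)([A-Za-z]+)/) { $1 + $2.capitalize }
  end

  n.gsub!(/\b([bcdfghjklmnpqrstvwxz]+)\b/i) { $1.upcase } # words with no vowels
  n.gsub!(/\b(NG)\b/i) { $1.capitalize }                  # except Ng

  n
end

puts name_case_sketch('john mcdonald')
puts name_case_sketch('jpr williams')
```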
@@ -432,9 +435,14 @@ class NameTamer
   FILTER_RFC3987 = /[^#{ISEGMENT_NZ_NC}]/
   FILTER_COMPAT = /[^#{ALPHA}#{DIGIT}\-_#{UCSCHAR}]/
 
-  NAME_MODIFIERS
-    'Al', 'Ap', 'Ben', 'Dell[ae]', 'D[aeiou]', 'De[lr]', 'D[ao]s', 'El', 'La', 'L[eo]',
-
+  NAME_MODIFIERS = [
+    'Al', 'Ap', 'Ben', 'Dell[ae]', 'D[aeiou]', 'De[lr]', 'D[ao]s', 'El', 'La', 'L[eo]', 'V[ao]n', 'Of', 'St[\.]?'
+  ]
+
+  COMPOUND_NAMES = [
+    'Lane Fox', 'Bonham Carter', 'Pitt Rivers', 'Lloyd Webber', 'Sebag Montefiore', 'Holmes à Court', 'Holmes a Court',
+    'Baron Cohen', 'Strang Steel',
+    'Service Company', 'Corporation Company', 'Corporation System', 'Incorporations Limited'
   ]
 
   # These are the prefixes and suffixes we want to remove
@@ -497,10 +505,10 @@ class NameTamer
     patterns = {}
     adfix = ADFIXES[adfix_type]
 
-    [:person, :organization].each do |
-      with_optional_spaces = adfix[
+    [:person, :organization].each do |ct|
+      with_optional_spaces = adfix[ct].map { |p| p.gsub(ASCII_SPACE,' *') }
       pattern_string = with_optional_spaces.join('|').gsub('.', '\.*')
-      patterns[
+      patterns[ct] = /#{adfix[:before]}\(*(?:#{pattern_string})\)*#{adfix[:after]}/i
     end
 
     ADFIX_PATTERNS[adfix_type] = patterns
data/lib/name-tamer/version.rb
CHANGED
data/spec/name_tamer_spec.rb
CHANGED
@@ -119,7 +119,12 @@ describe NameTamer do
       { n:'محمود ياسر', t: :organization, nn:'محمود ياسر', sn:'محمود ياسر', s:'محمود-ياسر' },
       { n:'קובי ביטר', t: :organization, nn:'קובי ביטר', sn:'קובי ביטר', s:'קובי-ביטר' },
       { n:'الملاك الحارس', t: :organization, nn:'الملاك الحارس', sn:'الملاك الحارس', s:'الملاك-الحارس' },
-      { n:'কবির হাসান', t: :organization, nn:'কবির হাসান', sn:'কবির হাসান', s:'কবির-হাসান' }
+      { n:'কবির হাসান', t: :organization, nn:'কবির হাসান', sn:'কবির হাসান', s:'কবির-হাসান' },
+      { nn: '', sn: '', s: '' },
+      { n:'Union Square Ventures', t: 'Organization', nn:'Union Square Ventures', sn:'Union Square Ventures', s:'union-square-ventures' },
+      { n:'John Smith', t: 'Person', nn:'John Smith', sn:'John Smith', s:'john-smith' },
+      { n:'John Smith', t: :nonsense, nn:'John Smith', sn:'John Smith', s:'john-smith' },
+      { n:'John Smith', t: Kernel, nn:'John Smith', sn:'John Smith', s:'john-smith' },
     ]
   end
 
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: name-tamer
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.4
 platform: ruby
 authors:
 - Xenapto
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-
+date: 2014-06-02 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler