name-tamer 0.0.3 → 0.0.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +29 -1
- data/lib/name-tamer.rb +66 -58
- data/lib/name-tamer/version.rb +1 -1
- data/spec/name_tamer_spec.rb +6 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 9bb03b1eb2ecf3657424b2eb6d15009143783799
+data.tar.gz: cb539043cf2bad1f2ce258355fb5b12995078642
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 5c004130e0b5cd5f6a14de3e061cf448c0c9a40081d0f76544ae9b7e0f7311661fc767f74b742b0f76ffa28c74d067584261ea04f4e50b270e62b6c0df369fc1
+data.tar.gz: 99b9a308fc495e1c8ee25452d62f264af02ba301243cd9cb80ff9cfeeb2af024dd3da38d1f4cf79e7e6c0cbf306d62208a84cad688a50df4c7ac6f688972f567
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -28,14 +28,42 @@ Examples:
 NameTamer['Mr. John Q. Smith III, MD'].simple_name # => John Smith
 ```
 
+Or you can create an instance if you need several versions of the name
+
 ```ruby
-name_tamer = NameTamer
+name_tamer = NameTamer.new 'Mr. John Q. Smith III, MD'
 name_tamer.slug # => john-smith
 name_tamer.nice_name # => John Q. Smith
+name_tamer.contact_type # => :person
+```
+
+NameTamer will make an intelligent guess at the type of the name but it's not infallible. NameTamer likes it if you tell it whether the name is a person or an organization:
+
+```ruby
+name_tamer = NameTamer.new 'Di Doo Doo d.o.o.', contact_type: :organization
+name_tamer.simple_name # => Di Doo Doo
 ```
 
 ## Contributing
 
+There must be lots of name suffixes and prefixes that I haven't catered for, so please get in touch if `name-tamer` doesn't recognise one that you've found.
+
+If there are any other common two-word family names that I've missed then please let me know. `name-tamer` tries to make sure Helena Bonham Carter gets slugified to `helena-bonham-carter` and not `helena-carter`, but I'm sure there are loads of two-word family names I don't know about.
+
+Please read all the following articles before contributing:
+
+* [Personal names around the world](http://www.w3.org/International/questions/qa-personal-names)
+* [Namae (名前)](https://github.com/berkmancenter/namae)
+* [Matts Name Parser](https://github.com/mericson/people)
+* [Types of business entity](http://en.wikipedia.org/wiki/Types_of_business_entity)
+* [List of professional designations in the United States](http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(USA))
+* [List of post-nominal letters (United Kingdom)](http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(United_Kingdom))
+* [Nobiliary particle](http://en.wikipedia.org/wiki/Nobiliary_particle)
+* [Spanish naming customs](http://en.wikipedia.org/wiki/Spanish_naming_customs)
+* [Unified style sheet for linguistics](http://linguistlist.org/pubs/tocs/JournalUnifiedStyleSheet2007.pdf) [PDF]
+
+### How to contribute
+
 1. Fork it
 1. Create your feature branch (`git checkout -b my-new-feature`)
 1. Commit your changes (`git commit -am 'Add some feature'`)
|
data/lib/name-tamer.rb
CHANGED
@@ -3,7 +3,7 @@
 # References:
 # http://www.w3.org/International/questions/qa-personal-names
 # https://github.com/berkmancenter/namae
-# https://github.com/mericson
+# https://github.com/mericson/people
 # http://en.wikipedia.org/wiki/Types_of_business_entity
 # http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(USA)
 # http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(United_Kingdom)
@@ -12,7 +12,7 @@
 # http://linguistlist.org/pubs/tocs/JournalUnifiedStyleSheet2007.pdf [PDF]
 
 class NameTamer
-attr_reader :name
+attr_reader :name
 
 class << self
 def [](name, args = {})
@@ -21,8 +21,8 @@ class NameTamer
 end
 
 def nice_name
-
-@nice_name =
+unless @nice_name
+@nice_name = name.dup # Start with the name we've received
 
 tidy_spacing # " John Smith " -> "John Smith"
 consolidate_initials # "I. B. M." -> "I.B.M."
@@ -38,7 +38,7 @@ class NameTamer
 end
 
 def simple_name
-
+unless @simple_name
 @simple_name = nice_name.dup # Start with nice name
 
 remove_initials # "John Q. Doe" -> "John Doe"
@@ -53,7 +53,7 @@ class NameTamer
 end
 
 def slug
-
+unless @slug
 @slug = simple_name.dup # Start with search name
 slugify # "John Doe" -> "john-doe"
 end
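The three hunks above wrap `nice_name`, `simple_name` and `slug` in `unless @...` guards. Each derived form is therefore computed on first use and reused after that, which appears to be what the README's "create an instance if you need several versions of the name" usage relies on. The closing lines of those guards fall outside the hunks. What follows is only a minimal sketch of the lazy-memoization pattern. The `MemoizedName` class is hypothetical, and its string transformations are simplified stand-ins rather than name-tamer's rules:

```ruby
# Hypothetical class sketching the lazy-memoization guards added in 0.0.4.
# The transformations are simplified stand-ins, not name-tamer's real rules.
class MemoizedName
  attr_reader :name, :computations

  def initialize(name)
    @name = name || ''
    @nice_name = nil
    @slug = nil
    @computations = 0 # how many times a derived form was actually computed
  end

  def nice_name
    unless @nice_name
      @computations += 1
      @nice_name = name.strip.squeeze(' ') # computed once, then cached
    end
    @nice_name
  end

  def slug
    unless @slug
      @computations += 1
      @slug = nice_name.downcase.tr(' ', '-') # reuses the cached nice_name
    end
    @slug
  end
end

tamed = MemoizedName.new('  John   Smith ')
tamed.slug      # computes nice_name, then slug
tamed.slug      # returned from @slug
tamed.nice_name # returned from @nice_name
```

The guard tests truthiness, so a cached empty string still counts as computed because empty strings are truthy in Ruby. Only a `nil` or `false` value would trigger recomputation.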
@@ -66,25 +66,31 @@ class NameTamer
 contact_type_best_effort
 end
 
+def contact_type= new_contact_type
+ct_as_sym = new_contact_type.to_sym
+
+unless @contact_type.nil? || @contact_type == ct_as_sym
+puts "Changing contact type of #{@name} from #{@contact_type} to #{new_contact_type}"
+end
+
+@contact_type = ct_as_sym
+end
+
 =begin These lines aren't used and aren't covered by specs
 def name=(new_name)
 initialize new_name, :contact_type => @contact_type
 end
 
-def contact_type=(new_contact_type)
-initialize @name, :contact_type => new_contact_type
-end
-
 def to_hash
 {
-name:
-nice_name:
-simple_name:
-slug:
-contact_type:
-last_name:
-remainder:
-adfix_found:
+name: name,
+nice_name: nice_name,
+simple_name: simple_name,
+slug: slug,
+contact_type: contact_type,
+last_name: last_name,
+remainder: remainder,
+adfix_found: adfix_found
 }
 end
 =end
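Version 0.0.4 replaces the private `set_contact_type` (removed further down) with a public `contact_type=` setter. The new `puts` also drops the `.red` call, a `String#red` method that core Ruby does not define. Below is a sketch of the same setter logic in a hypothetical `ContactTypeHolder` class. As an assumption made for testability, it collects the change message in an array instead of printing it:

```ruby
# Hypothetical class sketching the contact_type= setter added in 0.0.4.
# Assumption for testability: change messages are collected, not printed.
class ContactTypeHolder
  attr_reader :name, :contact_type, :changes

  def initialize(name)
    @name = name
    @contact_type = nil
    @changes = []
  end

  def contact_type=(new_contact_type)
    ct_as_sym = new_contact_type.to_sym

    # Only report a change when a different type was already set
    unless @contact_type.nil? || @contact_type == ct_as_sym
      @changes << "Changing contact type of #{@name} from #{@contact_type} to #{new_contact_type}"
    end

    @contact_type = ct_as_sym
  end
end

holder = ContactTypeHolder.new('Di Doo Doo d.o.o.')
holder.contact_type = 'person'      # first assignment: nothing recorded
holder.contact_type = :organization # a real change: one message recorded
holder.contact_type = :organization # same type again: nothing recorded
```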
@@ -98,7 +104,6 @@ class NameTamer
 def tidy_spacing
 @nice_name.gsub!(/,\s*/, ', ') # Ensure commas have exactly one space after them
 @nice_name.strip! # remove leading & trailing whitespace
-
 @nice_name = ensure_whitespace_is_ascii_space @nice_name
 end
 
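`tidy_spacing` normalizes commas, trims the name and then converts its whitespace to ASCII spaces. The body of `ensure_whitespace_is_ascii_space` is not part of this diff. The sketch below is a pure-function version that assumes the helper maps every Unicode whitespace character to `' '`:

```ruby
# Sketch of tidy_spacing as a pure function. ensure_whitespace_is_ascii_space is
# not shown in this diff; mapping every Unicode whitespace character to ' ' is an
# assumption about what it does.
def ensure_whitespace_is_ascii_space(string)
  string.gsub(/[[:space:]]/, ' ') # [[:space:]] also matches e.g. U+00A0
end

def tidy_spacing(nice_name)
  tidied = nice_name.gsub(/,\s*/, ', ') # exactly one space after each comma
  tidied = tidied.strip                 # remove leading & trailing whitespace
  ensure_whitespace_is_ascii_space(tidied)
end

tidy_spacing("  Smith,John\u00A0Q  ")
```

`strip` runs before the conversion, and Ruby's `String#strip` only removes ASCII whitespace and NUL. As a result, in this sketch a trailing non-breaking space would survive as a trailing ASCII space.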
@@ -176,11 +181,7 @@ class NameTamer
 # Conjoin compound names with non-breaking spaces
 def use_nonbreaking_spaces_in_compound_names
 # Fix known last names that have spaces (not hyphens!)
-
-'Lane Fox', 'Bonham Carter', 'Pitt Rivers', 'Lloyd Webber', 'Sebag Montefiore',
-'Holmes à Court', 'Holmes a Court', 'Baron Cohen',
-'Service Company', 'Corporation Company', 'Corporation System', 'Incorporations Limited'
-].each do |compound_name|
+COMPOUND_NAMES.each do |compound_name|
 @nice_name.gsub!(compound_name, compound_name.tr(ASCII_SPACE, NONBREAKING_SPACE))
 end
 
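Moving the list into a `COMPOUND_NAMES` constant does not change the technique. Each known two-word surname has its ASCII space replaced with a non-breaking space, presumably so that later steps that split on ASCII spaces keep it together (the README's Helena Bonham Carter example). The minimal sketch below uses only a few entries from the list. It assumes `ASCII_SPACE` is `' '` and `NONBREAKING_SPACE` is U+00A0, because their definitions are not part of this diff:

```ruby
# Sketch of use_nonbreaking_spaces_in_compound_names as a pure function.
# Assumed values: the diff uses these constant names but does not show them.
ASCII_SPACE = ' '
NONBREAKING_SPACE = "\u00A0"
# A few entries from the COMPOUND_NAMES list added in 0.0.4
COMPOUND_NAMES_SUBSET = ['Lane Fox', 'Bonham Carter', 'Lloyd Webber', 'Strang Steel'].freeze

def conjoin_compound_names(name)
  result = name.dup
  COMPOUND_NAMES_SUBSET.each do |compound_name|
    # A String pattern in gsub! matches literally, so no regex escaping is needed
    result.gsub!(compound_name, compound_name.tr(ASCII_SPACE, NONBREAKING_SPACE))
  end
  result
end

conjoined = conjoin_compound_names('Helena Bonham Carter')
conjoined.split(/ /) # splitting on ASCII spaces keeps "Bonham Carter" together
```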
@@ -197,10 +198,10 @@ class NameTamer
 # i.e. only remove initials if there's also a proper name there
 def remove_initials
 if @contact_type == :person
-
+temp_name = @simple_name.gsub(/\b([a-z](?:\.*\s+|\.))/i, '')
 
 # If the name still has at least one space we're OK
-@simple_name =
+@simple_name = temp_name if temp_name.include?(ASCII_SPACE)
 end
 end
 
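The rewritten `remove_initials` deletes every word-initial letter that is followed either by a dot or by optional dots plus whitespace. It then keeps the result only if it still contains a space, that is, if at least two words remain. Below is a standalone sketch of the same regex and guard. The helper name is hypothetical, `ASCII_SPACE` is assumed to be `' '`, and the `:person` check is left out:

```ruby
# Standalone sketch of the 0.0.4 remove_initials logic (hypothetical helper name).
# In name-tamer this only runs when the contact type is :person.
INITIAL_PATTERN = /\b([a-z](?:\.*\s+|\.))/i

def remove_initials(simple_name)
  # Drop each word-initial letter followed by a dot, or by optional dots plus whitespace
  temp_name = simple_name.gsub(INITIAL_PATTERN, '')
  # Keep the result only if it still has a space, i.e. at least two words remain
  temp_name.include?(' ') ? temp_name : simple_name
end

remove_initials('John Q. Smith')
remove_initials('J. Smith')
```

`[a-z]` with the `i` flag covers only ASCII letters, so initials in other scripts are left in place.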
@@ -237,9 +238,17 @@ class NameTamer
 # Initialization and utilities
 #--------------------------------------------------------
 
-def initialize(
-@name
-
+def initialize(new_name, args = {})
+@name = new_name || ''
+
+if args[:contact_type]
+ct = args[:contact_type]
+ct = ct.to_s unless [String, Symbol].include? ct.class
+ct.downcase! if ct.class == String
+ct = ct.to_sym
+ct = nil unless [:person, :organization].include? ct
+@contact_type = ct
+end
 
 @nice_name = nil
 @simple_name = nil
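The new `initialize` accepts a `contact_type:` option as a String or a Symbol and converts anything else with `to_s`. It then discards every value other than `:person` and `:organization`. The new spec rows exercise exactly these cases (`'Organization'`, `'Person'`, `:nonsense`, `Kernel`). Only strings are downcased, so a symbol such as `:Person` is rejected. Here is a standalone sketch of that normalization as a hypothetical helper `normalize_contact_type`. One deliberate difference: the diff calls `ct.downcase!`, which modifies the caller's string in place and raises `FrozenError` on a frozen string. The sketch uses non-mutating `downcase` instead:

```ruby
# Hypothetical helper following the contact_type normalization in the new initialize.
VALID_CONTACT_TYPES = [:person, :organization].freeze

def normalize_contact_type(value)
  return nil if value.nil? # initialize skips the block when no type is given

  ct = value
  ct = ct.to_s unless ct.is_a?(String) || ct.is_a?(Symbol) # e.g. Kernel -> "Kernel"
  ct = ct.downcase if ct.is_a?(String) # non-mutating, unlike the diff's downcase!
  ct = ct.to_sym
  VALID_CONTACT_TYPES.include?(ct) ? ct : nil # anything else becomes nil
end

normalize_contact_type('Organization')
normalize_contact_type(Kernel)
```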
@@ -251,12 +260,6 @@ class NameTamer
 @adfix_found = false
 end
 
-def set_contact_type contact_type
-contact_type_sym = contact_type.to_sym
-puts "Changing contact type of #{@name} from #{@contact_type} to #{contact_type}".red unless @contact_type.nil? || @contact_type == contact_type_sym
-@contact_type = contact_type_sym
-end
-
 # If we don't know the contact type, what's our best guess?
 def contact_type_best_effort
 if @contact_type
@@ -275,23 +278,23 @@ class NameTamer
 # We pass to this routine either prefixes or suffixes
 def remove_outermost_adfix adfix_type, name_part
 adfixes = ADFIX_PATTERNS[adfix_type]
-
-parts = name_part.partition adfixes[
+ct = contact_type_best_effort
+parts = name_part.partition adfixes[ct]
 @adfix_found = !parts[1].empty?
 
 # If the contact type is indeterminate and we didn't find a diagnostic adfix
 # for a person then try again for an organization
 if @contact_type.nil?
 unless @adfix_found
-
-parts = name_part.partition adfixes[
+ct = :organization
+parts = name_part.partition adfixes[ct]
 @adfix_found = !parts[1].empty?
 end
 end
 
 if @adfix_found
 # If we've found a diagnostic adfix then set the contact type
-
+self.contact_type = ct
 
 # The remainder of the name will be in parts[0] or parts[2] depending
 # on whether this is a prefix or a suffix.
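In the new code, `remove_outermost_adfix` first tries the patterns for the best-effort contact type. If the type is still unknown and nothing matched, it falls back to the organization patterns. Whichever type produced a match is then recorded through the new `contact_type=` setter. The code comments suggest the best-effort guess is a person when the type is unknown. The sketch below shows that control flow for suffixes. The `SUFFIX_PATTERNS` table is hypothetical, and defaulting to `:person` stands in for `contact_type_best_effort`, whose body is not shown here:

```ruby
# Sketch of the two-pass suffix lookup in remove_outermost_adfix.
# SUFFIX_PATTERNS is hypothetical, and defaulting to :person stands in for
# contact_type_best_effort, whose body is not shown in the diff.
SUFFIX_PATTERNS = {
  person: /,?\s+(?:MD|PhD|Jr\.?)\z/i,
  organization: /,?\s+(?:Ltd\.?|Inc\.?|d\.o\.o\.)\z/i
}.freeze

# Returns [remainder, contact_type]
def remove_outermost_suffix(name_part, contact_type = nil)
  ct = contact_type || :person
  parts = name_part.partition(SUFFIX_PATTERNS[ct]) # [before, match, after]
  found = !parts[1].empty?

  # Type still unknown and no person suffix found: try organization suffixes
  if contact_type.nil? && !found
    ct = :organization
    parts = name_part.partition(SUFFIX_PATTERNS[ct])
    found = !parts[1].empty?
  end

  # A diagnostic suffix settles the contact type; for a suffix the remainder is parts[0]
  found ? [parts[0], ct] : [name_part, contact_type]
end

remove_outermost_suffix('Di Doo Doo d.o.o.') # falls back to the organization patterns
```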
@@ -317,44 +320,44 @@ class NameTamer
 # Improved in several areas, also now adds non-breaking spaces for
 # compound names like "van der Pump"
 def name_case lowercase
-
-
-
+n = lowercase # We assume the name is passed already downcased
+n.gsub!(/\b\w/) { |first| first.upcase }
+n.gsub!(/\'\w\b/) { |c| c.downcase } # Lowercase 's
 
 # Our list of terminal characters that indicate a non-celtic name used
 # to include o but we removed it because of MacMurdo.
-if
-
+if n =~ /\bMac[A-Za-z]{2,}[^acizj]\b/ or n =~ /\bMc/
+n.gsub!(/\b(Ma?c)([A-Za-z]+)/) { |match| $1 + $2.capitalize }
 
 # Fix Mac exceptions
 [
 'MacEdo', 'MacEvicius', 'MacHado', 'MacHar', 'MacHin', 'MacHlin', 'MacIas', 'MacIulis', 'MacKie', 'MacKle',
 'MacKlin', 'MacKmin', 'MacKmurdo', 'MacQuarie', 'MacLise', 'MacKenzie'
-].each { |mac_name|
+].each { |mac_name| n.gsub!(/\b#{mac_name}/, mac_name.capitalize) }
 end
 
 # Fix ff wierdybonks
 [
 'Fforbes', 'Fforde', 'Ffinch', 'Ffrench', 'Ffoulkes'
-].each { |ff_name|
+].each { |ff_name| n.gsub!(ff_name,ff_name.downcase) }
 
 # Fixes for name modifiers followed by space
 # Also replaces spaces with non-breaking spaces
 NAME_MODIFIERS.each do |modifier|
-
+n.gsub!(/((?:[[:space:]]|^)#{modifier})(\s+|-)/) { |match| "#{$1.rstrip.downcase}#{$2.tr(ASCII_SPACE, NONBREAKING_SPACE)}" }
 end
 
 # Fixes for name modifiers followed by an apostrophe, e.g. d'Artagnan, Commedia dell'Arte
 ['Dell', 'D'].each do |modifier|
-
+n.gsub!(/(.#{modifier}')(\w)/) { |match| "#{$1.rstrip.downcase}#{$2}" }
 end
 
 # Upcase words with no vowels, e.g JPR Williams
-
+n.gsub!(/\b([bcdfghjklmnpqrstvwxz]+)\b/i) { |match| $1.upcase }
 # Except Ng
-
+n.gsub!(/\b(NG)\b/i) { |match| $1.capitalize } # http://en.wikipedia.org/wiki/Ng
 
-
+n
 end
 
 def parameterize string, args = {}
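The new `name_case` body applies a fixed sequence of regex substitutions to an already-downcased name. The condensed sketch below keeps the first-letter, `'s`, Mac/Mc, no-vowel and Ng steps in the same order. It uses only two entries from the Mac exception list and omits the ff-name and name-modifier steps. It also calls `dup` so the caller's string is not modified, whereas the diff's version changes its argument in place:

```ruby
# Condensed sketch of name_case from 0.0.4: same step order, but only two Mac
# exceptions, and the ff-name and name-modifier steps are left out.
MAC_EXCEPTIONS = ['MacHado', 'MacKenzie'].freeze

def name_case(lowercase)
  n = lowercase.dup                        # expects an already-downcased name
  n.gsub!(/\b\w/) { |first| first.upcase } # capitalize each word
  n.gsub!(/'\w\b/) { |c| c.downcase }      # lowercase a trailing 's
  if n =~ /\bMac[A-Za-z]{2,}[^acizj]\b/ || n =~ /\bMc/
    n.gsub!(/\b(Ma?c)([A-Za-z]+)/) { $1 + $2.capitalize } # Macdonald -> MacDonald
    MAC_EXCEPTIONS.each { |mac_name| n.gsub!(/\b#{mac_name}/, mac_name.capitalize) }
  end
  n.gsub!(/\b([bcdfghjklmnpqrstvwxz]+)\b/i) { $1.upcase } # words with no vowels: JPR
  n.gsub!(/\b(NG)\b/i) { $1.capitalize }                  # except Ng
  n
end

name_case('john macdonald')
name_case('jpr williams')
```

Tracing this order also reveals a quirk that the sketch reproduces. The no-vowel step runs after the `'s` fix, so a trailing `'s` is upcased again: `name_case("o'neil's")` returns `"O'Neil'S"`.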
@@ -432,9 +435,14 @@ class NameTamer
 FILTER_RFC3987 = /[^#{ISEGMENT_NZ_NC}]/
 FILTER_COMPAT = /[^#{ALPHA}#{DIGIT}\-_#{UCSCHAR}]/
 
-NAME_MODIFIERS
-'Al', 'Ap', 'Ben', 'Dell[ae]', 'D[aeiou]', 'De[lr]', 'D[ao]s', 'El', 'La', 'L[eo]',
-
+NAME_MODIFIERS = [
+'Al', 'Ap', 'Ben', 'Dell[ae]', 'D[aeiou]', 'De[lr]', 'D[ao]s', 'El', 'La', 'L[eo]', 'V[ao]n', 'Of', 'St[\.]?'
+]
+
+COMPOUND_NAMES = [
+'Lane Fox', 'Bonham Carter', 'Pitt Rivers', 'Lloyd Webber', 'Sebag Montefiore', 'Holmes à Court', 'Holmes a Court',
+'Baron Cohen', 'Strang Steel',
+'Service Company', 'Corporation Company', 'Corporation System', 'Incorporations Limited'
 ]
 
 # These are the prefixes and suffixes we want to remove
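Each `NAME_MODIFIERS` entry is a regex fragment, which is why entries look like `'D[aeiou]'` and `'St[\.]?'`. `name_case` interpolates each fragment into a pattern. A modifier that sits at the start of the name or after whitespace, and that is followed by whitespace or a hyphen, is downcased, and the whitespace after it becomes non-breaking. The sketch below uses a subset of the entries, including the new `'V[ao]n'`, `'Of'` and `'St[\.]?'`. The helper is hypothetical and hard-codes U+00A0 as the non-breaking space. Because the patterns are case-sensitive, it expects title-cased input, as `name_case` produces by this point:

```ruby
# Sketch of the NAME_MODIFIERS step in name_case (hypothetical helper name).
# Only a subset of the entries is used, including three added in 0.0.4.
MODIFIERS_SUBSET = ['De[lr]', 'L[eo]', 'V[ao]n', 'Of', 'St[\.]?'].freeze
NBSP = "\u00A0" # assumed value of NONBREAKING_SPACE

def tame_modifiers(name)
  n = name.dup # expects title-cased input, since the patterns are case-sensitive
  MODIFIERS_SUBSET.each do |modifier|
    # A modifier at the start or after whitespace, followed by whitespace or a hyphen
    n.gsub!(/((?:[[:space:]]|^)#{modifier})(\s+|-)/) do
      "#{$1.rstrip.downcase}#{$2.tr(' ', NBSP)}" # downcase it, make the gap non-breaking
    end
  end
  n
end

tame_modifiers('Ludwig Van Beethoven')
```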
@@ -497,10 +505,10 @@ class NameTamer
 patterns = {}
 adfix = ADFIXES[adfix_type]
 
-[:person, :organization].each do |
-with_optional_spaces = adfix[
+[:person, :organization].each do |ct|
+with_optional_spaces = adfix[ct].map { |p| p.gsub(ASCII_SPACE,' *') }
 pattern_string = with_optional_spaces.join('|').gsub('.', '\.*')
-patterns[
+patterns[ct] = /#{adfix[:before]}\(*(?:#{pattern_string})\)*#{adfix[:after]}/i
 end
 
 ADFIX_PATTERNS[adfix_type] = patterns
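For each contact type, the pattern builder turns every space in an adfix into ` *` and every dot into `\.*`. It joins the results into one alternation, allows optional parentheses around it, and places it between the `adfix[:before]` and `adfix[:after]` fragments, matching case-insensitively. The sketch below follows those steps. The adfix lists and the `:before`/`:after` fragments are hypothetical, because the real `ADFIXES` table is not shown in this diff:

```ruby
# Sketch of the 0.0.4 pattern builder. The adfix lists and the :before/:after
# fragments below are hypothetical; the real ADFIXES table is not in this diff.
DEMO_SUFFIXES = {
  person: ['Ph.D.', 'M.D.'],
  organization: ['Co. Ltd.', 'Inc.'],
  before: '\s*,?\s*', # optional comma and spaces before the suffix
  after: '\z'         # the suffix must end the string
}.freeze

def build_adfix_patterns(adfix)
  patterns = {}
  [:person, :organization].each do |ct|
    # Each ASCII space may match any number of spaces ("Co. Ltd." ~ "Co.Ltd.")
    with_optional_spaces = adfix[ct].map { |p| p.gsub(' ', ' *') }
    # Each dot may match any number of dots ("M.D." ~ "MD")
    pattern_string = with_optional_spaces.join('|').gsub('.', '\.*')
    # Optional parentheses around the adfix, case-insensitive
    patterns[ct] = /#{adfix[:before]}\(*(?:#{pattern_string})\)*#{adfix[:after]}/i
  end
  patterns
end

patterns = build_adfix_patterns(DEMO_SUFFIXES)
'John Smith (Ph.D.)'.partition(patterns[:person])
```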
data/lib/name-tamer/version.rb
CHANGED
data/spec/name_tamer_spec.rb
CHANGED
@@ -119,7 +119,12 @@ describe NameTamer do
 { n:'محمود ياسر', t: :organization, nn:'محمود ياسر', sn:'محمود ياسر', s:'محمود-ياسر' },
 { n:'קובי ביטר', t: :organization, nn:'קובי ביטר', sn:'קובי ביטר', s:'קובי-ביטר' },
 { n:'الملاك الحارس', t: :organization, nn:'الملاك الحارس', sn:'الملاك الحارس', s:'الملاك-الحارس' },
-{ n:'কবির হাসান', t: :organization, nn:'কবির হাসান', sn:'কবির হাসান', s:'কবির-হাসান' }
+{ n:'কবির হাসান', t: :organization, nn:'কবির হাসান', sn:'কবির হাসান', s:'কবির-হাসান' },
+{ nn: '', sn: '', s: '' },
+{ n:'Union Square Ventures', t: 'Organization', nn:'Union Square Ventures', sn:'Union Square Ventures', s:'union-square-ventures' },
+{ n:'John Smith', t: 'Person', nn:'John Smith', sn:'John Smith', s:'john-smith' },
+{ n:'John Smith', t: :nonsense, nn:'John Smith', sn:'John Smith', s:'john-smith' },
+{ n:'John Smith', t: Kernel, nn:'John Smith', sn:'John Smith', s:'john-smith' },
 ]
 end
 
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: name-tamer
 version: !ruby/object:Gem::Version
-version: 0.0.
+version: 0.0.4
 platform: ruby
 authors:
 - Xenapto
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-
+date: 2014-06-02 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: bundler