name-tamer 0.0.3 → 0.0.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: ec6a806fda32f1cde3963a72bb194491d7f34824
4
- data.tar.gz: 9adbe554327717b744cd8bb550f4c4c259f7f0a9
3
+ metadata.gz: 9bb03b1eb2ecf3657424b2eb6d15009143783799
4
+ data.tar.gz: cb539043cf2bad1f2ce258355fb5b12995078642
5
5
  SHA512:
6
- metadata.gz: ad2cf9d1f5b8f45234bb36d9e28f31a157ba2d3932ee8d3b503c8ca8f2671f30882bc9af107cfc75fa9c147646e6431f0b9712b398fb931ffa38c178abd3870a
7
- data.tar.gz: 2db7830058a83550a0ce4adb252621d4ff2dd3c1f9c65572cbb35ee490099f6bfc9cd58ac682e778c5eedebe126d76038e2bd36483fa42c71031a43dbf62cafc
6
+ metadata.gz: 5c004130e0b5cd5f6a14de3e061cf448c0c9a40081d0f76544ae9b7e0f7311661fc767f74b742b0f76ffa28c74d067584261ea04f4e50b270e62b6c0df369fc1
7
+ data.tar.gz: 99b9a308fc495e1c8ee25452d62f264af02ba301243cd9cb80ff9cfeeb2af024dd3da38d1f4cf79e7e6c0cbf306d62208a84cad688a50df4c7ac6f688972f567
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- name-tamer (0.0.2)
4
+ name-tamer (0.0.3)
5
5
 
6
6
  GEM
7
7
  remote: https://rubygems.org/
data/README.md CHANGED
@@ -28,14 +28,42 @@ Examples:
28
28
  NameTamer['Mr. John Q. Smith III, MD'].simple_name # => John Smith
29
29
  ```
30
30
 
31
+ Or you can create an instance if you need several versions of the name
32
+
31
33
  ```ruby
32
- name_tamer = NameTamer['Mr. John Q. Smith III, MD']
34
+ name_tamer = NameTamer.new 'Mr. John Q. Smith III, MD'
33
35
  name_tamer.slug # => john-smith
34
36
  name_tamer.nice_name # => John Q. Smith
37
+ name_tamer.contact_type # => :person
38
+ ```
39
+
40
+ NameTamer will make an intelligent guess at the type of the name but it's not infallible. NameTamer likes it if you tell it whether the name is a person or an organization:
41
+
42
+ ```ruby
43
+ name_tamer = NameTamer.new 'Di Doo Doo d.o.o.', contact_type: :organization
44
+ name_tamer.simple_name # => Di Doo Doo
35
45
  ```
36
46
 
37
47
  ## Contributing
38
48
 
49
+ There must be lots of name suffixes and prefixes that I haven't catered for, so please get in touch if `name-tamer` doesn't recognise one that you've found.
50
+
51
+ If there are any other common two-word family names that I've missed then please let me know. `name-tamer` tries to make sure Helena Bonham Carter gets slugified to `helena-bonham-carter` and not `helena-carter`, but I'm sure there are loads of two-word family names I don't know about.
52
+
53
+ Please read all the following articles before contributing:
54
+
55
+ * [Personal names around the world](http://www.w3.org/International/questions/qa-personal-names)
56
+ * [Namae (名前)](https://github.com/berkmancenter/namae)
57
+ * [Matts Name Parser](https://github.com/mericson/people)
58
+ * [Types of business entity](http://en.wikipedia.org/wiki/Types_of_business_entity)
59
+ * [List of professional designations in the United States](http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(USA))
60
+ * [List of post-nominal letters (United Kingdom)](http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(United_Kingdom))
61
+ * [Nobiliary particle](http://en.wikipedia.org/wiki/Nobiliary_particle)
62
+ * [Spanish naming customs](http://en.wikipedia.org/wiki/Spanish_naming_customs)
63
+ * [Unified style sheet for linguistics](http://linguistlist.org/pubs/tocs/JournalUnifiedStyleSheet2007.pdf) [PDF]
64
+
65
+ ### How to contribute
66
+
39
67
  1. Fork it
40
68
  1. Create your feature branch (`git checkout -b my-new-feature`)
41
69
  1. Commit your changes (`git commit -am 'Add some feature'`)
@@ -3,7 +3,7 @@
3
3
  # References:
4
4
  # http://www.w3.org/International/questions/qa-personal-names
5
5
  # https://github.com/berkmancenter/namae
6
- # https://github.com/mericson
6
+ # https://github.com/mericson/people
7
7
  # http://en.wikipedia.org/wiki/Types_of_business_entity
8
8
  # http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(USA)
9
9
  # http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(United_Kingdom)
@@ -12,7 +12,7 @@
12
12
  # http://linguistlist.org/pubs/tocs/JournalUnifiedStyleSheet2007.pdf [PDF]
13
13
 
14
14
  class NameTamer
15
- attr_reader :name, :contact_type
15
+ attr_reader :name
16
16
 
17
17
  class << self
18
18
  def [](name, args = {})
@@ -21,8 +21,8 @@ class NameTamer
21
21
  end
22
22
 
23
23
  def nice_name
24
- if @nice_name.nil?
25
- @nice_name = @name.dup # Start with the name we've received
24
+ unless @nice_name
25
+ @nice_name = name.dup # Start with the name we've received
26
26
 
27
27
  tidy_spacing # " John Smith " -> "John Smith"
28
28
  consolidate_initials # "I. B. M." -> "I.B.M."
@@ -38,7 +38,7 @@ class NameTamer
38
38
  end
39
39
 
40
40
  def simple_name
41
- if @simple_name.nil?
41
+ unless @simple_name
42
42
  @simple_name = nice_name.dup # Start with nice name
43
43
 
44
44
  remove_initials # "John Q. Doe" -> "John Doe"
@@ -53,7 +53,7 @@ class NameTamer
53
53
  end
54
54
 
55
55
  def slug
56
- if @slug.nil?
56
+ unless @slug
57
57
  @slug = simple_name.dup # Start with search name
58
58
  slugify # "John Doe" -> "john-doe"
59
59
  end
@@ -66,25 +66,31 @@ class NameTamer
66
66
  contact_type_best_effort
67
67
  end
68
68
 
69
+ def contact_type= new_contact_type
70
+ ct_as_sym = new_contact_type.to_sym
71
+
72
+ unless @contact_type.nil? || @contact_type == ct_as_sym
73
+ puts "Changing contact type of #{@name} from #{@contact_type} to #{new_contact_type}"
74
+ end
75
+
76
+ @contact_type = ct_as_sym
77
+ end
78
+
69
79
  =begin These lines aren't used and aren't covered by specs
70
80
  def name=(new_name)
71
81
  initialize new_name, :contact_type => @contact_type
72
82
  end
73
83
 
74
- def contact_type=(new_contact_type)
75
- initialize @name, :contact_type => new_contact_type
76
- end
77
-
78
84
  def to_hash
79
85
  {
80
- name: @name,
81
- nice_name: @nice_name,
82
- simple_name: @simple_name,
83
- slug: @slug,
84
- contact_type: @contact_type,
85
- last_name: @last_name,
86
- remainder: @remainder,
87
- adfix_found: @adfix_found
86
+ name: name,
87
+ nice_name: nice_name,
88
+ simple_name: simple_name,
89
+ slug: slug,
90
+ contact_type: contact_type,
91
+ last_name: last_name,
92
+ remainder: remainder,
93
+ adfix_found: adfix_found
88
94
  }
89
95
  end
90
96
  =end
@@ -98,7 +104,6 @@ class NameTamer
98
104
  def tidy_spacing
99
105
  @nice_name.gsub!(/,\s*/, ', ') # Ensure commas have exactly one space after them
100
106
  @nice_name.strip! # remove leading & trailing whitespace
101
-
102
107
  @nice_name = ensure_whitespace_is_ascii_space @nice_name
103
108
  end
104
109
 
@@ -176,11 +181,7 @@ class NameTamer
176
181
  # Conjoin compound names with non-breaking spaces
177
182
  def use_nonbreaking_spaces_in_compound_names
178
183
  # Fix known last names that have spaces (not hyphens!)
179
- [
180
- 'Lane Fox', 'Bonham Carter', 'Pitt Rivers', 'Lloyd Webber', 'Sebag Montefiore',
181
- 'Holmes à Court', 'Holmes a Court', 'Baron Cohen',
182
- 'Service Company', 'Corporation Company', 'Corporation System', 'Incorporations Limited'
183
- ].each do |compound_name|
184
+ COMPOUND_NAMES.each do |compound_name|
184
185
  @nice_name.gsub!(compound_name, compound_name.tr(ASCII_SPACE, NONBREAKING_SPACE))
185
186
  end
186
187
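The hunk above moves the inline list of compound names into a `COMPOUND_NAMES` constant. The following is a minimal standalone sketch of the same substitution, not the gem's actual method: `protect_compound_names` is a hypothetical helper, the list is a subset of the names in the diff, and the values of `ASCII_SPACE` and `NONBREAKING_SPACE` are assumptions because their definitions are not part of this diff.

```ruby
# Assumed values; the real constants are defined outside the hunks shown here.
ASCII_SPACE = ' '
NONBREAKING_SPACE = "\u00A0"

# A subset of the names listed in the diff, for illustration only.
COMPOUND_NAMES = ['Lane Fox', 'Bonham Carter', 'Lloyd Webber'].freeze

# Hypothetical helper: returns a copy of the name in which the space inside
# each known compound family name is replaced by a non-breaking space, so a
# later split on ASCII spaces keeps the family name in one piece.
def protect_compound_names(name)
  result = name.dup
  COMPOUND_NAMES.each do |compound_name|
    result.gsub!(compound_name, compound_name.tr(ASCII_SPACE, NONBREAKING_SPACE))
  end
  result
end

protected_name = protect_compound_names('Helena Bonham Carter')
last_word = protected_name.split(/ /).last # the whole compound family name
```

Splitting with the regex `/ /` rather than the string `' '` is deliberate in this sketch: it matches only U+0020, so the non-breaking space is guaranteed to stay inside the last word.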
 
@@ -197,10 +198,10 @@ class NameTamer
197
198
  # i.e. only remove initials if there's also a proper name there
198
199
  def remove_initials
199
200
  if @contact_type == :person
200
- name = @simple_name.gsub(/\b([a-z](?:\.*\s+|\.))/i, '')
201
+ temp_name = @simple_name.gsub(/\b([a-z](?:\.*\s+|\.))/i, '')
201
202
 
202
203
  # If the name still has at least one space we're OK
203
- @simple_name = name if name.include?(ASCII_SPACE)
204
+ @simple_name = temp_name if temp_name.include?(ASCII_SPACE)
204
205
  end
205
206
  end
206
207
 
@@ -237,9 +238,17 @@ class NameTamer
237
238
  # Initialization and utilities
238
239
  #--------------------------------------------------------
239
240
 
240
- def initialize(name, args = {})
241
- @name = name || ''
242
- @contact_type = args[:contact_type].to_sym unless args[:contact_type].nil?
241
+ def initialize(new_name, args = {})
242
+ @name = new_name || ''
243
+
244
+ if args[:contact_type]
245
+ ct = args[:contact_type]
246
+ ct = ct.to_s unless [String, Symbol].include? ct.class
247
+ ct.downcase! if ct.class == String
248
+ ct = ct.to_sym
249
+ ct = nil unless [:person, :organization].include? ct
250
+ @contact_type = ct
251
+ end
243
252
 
244
253
  @nice_name = nil
245
254
  @simple_name = nil
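The new `initialize` accepts a `contact_type` given as a symbol, a string in any case, or an arbitrary object, and silently discards anything that is not `:person` or `:organization`. The sketch below restates that normalization as a standalone function so the behaviour can be checked in isolation. `normalize_contact_type` and `VALID_CONTACT_TYPES` are hypothetical names that do not appear in the gem, and the sketch uses the non-mutating `downcase` instead of `downcase!`.

```ruby
VALID_CONTACT_TYPES = [:person, :organization].freeze

# Hypothetical helper mirroring the normalization in the hunk above:
# symbols are kept as they are, everything else is converted to a
# downcased string and then to a symbol. Unknown values become nil.
def normalize_contact_type(raw)
  return nil unless raw

  ct = raw.is_a?(Symbol) ? raw : raw.to_s.downcase.to_sym
  VALID_CONTACT_TYPES.include?(ct) ? ct : nil
end

normalize_contact_type('Organization') # => :organization
normalize_contact_type(:nonsense)      # => nil
normalize_contact_type(Kernel)         # => nil
```

These three calls correspond to the `t: 'Organization'`, `t: :nonsense` and `t: Kernel` cases added to the spec in this release.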
@@ -251,12 +260,6 @@ class NameTamer
251
260
  @adfix_found = false
252
261
  end
253
262
 
254
- def set_contact_type contact_type
255
- contact_type_sym = contact_type.to_sym
256
- puts "Changing contact type of #{@name} from #{@contact_type} to #{contact_type}".red unless @contact_type.nil? || @contact_type == contact_type_sym
257
- @contact_type = contact_type_sym
258
- end
259
-
260
263
  # If we don't know the contact type, what's our best guess?
261
264
  def contact_type_best_effort
262
265
  if @contact_type
@@ -275,23 +278,23 @@ class NameTamer
275
278
  # We pass to this routine either prefixes or suffixes
276
279
  def remove_outermost_adfix adfix_type, name_part
277
280
  adfixes = ADFIX_PATTERNS[adfix_type]
278
- contact_type = contact_type_best_effort
279
- parts = name_part.partition adfixes[contact_type]
281
+ ct = contact_type_best_effort
282
+ parts = name_part.partition adfixes[ct]
280
283
  @adfix_found = !parts[1].empty?
281
284
 
282
285
  # If the contact type is indeterminate and we didn't find a diagnostic adfix
283
286
  # for a person then try again for an organization
284
287
  if @contact_type.nil?
285
288
  unless @adfix_found
286
- contact_type = :organization
287
- parts = name_part.partition adfixes[contact_type]
289
+ ct = :organization
290
+ parts = name_part.partition adfixes[ct]
288
291
  @adfix_found = !parts[1].empty?
289
292
  end
290
293
  end
291
294
 
292
295
  if @adfix_found
293
296
  # If we've found a diagnostic adfix then set the contact type
294
- set_contact_type contact_type
297
+ self.contact_type = ct
295
298
 
296
299
  # The remainder of the name will be in parts[0] or parts[2] depending
297
300
  # on whether this is a prefix or a suffix.
@@ -317,44 +320,44 @@ class NameTamer
317
320
  # Improved in several areas, also now adds non-breaking spaces for
318
321
  # compound names like "van der Pump"
319
322
  def name_case lowercase
320
- name = lowercase # We assume the name is passed already downcased
321
- name.gsub!(/\b\w/) { |first| first.upcase }
322
- name.gsub!(/\'\w\b/) { |c| c.downcase } # Lowercase 's
323
+ n = lowercase # We assume the name is passed already downcased
324
+ n.gsub!(/\b\w/) { |first| first.upcase }
325
+ n.gsub!(/\'\w\b/) { |c| c.downcase } # Lowercase 's
323
326
 
324
327
  # Our list of terminal characters that indicate a non-celtic name used
325
328
  # to include o but we removed it because of MacMurdo.
326
- if name =~ /\bMac[A-Za-z]{2,}[^acizj]\b/ or name =~ /\bMc/
327
- name.gsub!(/\b(Ma?c)([A-Za-z]+)/) { |match| $1 + $2.capitalize }
329
+ if n =~ /\bMac[A-Za-z]{2,}[^acizj]\b/ or n =~ /\bMc/
330
+ n.gsub!(/\b(Ma?c)([A-Za-z]+)/) { |match| $1 + $2.capitalize }
328
331
 
329
332
  # Fix Mac exceptions
330
333
  [
331
334
  'MacEdo', 'MacEvicius', 'MacHado', 'MacHar', 'MacHin', 'MacHlin', 'MacIas', 'MacIulis', 'MacKie', 'MacKle',
332
335
  'MacKlin', 'MacKmin', 'MacKmurdo', 'MacQuarie', 'MacLise', 'MacKenzie'
333
- ].each { |mac_name| name.gsub!(/\b#{mac_name}/, mac_name.capitalize) }
336
+ ].each { |mac_name| n.gsub!(/\b#{mac_name}/, mac_name.capitalize) }
334
337
  end
335
338
 
336
339
  # Fix ff wierdybonks
337
340
  [
338
341
  'Fforbes', 'Fforde', 'Ffinch', 'Ffrench', 'Ffoulkes'
339
- ].each { |ff_name| name.gsub!(ff_name,ff_name.downcase) }
342
+ ].each { |ff_name| n.gsub!(ff_name,ff_name.downcase) }
340
343
 
341
344
  # Fixes for name modifiers followed by space
342
345
  # Also replaces spaces with non-breaking spaces
343
346
  NAME_MODIFIERS.each do |modifier|
344
- name.gsub!(/((?:[[:space:]]|^)#{modifier})(\s+|-)/) { |match| "#{$1.rstrip.downcase}#{$2.tr(ASCII_SPACE, NONBREAKING_SPACE)}" }
347
+ n.gsub!(/((?:[[:space:]]|^)#{modifier})(\s+|-)/) { |match| "#{$1.rstrip.downcase}#{$2.tr(ASCII_SPACE, NONBREAKING_SPACE)}" }
345
348
  end
346
349
 
347
350
  # Fixes for name modifiers followed by an apostrophe, e.g. d'Artagnan, Commedia dell'Arte
348
351
  ['Dell', 'D'].each do |modifier|
349
- name.gsub!(/(.#{modifier}')(\w)/) { |match| "#{$1.rstrip.downcase}#{$2}" }
352
+ n.gsub!(/(.#{modifier}')(\w)/) { |match| "#{$1.rstrip.downcase}#{$2}" }
350
353
  end
351
354
 
352
355
  # Upcase words with no vowels, e.g JPR Williams
353
- name.gsub!(/\b([bcdfghjklmnpqrstvwxz]+)\b/i) { |match| $1.upcase }
356
+ n.gsub!(/\b([bcdfghjklmnpqrstvwxz]+)\b/i) { |match| $1.upcase }
354
357
  # Except Ng
355
- name.gsub!(/\b(NG)\b/i) { |match| $1.capitalize } # http://en.wikipedia.org/wiki/Ng
358
+ n.gsub!(/\b(NG)\b/i) { |match| $1.capitalize } # http://en.wikipedia.org/wiki/Ng
356
359
 
357
- name
360
+ n
358
361
  end
359
362
 
360
363
  def parameterize string, args = {}
@@ -432,9 +435,14 @@ class NameTamer
432
435
  FILTER_RFC3987 = /[^#{ISEGMENT_NZ_NC}]/
433
436
  FILTER_COMPAT = /[^#{ALPHA}#{DIGIT}\-_#{UCSCHAR}]/
434
437
 
435
- NAME_MODIFIERS = [
436
- 'Al', 'Ap', 'Ben', 'Dell[ae]', 'D[aeiou]', 'De[lr]', 'D[ao]s', 'El', 'La', 'L[eo]',
437
- 'V[ao]n', 'Of', 'St[\.]?'
438
+ NAME_MODIFIERS = [
439
+ 'Al', 'Ap', 'Ben', 'Dell[ae]', 'D[aeiou]', 'De[lr]', 'D[ao]s', 'El', 'La', 'L[eo]', 'V[ao]n', 'Of', 'St[\.]?'
440
+ ]
441
+
442
+ COMPOUND_NAMES = [
443
+ 'Lane Fox', 'Bonham Carter', 'Pitt Rivers', 'Lloyd Webber', 'Sebag Montefiore', 'Holmes à Court', 'Holmes a Court',
444
+ 'Baron Cohen', 'Strang Steel',
445
+ 'Service Company', 'Corporation Company', 'Corporation System', 'Incorporations Limited'
438
446
  ]
439
447
 
440
448
  # These are the prefixes and suffixes we want to remove
@@ -497,10 +505,10 @@ class NameTamer
497
505
  patterns = {}
498
506
  adfix = ADFIXES[adfix_type]
499
507
 
500
- [:person, :organization].each do |contact_type|
501
- with_optional_spaces = adfix[contact_type].map { |p| p.gsub(ASCII_SPACE,' *') }
508
+ [:person, :organization].each do |ct|
509
+ with_optional_spaces = adfix[ct].map { |p| p.gsub(ASCII_SPACE,' *') }
502
510
  pattern_string = with_optional_spaces.join('|').gsub('.', '\.*')
503
- patterns[contact_type] = /#{adfix[:before]}\(*(?:#{pattern_string})\)*#{adfix[:after]}/i
511
+ patterns[ct] = /#{adfix[:before]}\(*(?:#{pattern_string})\)*#{adfix[:after]}/i
504
512
  end
505
513
 
506
514
  ADFIX_PATTERNS[adfix_type] = patterns
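The loop above turns each list of adfixes into one case-insensitive regular expression in which spaces and full stops are optional. The sketch below shows that idea for suffixes only, under stated assumptions: `build_suffix_pattern` and `strip_suffix` are hypothetical helpers, the adfix list is illustrative rather than the gem's own, and the `\s+` prefix and `\z` anchor stand in for `adfix[:before]` and `adfix[:after]`, whose real values are not shown in this diff.

```ruby
ASCII_SPACE = ' '

# Hypothetical helper: builds one regex from a list of suffixes, making
# every space and every full stop in a suffix optional.
def build_suffix_pattern(adfixes)
  with_optional_spaces = adfixes.map { |p| p.gsub(ASCII_SPACE, ' *') }
  # The block form avoids any backslash handling in the replacement string.
  pattern_string = with_optional_spaces.join('|').gsub('.') { '\.*' }
  /\s+\(*(?:#{pattern_string})\)*\z/i
end

# Hypothetical helper: returns the name without a trailing suffix, or the
# name unchanged when no suffix matches (partition yields an empty middle).
def strip_suffix(name, pattern)
  parts = name.partition(pattern)
  parts[1].empty? ? name : parts[0]
end

pattern = build_suffix_pattern(['d.o.o.', 'Ltd.', 'L. L. C.'])
strip_suffix('Di Doo Doo d.o.o.', pattern) # => "Di Doo Doo"
strip_suffix('Acme LLC', pattern)          # => "Acme"
strip_suffix('John Smith', pattern)        # => "John Smith"
```

As in `remove_outermost_adfix`, an empty `parts[1]` is the signal that no adfix was found, which is what lets the caller fall back to trying the other contact type.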
@@ -1,3 +1,3 @@
1
1
  class NameTamer
2
- VERSION = "0.0.3"
2
+ VERSION = "0.0.4"
3
3
  end
@@ -119,7 +119,12 @@ describe NameTamer do
119
119
  { n:'محمود ياسر', t: :organization, nn:'محمود ياسر', sn:'محمود ياسر', s:'محمود-ياسر' },
120
120
  { n:'קובי ביטר', t: :organization, nn:'קובי ביטר', sn:'קובי ביטר', s:'קובי-ביטר' },
121
121
  { n:'الملاك الحارس', t: :organization, nn:'الملاك الحارس', sn:'الملاك الحارس', s:'الملاك-الحارس' },
122
- { n:'কবির হাসান', t: :organization, nn:'কবির হাসান', sn:'কবির হাসান', s:'কবির-হাসান' }
122
+ { n:'কবির হাসান', t: :organization, nn:'কবির হাসান', sn:'কবির হাসান', s:'কবির-হাসান' },
123
+ { nn: '', sn: '', s: '' },
124
+ { n:'Union Square Ventures', t: 'Organization', nn:'Union Square Ventures', sn:'Union Square Ventures', s:'union-square-ventures' },
125
+ { n:'John Smith', t: 'Person', nn:'John Smith', sn:'John Smith', s:'john-smith' },
126
+ { n:'John Smith', t: :nonsense, nn:'John Smith', sn:'John Smith', s:'john-smith' },
127
+ { n:'John Smith', t: Kernel, nn:'John Smith', sn:'John Smith', s:'john-smith' },
123
128
  ]
124
129
  end
125
130
 
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: name-tamer
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.3
4
+ version: 0.0.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - Xenapto
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-05-31 00:00:00.000000000 Z
11
+ date: 2014-06-02 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler