name-tamer 0.0.3 → 0.0.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: ec6a806fda32f1cde3963a72bb194491d7f34824
- data.tar.gz: 9adbe554327717b744cd8bb550f4c4c259f7f0a9
+ metadata.gz: 9bb03b1eb2ecf3657424b2eb6d15009143783799
+ data.tar.gz: cb539043cf2bad1f2ce258355fb5b12995078642
  SHA512:
- metadata.gz: ad2cf9d1f5b8f45234bb36d9e28f31a157ba2d3932ee8d3b503c8ca8f2671f30882bc9af107cfc75fa9c147646e6431f0b9712b398fb931ffa38c178abd3870a
- data.tar.gz: 2db7830058a83550a0ce4adb252621d4ff2dd3c1f9c65572cbb35ee490099f6bfc9cd58ac682e778c5eedebe126d76038e2bd36483fa42c71031a43dbf62cafc
+ metadata.gz: 5c004130e0b5cd5f6a14de3e061cf448c0c9a40081d0f76544ae9b7e0f7311661fc767f74b742b0f76ffa28c74d067584261ea04f4e50b270e62b6c0df369fc1
+ data.tar.gz: 99b9a308fc495e1c8ee25452d62f264af02ba301243cd9cb80ff9cfeeb2af024dd3da38d1f4cf79e7e6c0cbf306d62208a84cad688a50df4c7ac6f688972f567
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- name-tamer (0.0.2)
+ name-tamer (0.0.3)
 
  GEM
  remote: https://rubygems.org/
data/README.md CHANGED
@@ -28,14 +28,42 @@ Examples:
  NameTamer['Mr. John Q. Smith III, MD'].simple_name # => John Smith
  ```
 
+ Or you can create an instance if you need several versions of the name
+
  ```ruby
- name_tamer = NameTamer['Mr. John Q. Smith III, MD']
+ name_tamer = NameTamer.new 'Mr. John Q. Smith III, MD'
  name_tamer.slug # => john-smith
  name_tamer.nice_name # => John Q. Smith
+ name_tamer.contact_type # => :person
+ ```
+
+ NameTamer will make an intelligent guess at the type of the name but it's not infallible. NameTamer likes it if you tell it whether the name is a person or an organization:
+
+ ```ruby
+ name_tamer = NameTamer.new 'Di Doo Doo d.o.o.', contact_type: :organization
+ name_tamer.simple_name # => Di Doo Doo
  ```
 
  ## Contributing
 
+ There must be lots of name suffixes and prefixes that I haven't catered for, so please get in touch if `name-tamer` doesn't recognise one that you've found.
+
+ If there are any other common two-word family names that I've missed then please let me know. `name-tamer` tries to make sure Helena Bonham Carter gets slugified to `helena-bonham-carter` and not `helena-carter`, but I'm sure there are loads of two-word family names I don't know about.
+
+ Please read all the following articles before contributing:
+
+ * [Personal names around the world](http://www.w3.org/International/questions/qa-personal-names)
+ * [Namae (名前)](https://github.com/berkmancenter/namae)
+ * [Matts Name Parser](https://github.com/mericson/people)
+ * [Types of business entity](http://en.wikipedia.org/wiki/Types_of_business_entity)
+ * [List of professional designations in the United States](http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(USA))
+ * [List of post-nominal letters (United Kingdom)](http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(United_Kingdom))
+ * [Nobiliary particle](http://en.wikipedia.org/wiki/Nobiliary_particle)
+ * [Spanish naming customs](http://en.wikipedia.org/wiki/Spanish_naming_customs)
+ * [Unified style sheet for linguistics](http://linguistlist.org/pubs/tocs/JournalUnifiedStyleSheet2007.pdf) [PDF]
+
+ ### How to contribute
+
  1. Fork it
  1. Create your feature branch (`git checkout -b my-new-feature`)
  1. Commit your changes (`git commit -am 'Add some feature'`)
@@ -3,7 +3,7 @@
  # References:
  # http://www.w3.org/International/questions/qa-personal-names
  # https://github.com/berkmancenter/namae
- # https://github.com/mericson
+ # https://github.com/mericson/people
  # http://en.wikipedia.org/wiki/Types_of_business_entity
  # http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(USA)
  # http://en.wikipedia.org/wiki/List_of_post-nominal_letters_(United_Kingdom)
@@ -12,7 +12,7 @@
  # http://linguistlist.org/pubs/tocs/JournalUnifiedStyleSheet2007.pdf [PDF]
 
  class NameTamer
- attr_reader :name, :contact_type
+ attr_reader :name
 
  class << self
  def [](name, args = {})
@@ -21,8 +21,8 @@ class NameTamer
  end
 
  def nice_name
- if @nice_name.nil?
- @nice_name = @name.dup # Start with the name we've received
+ unless @nice_name
+ @nice_name = name.dup # Start with the name we've received
 
  tidy_spacing # " John Smith " -> "John Smith"
  consolidate_initials # "I. B. M." -> "I.B.M."
@@ -38,7 +38,7 @@ class NameTamer
  end
 
  def simple_name
- if @simple_name.nil?
+ unless @simple_name
  @simple_name = nice_name.dup # Start with nice name
 
  remove_initials # "John Q. Doe" -> "John Doe"
@@ -53,7 +53,7 @@ class NameTamer
  end
 
  def slug
- if @slug.nil?
+ unless @slug
  @slug = simple_name.dup # Start with search name
  slugify # "John Doe" -> "john-doe"
  end
@@ -66,25 +66,31 @@ class NameTamer
  contact_type_best_effort
  end
 
+ def contact_type= new_contact_type
+ ct_as_sym = new_contact_type.to_sym
+
+ unless @contact_type.nil? || @contact_type == ct_as_sym
+ puts "Changing contact type of #{@name} from #{@contact_type} to #{new_contact_type}"
+ end
+
+ @contact_type = ct_as_sym
+ end
+
  =begin These lines aren't used and aren't covered by specs
  def name=(new_name)
  initialize new_name, :contact_type => @contact_type
  end
 
- def contact_type=(new_contact_type)
- initialize @name, :contact_type => new_contact_type
- end
-
  def to_hash
  {
- name: @name,
- nice_name: @nice_name,
- simple_name: @simple_name,
- slug: @slug,
- contact_type: @contact_type,
- last_name: @last_name,
- remainder: @remainder,
- adfix_found: @adfix_found
+ name: name,
+ nice_name: nice_name,
+ simple_name: simple_name,
+ slug: slug,
+ contact_type: contact_type,
+ last_name: last_name,
+ remainder: remainder,
+ adfix_found: adfix_found
  }
  end
  =end
@@ -98,7 +104,6 @@ class NameTamer
  def tidy_spacing
  @nice_name.gsub!(/,\s*/, ', ') # Ensure commas have exactly one space after them
  @nice_name.strip! # remove leading & trailing whitespace
-
  @nice_name = ensure_whitespace_is_ascii_space @nice_name
  end
 
@@ -176,11 +181,7 @@ class NameTamer
  # Conjoin compound names with non-breaking spaces
  def use_nonbreaking_spaces_in_compound_names
  # Fix known last names that have spaces (not hyphens!)
- [
- 'Lane Fox', 'Bonham Carter', 'Pitt Rivers', 'Lloyd Webber', 'Sebag Montefiore',
- 'Holmes à Court', 'Holmes a Court', 'Baron Cohen',
- 'Service Company', 'Corporation Company', 'Corporation System', 'Incorporations Limited'
- ].each do |compound_name|
+ COMPOUND_NAMES.each do |compound_name|
  @nice_name.gsub!(compound_name, compound_name.tr(ASCII_SPACE, NONBREAKING_SPACE))
  end
 
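As a rough illustration of the hunk above, this standalone sketch shows what joining compound names with non-breaking spaces achieves. The `protect_compound_names` helper is hypothetical, the list is a two-entry subset of `COMPOUND_NAMES`, and the values given to `ASCII_SPACE` and `NONBREAKING_SPACE` are assumptions because those constants are defined outside this diff.

```ruby
# Assumed values: the real constants live elsewhere in NameTamer.
ASCII_SPACE = ' '
NONBREAKING_SPACE = "\u00a0"

# Two entries taken from COMPOUND_NAMES, enough for the illustration.
SAMPLE_COMPOUND_NAMES = ['Bonham Carter', 'Lloyd Webber'].freeze

# Hypothetical helper: returns a copy of the name in which every known
# compound family name is held together by non-breaking spaces.
def protect_compound_names(name)
  SAMPLE_COMPOUND_NAMES.reduce(name) do |result, compound|
    result.gsub(compound, compound.tr(ASCII_SPACE, NONBREAKING_SPACE))
  end
end

protected_name = protect_compound_names('Helena Bonham Carter')
protected_name.count(ASCII_SPACE) # => 1 (only the space after "Helena" remains)
```

With the family name no longer containing an ASCII space, any later step that splits on ASCII spaces sees `Bonham Carter` as a single word, which is presumably how the README's `helena-bonham-carter` slug is kept intact.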
@@ -197,10 +198,10 @@ class NameTamer
  # i.e. only remove initials if there's also a proper name there
  def remove_initials
  if @contact_type == :person
- name = @simple_name.gsub(/\b([a-z](?:\.*\s+|\.))/i, '')
+ temp_name = @simple_name.gsub(/\b([a-z](?:\.*\s+|\.))/i, '')
 
  # If the name still has at least one space we're OK
- @simple_name = temp_name if temp_name.include?(ASCII_SPACE)
  end
  end
 
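The guard in `remove_initials` may be easier to follow in isolation. The sketch below reuses the regular expression from the hunk in a hypothetical free-standing method; `strip_initials` is not part of the gem, and the value of `ASCII_SPACE` is an assumption.

```ruby
ASCII_SPACE = ' ' # assumed value; the constant is defined outside this diff

# Same expression as in the hunk: a single letter at a word boundary,
# followed either by optional dots and whitespace, or by a dot.
INITIAL_PATTERN = /\b([a-z](?:\.*\s+|\.))/i

# Hypothetical helper: drop initials, but only if at least two words survive.
def strip_initials(simple_name)
  stripped = simple_name.gsub(INITIAL_PATTERN, '')
  stripped.include?(ASCII_SPACE) ? stripped : simple_name
end

strip_initials('John Q. Smith') # => "John Smith"
strip_initials('J. Smith')      # => "J. Smith" (removing "J. " would leave one word)
```

The rename from `name` to `temp_name` in the diff does not change this behaviour; it appears to avoid shadowing the `name` reader that the class now uses elsewhere.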
@@ -237,9 +238,17 @@ class NameTamer
  # Initialization and utilities
  #--------------------------------------------------------
 
- def initialize(name, args = {})
- @name = name || ''
- @contact_type = args[:contact_type].to_sym unless args[:contact_type].nil?
+ def initialize(new_name, args = {})
+ @name = new_name || ''
+
+ if args[:contact_type]
+ ct = args[:contact_type]
+ ct = ct.to_s unless [String, Symbol].include? ct.class
+ ct.downcase! if ct.class == String
+ ct = ct.to_sym
+ ct = nil unless [:person, :organization].include? ct
+ @contact_type = ct
+ end
 
  @nice_name = nil
  @simple_name = nil
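The new normalisation steps in `initialize` can be tried out on their own. The following is a minimal sketch, assuming a hypothetical `normalize_contact_type` helper that is not part of the gem. It follows the same steps as the hunk, except that it uses the non-mutating `downcase`, so a frozen string or the caller's own string is left untouched (the hunk's `downcase!` modifies the argument in place).

```ruby
VALID_CONTACT_TYPES = [:person, :organization].freeze

# Hypothetical helper mirroring the steps added to NameTamer#initialize.
def normalize_contact_type(value)
  return nil unless value

  # Anything that is not a String or Symbol is converted with to_s first
  ct = (value.is_a?(String) || value.is_a?(Symbol)) ? value : value.to_s

  # Only strings are downcased, as in the hunk; symbols are taken as given
  ct = ct.downcase if ct.is_a?(String)
  ct = ct.to_sym

  VALID_CONTACT_TYPES.include?(ct) ? ct : nil
end

normalize_contact_type('Organization') # => :organization
normalize_contact_type(:nonsense)      # => nil
normalize_contact_type(Kernel)         # => nil
```

These three calls correspond to the `t: 'Organization'`, `t: :nonsense` and `t: Kernel` rows added to the spec in this release: unrecognised values fall back to `nil`, leaving the contact type to be guessed later.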
@@ -251,12 +260,6 @@ class NameTamer
  @adfix_found = false
  end
 
- def set_contact_type contact_type
- contact_type_sym = contact_type.to_sym
- puts "Changing contact type of #{@name} from #{@contact_type} to #{contact_type}".red unless @contact_type.nil? || @contact_type == contact_type_sym
- @contact_type = contact_type_sym
- end
-
  # If we don't know the contact type, what's our best guess?
  def contact_type_best_effort
  if @contact_type
@@ -275,23 +278,23 @@ class NameTamer
  # We pass to this routine either prefixes or suffixes
  def remove_outermost_adfix adfix_type, name_part
  adfixes = ADFIX_PATTERNS[adfix_type]
- contact_type = contact_type_best_effort
- parts = name_part.partition adfixes[contact_type]
+ ct = contact_type_best_effort
+ parts = name_part.partition adfixes[ct]
  @adfix_found = !parts[1].empty?
 
  # If the contact type is indeterminate and we didn't find a diagnostic adfix
  # for a person then try again for an organization
  if @contact_type.nil?
  unless @adfix_found
- contact_type = :organization
- parts = name_part.partition adfixes[contact_type]
+ ct = :organization
+ parts = name_part.partition adfixes[ct]
  @adfix_found = !parts[1].empty?
  end
  end
 
  if @adfix_found
  # If we've found a diagnostic adfix then set the contact type
- set_contact_type contact_type
+ self.contact_type = ct
 
  # The remainder of the name will be in parts[0] or parts[2] depending
  # on whether this is a prefix or a suffix.
@@ -317,44 +320,44 @@ class NameTamer
  # Improved in several areas, also now adds non-breaking spaces for
  # compound names like "van der Pump"
  def name_case lowercase
- name = lowercase # We assume the name is passed already downcased
- name.gsub!(/\b\w/) { |first| first.upcase }
- name.gsub!(/\'\w\b/) { |c| c.downcase } # Lowercase 's
+ n = lowercase # We assume the name is passed already downcased
+ n.gsub!(/\b\w/) { |first| first.upcase }
+ n.gsub!(/\'\w\b/) { |c| c.downcase } # Lowercase 's
 
  # Our list of terminal characters that indicate a non-celtic name used
  # to include o but we removed it because of MacMurdo.
- if name =~ /\bMac[A-Za-z]{2,}[^acizj]\b/ or name =~ /\bMc/
- name.gsub!(/\b(Ma?c)([A-Za-z]+)/) { |match| $1 + $2.capitalize }
+ if n =~ /\bMac[A-Za-z]{2,}[^acizj]\b/ or n =~ /\bMc/
+ n.gsub!(/\b(Ma?c)([A-Za-z]+)/) { |match| $1 + $2.capitalize }
 
  # Fix Mac exceptions
  [
  'MacEdo', 'MacEvicius', 'MacHado', 'MacHar', 'MacHin', 'MacHlin', 'MacIas', 'MacIulis', 'MacKie', 'MacKle',
  'MacKlin', 'MacKmin', 'MacKmurdo', 'MacQuarie', 'MacLise', 'MacKenzie'
- ].each { |mac_name| name.gsub!(/\b#{mac_name}/, mac_name.capitalize) }
+ ].each { |mac_name| n.gsub!(/\b#{mac_name}/, mac_name.capitalize) }
  end
 
  # Fix ff wierdybonks
  [
  'Fforbes', 'Fforde', 'Ffinch', 'Ffrench', 'Ffoulkes'
- ].each { |ff_name| name.gsub!(ff_name,ff_name.downcase) }
+ ].each { |ff_name| n.gsub!(ff_name,ff_name.downcase) }
 
  # Fixes for name modifiers followed by space
  # Also replaces spaces with non-breaking spaces
  NAME_MODIFIERS.each do |modifier|
- name.gsub!(/((?:[[:space:]]|^)#{modifier})(\s+|-)/) { |match| "#{$1.rstrip.downcase}#{$2.tr(ASCII_SPACE, NONBREAKING_SPACE)}" }
+ n.gsub!(/((?:[[:space:]]|^)#{modifier})(\s+|-)/) { |match| "#{$1.rstrip.downcase}#{$2.tr(ASCII_SPACE, NONBREAKING_SPACE)}" }
  end
 
  # Fixes for name modifiers followed by an apostrophe, e.g. d'Artagnan, Commedia dell'Arte
  ['Dell', 'D'].each do |modifier|
- name.gsub!(/(.#{modifier}')(\w)/) { |match| "#{$1.rstrip.downcase}#{$2}" }
+ n.gsub!(/(.#{modifier}')(\w)/) { |match| "#{$1.rstrip.downcase}#{$2}" }
  end
 
  # Upcase words with no vowels, e.g JPR Williams
- name.gsub!(/\b([bcdfghjklmnpqrstvwxz]+)\b/i) { |match| $1.upcase }
+ n.gsub!(/\b([bcdfghjklmnpqrstvwxz]+)\b/i) { |match| $1.upcase }
  # Except Ng
- name.gsub!(/\b(NG)\b/i) { |match| $1.capitalize } # http://en.wikipedia.org/wiki/Ng
+ n.gsub!(/\b(NG)\b/i) { |match| $1.capitalize } # http://en.wikipedia.org/wiki/Ng
 
- name
+ n
  end
 
  def parameterize string, args = {}
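The last two substitutions in `name_case` are unchanged in behaviour by this release (only the local variable is renamed), but they are compact enough to illustrate separately. This sketch wraps the same two expressions in a hypothetical `upcase_vowelless_words` method that is not part of the gem.

```ruby
# Hypothetical helper: upcase whole words made only of the listed consonants,
# then restore the family name "Ng" as the one exception.
def upcase_vowelless_words(name)
  result = name.gsub(/\b([bcdfghjklmnpqrstvwxz]+)\b/i) { |word| word.upcase }
  result.gsub(/\b(NG)\b/i) { |word| word.capitalize }
end

upcase_vowelless_words('Jpr Williams') # => "JPR Williams"
upcase_vowelless_words('Ng')           # => "Ng"
upcase_vowelless_words('Lynn')         # => "Lynn" (y is not in the consonant list)
```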
@@ -432,9 +435,14 @@ class NameTamer
  FILTER_RFC3987 = /[^#{ISEGMENT_NZ_NC}]/
  FILTER_COMPAT = /[^#{ALPHA}#{DIGIT}\-_#{UCSCHAR}]/
 
- NAME_MODIFIERS = [
- 'Al', 'Ap', 'Ben', 'Dell[ae]', 'D[aeiou]', 'De[lr]', 'D[ao]s', 'El', 'La', 'L[eo]',
- 'V[ao]n', 'Of', 'St[\.]?'
+ NAME_MODIFIERS = [
+ 'Al', 'Ap', 'Ben', 'Dell[ae]', 'D[aeiou]', 'De[lr]', 'D[ao]s', 'El', 'La', 'L[eo]', 'V[ao]n', 'Of', 'St[\.]?'
+ ]
+
+ COMPOUND_NAMES = [
+ 'Lane Fox', 'Bonham Carter', 'Pitt Rivers', 'Lloyd Webber', 'Sebag Montefiore', 'Holmes à Court', 'Holmes a Court',
+ 'Baron Cohen', 'Strang Steel',
+ 'Service Company', 'Corporation Company', 'Corporation System', 'Incorporations Limited'
  ]
 
  # These are the prefixes and suffixes we want to remove
@@ -497,10 +505,10 @@ class NameTamer
  patterns = {}
  adfix = ADFIXES[adfix_type]
 
- [:person, :organization].each do |contact_type|
- with_optional_spaces = adfix[contact_type].map { |p| p.gsub(ASCII_SPACE,' *') }
+ [:person, :organization].each do |ct|
+ with_optional_spaces = adfix[ct].map { |p| p.gsub(ASCII_SPACE,' *') }
  pattern_string = with_optional_spaces.join('|').gsub('.', '\.*')
- patterns[contact_type] = /#{adfix[:before]}\(*(?:#{pattern_string})\)*#{adfix[:after]}/i
+ patterns[ct] = /#{adfix[:before]}\(*(?:#{pattern_string})\)*#{adfix[:after]}/i
  end
 
  ADFIX_PATTERNS[adfix_type] = patterns
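To see what kind of expression this loop produces, here is a small sketch that applies the same three steps to made-up data. `SAMPLE_SUFFIXES` and `build_pattern` are hypothetical, and the `:before` and `:after` fragments and the two suffixes are assumptions chosen for illustration; the real `ADFIXES` table is not shown in this diff.

```ruby
ASCII_SPACE = ' ' # assumed value; the constant is defined outside this diff

# Hypothetical stand-in for one entry of ADFIXES.
SAMPLE_SUFFIXES = {
  before: '\s+',
  after: '$',
  organization: ['Ltd.', 'Pty. Ltd.']
}.freeze

# Same steps as the loop body: spaces become optional, dots become optional
# and repeatable, and the alternatives may be wrapped in parentheses.
def build_pattern(adfix, contact_type)
  with_optional_spaces = adfix[contact_type].map { |p| p.gsub(ASCII_SPACE, ' *') }
  pattern_string = with_optional_spaces.join('|').gsub('.', '\.*')
  /#{adfix[:before]}\(*(?:#{pattern_string})\)*#{adfix[:after]}/i
end

pattern = build_pattern(SAMPLE_SUFFIXES, :organization)
'Acme Pty Ltd'.partition(pattern) # => ["Acme", " Pty Ltd", ""]
'John Smith'.partition(pattern)   # => ["John Smith", "", ""]
```

The middle element of the `partition` result is what `remove_outermost_adfix` inspects: an empty string means no adfix was found for that contact type.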
@@ -1,3 +1,3 @@
  class NameTamer
- VERSION = "0.0.3"
+ VERSION = "0.0.4"
  end
@@ -119,7 +119,12 @@ describe NameTamer do
  { n:'محمود ياسر', t: :organization, nn:'محمود ياسر', sn:'محمود ياسر', s:'محمود-ياسر' },
  { n:'קובי ביטר', t: :organization, nn:'קובי ביטר', sn:'קובי ביטר', s:'קובי-ביטר' },
  { n:'الملاك الحارس', t: :organization, nn:'الملاك الحارس', sn:'الملاك الحارس', s:'الملاك-الحارس' },
- { n:'কবির হাসান', t: :organization, nn:'কবির হাসান', sn:'কবির হাসান', s:'কবির-হাসান' }
+ { n:'কবির হাসান', t: :organization, nn:'কবির হাসান', sn:'কবির হাসান', s:'কবির-হাসান' },
+ { nn: '', sn: '', s: '' },
+ { n:'Union Square Ventures', t: 'Organization', nn:'Union Square Ventures', sn:'Union Square Ventures', s:'union-square-ventures' },
+ { n:'John Smith', t: 'Person', nn:'John Smith', sn:'John Smith', s:'john-smith' },
+ { n:'John Smith', t: :nonsense, nn:'John Smith', sn:'John Smith', s:'john-smith' },
+ { n:'John Smith', t: Kernel, nn:'John Smith', sn:'John Smith', s:'john-smith' },
  ]
  end
 
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: name-tamer
  version: !ruby/object:Gem::Version
- version: 0.0.3
+ version: 0.0.4
  platform: ruby
  authors:
  - Xenapto
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-05-31 00:00:00.000000000 Z
+ date: 2014-06-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bundler