nomener 0.1.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.gitignore +21 -0
- data/Gemfile +3 -0
- data/LICENSE.txt +22 -0
- data/README.md +71 -0
- data/Rakefile +1 -0
- data/lib/nomener/compounders.rb +73 -0
- data/lib/nomener/helper.rb +35 -0
- data/lib/nomener/name.rb +144 -0
- data/lib/nomener/parser.rb +141 -0
- data/lib/nomener/suffixes.rb +33 -0
- data/lib/nomener/titles.rb +100 -0
- data/lib/nomener/version.rb +3 -0
- data/lib/nomener.rb +16 -0
- data/nomener.gemspec +27 -0
- data/spec/nomener/complex_spec.rb +23 -0
- data/spec/nomener/names_spec.rb +24 -0
- data/spec/nomener/nomener_componders_spec.rb +7 -0
- data/spec/nomener/nomener_helper_spec.rb +18 -0
- data/spec/nomener/nomener_name_spec.rb +112 -0
- data/spec/nomener/nomener_parser_spec.rb +110 -0
- data/spec/nomener/nomener_spec.rb +31 -0
- data/spec/nomener/nomener_suffixes_spec.rb +7 -0
- data/spec/nomener/nomener_titles_spec.rb +7 -0
- data/spec/nomener/titles_spec.rb +224 -0
- data/spec/spec_helper.rb +14 -0
- metadata +136 -0
checksums.yaml
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
1
|
+
---
|
|
2
|
+
SHA1:
|
|
3
|
+
metadata.gz: 176b7021c647d20b310c915afe0ad2ba591ef2b1
|
|
4
|
+
data.tar.gz: 72632691751109bf8a34e5ea2c96ced5178d80eb
|
|
5
|
+
SHA512:
|
|
6
|
+
metadata.gz: 5371ba96a7d39ba43d6779c9d0a1cd019d4fa81ef2456f8479ceee404d0dc94cd86076fc1ea36cd2aada1759fe96c27466bcc7ba0dcc18f2b9bcab3545b9486c
|
|
7
|
+
data.tar.gz: d3e867f148e23a8be082534509b5c0c00509d14d8f00ddc474601aba5f11bb0ce606091f44ff1cc565b3aae078cac76579e1bc5aa42bc61fc4938ea11e547e3e
|
data/.gitignore
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
*.gem
|
|
2
|
+
*.rbc
|
|
3
|
+
.rspec
|
|
4
|
+
.bundle
|
|
5
|
+
.config
|
|
6
|
+
.yardoc
|
|
7
|
+
.DS_Store
|
|
8
|
+
.rvmrc
|
|
9
|
+
Gemfile.lock
|
|
10
|
+
InstalledFiles
|
|
11
|
+
_yardoc
|
|
12
|
+
coverage
|
|
13
|
+
doc/
|
|
14
|
+
lib/bundler/man
|
|
15
|
+
pkg
|
|
16
|
+
rdoc
|
|
17
|
+
spec/reports
|
|
18
|
+
test/tmp
|
|
19
|
+
test/version_tmp
|
|
20
|
+
tmp
|
|
21
|
+
*.sublime*
|
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
Copyright (c) 2015 Dante Piombino
|
|
2
|
+
|
|
3
|
+
MIT License
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
|
6
|
+
a copy of this software and associated documentation files (the
|
|
7
|
+
"Software"), to deal in the Software without restriction, including
|
|
8
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
|
9
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
|
10
|
+
permit persons to whom the Software is furnished to do so, subject to
|
|
11
|
+
the following conditions:
|
|
12
|
+
|
|
13
|
+
The above copyright notice and this permission notice shall be
|
|
14
|
+
included in all copies or substantial portions of the Software.
|
|
15
|
+
|
|
16
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
|
17
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
|
18
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
|
19
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
|
20
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
|
21
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
|
22
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.md
ADDED
|
@@ -0,0 +1,71 @@
|
|
|
1
|
+
# Nomener
|
|
2
|
+
|
|
3
|
+
Nomener assists with parsing people's names that they give themselves (or other people). Nomener is a fork of [People](https://github.com/dan-ding/people) as it uses some code contributed there. It's currently geared towards western style name formatting, however other cultural name formatting is (or would like to be) supported. Currently it attempts to parse names through pattern matching without using dictionary/library/data files (except for name decorations and suffixes, see usage). It may not be possible to do so in all languages without such files.
|
|
4
|
+
|
|
5
|
+
If you didn't know, parsing names can be much more difficult than it seems it should be.
|
|
6
|
+
|
|
7
|
+
## Requirements
|
|
8
|
+
|
|
9
|
+
Requires Ruby 2.1 or higher (or equivalent).
|
|
10
|
+
To use with 1.9 or 2.0 you'll need to install either [string-scrub](https://github.com/hsbt/string-scrub) or [scrub_rb](https://github.com/jrochkind/scrub_rb).
|
|
11
|
+
|
|
12
|
+
## Installation
|
|
13
|
+
|
|
14
|
+
Add this line to your application's Gemfile:
|
|
15
|
+
|
|
16
|
+
gem 'nomener'
|
|
17
|
+
|
|
18
|
+
And then execute:
|
|
19
|
+
|
|
20
|
+
$ bundle
|
|
21
|
+
|
|
22
|
+
Or install it yourself as:
|
|
23
|
+
|
|
24
|
+
$ gem install nomener
|
|
25
|
+
|
|
26
|
+
## Basic Usage
|
|
27
|
+
|
|
28
|
+
Use Nomener directly:
|
|
29
|
+
```ruby
|
|
30
|
+
name = Nomener.parse "Joe Smith" # <Nomener::Name first="Joe" last="Smith">
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
Create a new instance:
|
|
34
|
+
```ruby
|
|
35
|
+
name = Nomener::Name.new "Joe Smith" # <Nomener::Name >
|
|
36
|
+
name.parse # <Nomener::Name first="Joe" last="Smith">
|
|
37
|
+
name.first # Joe
|
|
38
|
+
name.name # Joe Smith
|
|
39
|
+
"Hi #{name}!" # Hi Joe Smith!
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
## TODO
|
|
43
|
+
* optionally use web service api data to assist (and create the web service!)
|
|
44
|
+
* fantasy prefixes/suffixes
|
|
45
|
+
* multiple names from one string
|
|
46
|
+
* specifying formats to parse by
|
|
47
|
+
* many other things
|
|
48
|
+
* better non-english support
|
|
49
|
+
|
|
50
|
+
## References
|
|
51
|
+
* [http://en.wikipedia.org/wiki/Personal_name](http://en.wikipedia.org/wiki/Personal_name)
|
|
52
|
+
* [http://en.wikipedia.org/wiki/Surname](http://en.wikipedia.org/wiki/Surname)
|
|
53
|
+
* [http://en.wikipedia.org/wiki/Title](http://en.wikipedia.org/wiki/Title)
|
|
54
|
+
* [http://www.w3.org/International/questions/qa-personal-names](http://www.w3.org/International/questions/qa-personal-names)
|
|
55
|
+
* [http://heraldry.sca.org/titles.html](http://heraldry.sca.org/titles.html)
|
|
56
|
+
|
|
57
|
+
## Contributing
|
|
58
|
+
|
|
59
|
+
1. Fork it ( http://github.com/<my-github-username>/nomener/fork )
|
|
60
|
+
2. Create your feature branch (`git checkout -b my-new-feature`)
|
|
61
|
+
3. Ensure adequate tests (rspec) on your branch
|
|
62
|
+
4. Commit your changes (`git commit -am 'Add some feature'`)
|
|
63
|
+
5. Push to the branch (`git push origin my-new-feature`)
|
|
64
|
+
6. Create new Pull Request
|
|
65
|
+
|
|
66
|
+
## Other similar projects (and inspiration)
|
|
67
|
+
* [People](https://github.com/dan-ding/people) [Original](https://github.com/mericson/people)
|
|
68
|
+
* [Namae](https://github.com/berkmancenter/namae) (Racc based token)
|
|
69
|
+
* [Nameable](https://github.com/chorn/nameable)
|
|
70
|
+
* [Person-name](https://github.com/matthijsgroen/person-name)
|
|
71
|
+
|
data/Rakefile
ADDED
|
@@ -0,0 +1 @@
|
|
|
1
|
+
require "bundler/gem_tasks"
|
data/lib/nomener/compounders.rb
ADDED
|
@@ -0,0 +1,73 @@
|
|
|
1
|
+
module Nomener
|
|
2
|
+
module Compounders
|
|
3
|
+
|
|
4
|
+
# Internal: Regex last name prefixes.
|
|
5
|
+
COMPOUNDS = %r!(?<part>(?:
|
|
6
|
+
Ab
|
|
7
|
+
| Ap
|
|
8
|
+
| Abu
|
|
9
|
+
| Al
|
|
10
|
+
| Bar
|
|
11
|
+
| Bath?
|
|
12
|
+
| Bet
|
|
13
|
+
| Bint?
|
|
14
|
+
| Da
|
|
15
|
+
| De\p{Blank}Ca
|
|
16
|
+
| De\p{Blank}La
|
|
17
|
+
| De\p{Blank}Los
|
|
18
|
+
| de\p{Blank}De\p{Blank}la
|
|
19
|
+
| Degli
|
|
20
|
+
| De[lnrs]?
|
|
21
|
+
| Dele
|
|
22
|
+
| Dell[ae]
|
|
23
|
+
| D[iu]t?
|
|
24
|
+
| Dos
|
|
25
|
+
| El
|
|
26
|
+
| Fitz
|
|
27
|
+
| Gil
|
|
28
|
+
| Het
|
|
29
|
+
| in
|
|
30
|
+
| in\p{Blank}het
|
|
31
|
+
| Ibn
|
|
32
|
+
| Kil
|
|
33
|
+
| L[aeo]
|
|
34
|
+
| M[ai\']?c?
|
|
35
|
+
| Mhic
|
|
36
|
+
| Maol
|
|
37
|
+
| M[au]g
|
|
38
|
+
| Naka
|
|
39
|
+
| 中
|
|
40
|
+
| Neder
|
|
41
|
+
| N[ií]'?[cgn]?
|
|
42
|
+
| Nord
|
|
43
|
+
| Norr
|
|
44
|
+
| Ny
|
|
45
|
+
| Ó
|
|
46
|
+
| Øst
|
|
47
|
+
| Öfver
|
|
48
|
+
| Öst
|
|
49
|
+
| Öster
|
|
50
|
+
| Över
|
|
51
|
+
| Öz
|
|
52
|
+
| Pour
|
|
53
|
+
| St\.?
|
|
54
|
+
| San
|
|
55
|
+
| Stor
|
|
56
|
+
| Söder
|
|
57
|
+
| Ter?
|
|
58
|
+
| Tre
|
|
59
|
+
| U[ií]?
|
|
60
|
+
| Vd
|
|
61
|
+
| V[ao]n
|
|
62
|
+
| V[ao]n
|
|
63
|
+
| Ved\.?
|
|
64
|
+
| Vda\.?
|
|
65
|
+
| Vest
|
|
66
|
+
| Väst
|
|
67
|
+
| Väster
|
|
68
|
+
| Zu
|
|
69
|
+
| (?-i:y)
|
|
70
|
+
| 't
|
|
71
|
+
)\p{Blank}\g<part>*)*!xi
|
|
72
|
+
end
|
|
73
|
+
end
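To illustrate how a prefix pattern such as `COMPOUNDS` can be combined with a surname match, here is a minimal sketch. `PREFIXES` and `surname` are hypothetical names, and the prefix list is a small subset chosen for illustration; this is not the gem's constant or its parsing logic.

```ruby
# Hypothetical, reduced prefix list; the real COMPOUNDS constant is far larger.
# Each prefix must be followed by a blank, and prefixes may repeat ("van der").
PREFIXES = /(?:(?:van|von|der|de|la|di|st\.?)\p{Blank})*/i

# Returns the trailing surname, including any leading prefixes, or nil
# when the string has no blank-separated last word.
def surname(full_name)
  m = full_name.match(/\p{Blank}(?<fam>#{PREFIXES}[\p{L}\-']+)\z/)
  m && m[:fam]
end

puts surname("Ludwig van Beethoven")
puts surname("Jan van der Berg")
puts surname("Joe Smith")
```

Interpolating a `Regexp` into another pattern keeps its own flags, so the prefixes stay case-insensitive even though the outer pattern is not.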
|
data/lib/nomener/helper.rb
ADDED
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
module Nomener
|
|
2
|
+
module Helper
|
|
3
|
+
|
|
4
|
+
# Internal: Clean up a given string. Quotes from http://en.wikipedia.org/wiki/Quotation_mark
|
|
5
|
+
# Needs to be fixed up for matching and non-english quotes
|
|
6
|
+
#
|
|
7
|
+
# name - the string to clean
|
|
8
|
+
# leftleft - the left double quote to use when replacing others
|
|
9
|
+
# rightright - the right double quote to use when replacing others
|
|
10
|
+
# left - the single left quote to use when replacing others
|
|
11
|
+
# right - the single right quote to use when replacing others
|
|
12
|
+
#
|
|
13
|
+
# Returns a string which is (ideally) pretty much the same as it was given.
|
|
14
|
+
def self.reformat(name, leftleft = '"', rightright = '"', left = "'", right = "'")
|
|
15
|
+
n = name.dup
|
|
16
|
+
n.scrub! # remove illegal characters
|
|
17
|
+
|
|
18
|
+
# translate fullwidth to typewriter
|
|
19
|
+
n.tr!("\uFF02\uFF07", "\u0022\u0027")
|
|
20
|
+
|
|
21
|
+
n.tr!("\u0022\u00AB\u201C\u201E\u2036\u300E\u301D\u301F\uFE43", leftleft) # replace left double quotes
|
|
22
|
+
n.tr!("\u0022\u00BB\u201D\u201F\u2033\u300F\u301E\uFE44", rightright) # replace right double quotes
|
|
23
|
+
|
|
24
|
+
n.tr!("\u0027\u2018\u201A\u2035\u2039\u300C\uFE41\uFF62", left) # replace left single quotes
|
|
25
|
+
n.tr!("\u0027\u2019\u201B\u2032\u203A\u300D\uFE42\uFF63", right) # replace right single quotes
|
|
26
|
+
|
|
27
|
+
n.gsub!(/[^\p{Alpha}\-\.&\/ \,#{leftleft}#{rightright}#{left}#{right}]/, "") # what others may be in a name?
|
|
28
|
+
n.gsub!(/\p{Blank}+/, " ") # compress whitespace
|
|
29
|
+
n.strip! # trim space
|
|
30
|
+
|
|
31
|
+
n
|
|
32
|
+
end
|
|
33
|
+
|
|
34
|
+
end
|
|
35
|
+
end
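The clean-up performed by `reformat` relies on `String#tr` padding a short replacement set with its last character, so many source quotes can fold into one target quote. A minimal sketch of that idea follows; `normalize_quotes` is a hypothetical helper and covers only a handful of the code points handled above.

```ruby
# Hypothetical helper: fold a few typographic quotes into ASCII quotes,
# then collapse runs of blanks and trim the ends.
def normalize_quotes(text)
  cleaned = text.tr("\u201C\u201D\u00AB\u00BB", '"') # curly and angle double quotes
  cleaned = cleaned.tr("\u2018\u2019", "'")          # curly single quotes
  cleaned.gsub(/\p{Blank}+/, " ").strip
end

puts normalize_quotes("  Joe \u201CJoey\u201D   Smith ")
puts normalize_quotes("Conan O\u2019Brien")
```

The non-bang `tr` and `gsub` return new strings, so the caller's input is left untouched.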
|
data/lib/nomener/name.rb
ADDED
|
@@ -0,0 +1,144 @@
|
|
|
1
|
+
require "nomener/parser"
|
|
2
|
+
|
|
3
|
+
module Nomener
|
|
4
|
+
class Name < Struct.new :title, :first, :middle, :nick, :last, :suffix
|
|
5
|
+
|
|
6
|
+
# we don't want to change what we were instantiated with
|
|
7
|
+
attr_reader :original
|
|
8
|
+
|
|
9
|
+
# Public: Create an instance!
|
|
10
|
+
def initialize(nomen = '')
|
|
11
|
+
@original = Nomener::Helper.reformat(nomen.kind_of?(String) ? nomen : "")
|
|
12
|
+
end
|
|
13
|
+
|
|
14
|
+
# Public: Break down a string into parts of a person's name
|
|
15
|
+
#
|
|
16
|
+
# name - A string of name to parse
|
|
17
|
+
#
|
|
18
|
+
# Returns self populated with name or empty
|
|
19
|
+
def parse
|
|
20
|
+
parsed = Parser.parse(@original.dup)
|
|
21
|
+
merge(parsed) unless parsed.nil?
|
|
22
|
+
self
|
|
23
|
+
end
|
|
24
|
+
|
|
25
|
+
# Public: make the name proper case-like, suffix and nickname ignored
|
|
26
|
+
#
|
|
27
|
+
# Returns a string of the full name in a proper (western) case
|
|
28
|
+
def properlike
|
|
29
|
+
f = (first || "").capitalize
|
|
30
|
+
n = (nick.nil? || nick.empty?) ? "" : "\"#{nick}\""
|
|
31
|
+
m = (middle || "").capitalize
|
|
32
|
+
l = capit(last || "")
|
|
33
|
+
t = (title || "").capitalize
|
|
34
|
+
"#{t} #{f} #{n} #{m} #{l} #{suffix}".strip.gsub(/\p{Blank}+/, ' ')
|
|
35
|
+
end
|
|
36
|
+
|
|
37
|
+
# Internal: try to capitalize last names with Mac and Mc and D' and such
|
|
38
|
+
#
|
|
39
|
+
# last - string of the name to capitalize
|
|
40
|
+
#
|
|
41
|
+
# Returns a string of the capitalized name
|
|
42
|
+
def capit(last)
|
|
43
|
+
return "" if last.nil? || last.empty?
|
|
44
|
+
|
|
45
|
+
fix = last.dup
|
|
46
|
+
|
|
47
|
+
# if there are multiple last names separated by spaces
|
|
48
|
+
fix = fix.split(" ").map { |v| v.capitalize }.join " "
|
|
49
|
+
|
|
50
|
+
# if there are multiple last names separated by a dash
|
|
51
|
+
if !fix.index("-").nil?
|
|
52
|
+
fix = fix.split("-").map { |v|
|
|
53
|
+
v.split(" ").map { |w| w.capitalize }.join " "
|
|
54
|
+
}.join "-"
|
|
55
|
+
end
|
|
56
|
+
|
|
57
|
+
# anything beginning with Mac and not ending in [aciozj]
|
|
58
|
+
if m = fix.match(/Mac([\p{Alpha}]{2,}[^aciozj])/i)
|
|
59
|
+
unless m[1].match(%r!^
|
|
60
|
+
hin|
|
|
61
|
+
hlen|
|
|
62
|
+
har|
|
|
63
|
+
kle|
|
|
64
|
+
klin|
|
|
65
|
+
kie|
|
|
66
|
+
hado| # Portuguese
|
|
67
|
+
evicius| # Lithuanian
|
|
68
|
+
iulis| # Lithuanian
|
|
69
|
+
ias # Lithuanian
|
|
70
|
+
!x)
|
|
71
|
+
fix.sub!(/Mac#{m[1]}/, "Mac#{m[1].capitalize}")
|
|
72
|
+
end
|
|
73
|
+
elsif m = fix.match(/Mc([\p{Alpha}]{2,})/i) # anything beginning with Mc
|
|
74
|
+
fix.sub!(/Mc#{m[1]}/, "Mc#{m[1].capitalize}")
|
|
75
|
+
elsif fix.match(/'\p{Alpha}/) # names like D'Angelo or Van 't Hooft
|
|
76
|
+
fix.gsub!(/('\p{Alpha})/) { |s| (s[-1] != 't') ? s.upcase : s } #no cap 't
|
|
77
|
+
end
|
|
78
|
+
|
|
79
|
+
fix
|
|
80
|
+
end
|
|
81
|
+
|
|
82
|
+
# Public: Make inspect ... informative
|
|
83
|
+
#
|
|
84
|
+
# Returns a nicely formatted string
|
|
85
|
+
def inspect
|
|
86
|
+
"#<Nomener::Name #{each_pair.map { |k,v| [k,v.inspect].join('=') if (!v.nil? && !v.empty?) }.compact.join(' ')}>"
|
|
87
|
+
end
|
|
88
|
+
|
|
89
|
+
# Public: Make the name a string.
|
|
90
|
+
#
|
|
91
|
+
# format - a string using symbols specifying the format of the name to return
|
|
92
|
+
# defaults to "%f %l"
|
|
93
|
+
# %f -> first name
|
|
94
|
+
# %l -> last/surname/family name
|
|
95
|
+
# %m -> middle name
|
|
96
|
+
# %n -> nick name
|
|
97
|
+
# %m -> middle name
|
|
98
|
+
# %s -> suffix
|
|
99
|
+
# %t -> title/prefix
|
|
100
|
+
#
|
|
101
|
+
# propercase - boolean on whether to (try to) fix the case of the name
|
|
102
|
+
# defaults to true
|
|
103
|
+
#
|
|
104
|
+
# Returns the name as a string
|
|
105
|
+
def name(format = "%f %l", propercase = true)
|
|
106
|
+
nomen = to_h
|
|
107
|
+
nomen[:nick] = (nick.nil? || nick.empty?) ? "" : "\"#{nick}\""
|
|
108
|
+
format = format.gsub(/\%f/, '%{first}') # copy, so the caller's format string is not mutated
|
|
109
|
+
format.gsub! /\%l/, '%{last}'
|
|
110
|
+
format.gsub! /\%m/, '%{middle}'
|
|
111
|
+
format.gsub! /\%n/, '%{nick}'
|
|
112
|
+
format.gsub! /\%s/, '%{suffix}'
|
|
113
|
+
format.gsub! /\%t/, '%{title}'
|
|
114
|
+
(format % nomen).strip.gsub /\p{Blank}+/, " "
|
|
115
|
+
end
|
|
116
|
+
|
|
117
|
+
# Public: Shortcut for name format
|
|
118
|
+
# can also be called by the method fullname
|
|
119
|
+
#
|
|
120
|
+
# Returns the full name
|
|
121
|
+
def full
|
|
122
|
+
name("%f %m %l")
|
|
123
|
+
end
|
|
124
|
+
alias :fullname :full
|
|
125
|
+
|
|
126
|
+
# Public: See name
|
|
127
|
+
#
|
|
128
|
+
# Returns the name as a string
|
|
129
|
+
def to_s
|
|
130
|
+
name("%f %l")
|
|
131
|
+
end
|
|
132
|
+
|
|
133
|
+
# Internal: merge another Nomener::Name to this one
|
|
134
|
+
#
|
|
135
|
+
# other - hash to merge into self
|
|
136
|
+
#
|
|
137
|
+
# Returns nothing
|
|
138
|
+
def merge(other)
|
|
139
|
+
return self unless other.kind_of?(Hash)
|
|
140
|
+
each_pair { |k, v| self[k] = other[k] }
|
|
141
|
+
end
|
|
142
|
+
|
|
143
|
+
end
|
|
144
|
+
end
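The token expansion used by `name` (`%f`, `%l` and so on, rewritten to named references and filled with `String#%`) can be sketched on its own. `TOKENS`, `PART_KEYS` and `format_name` are hypothetical names for this illustration, which takes a plain hash rather than a `Nomener::Name`.

```ruby
# Hypothetical mapping from short tokens to named format references.
TOKENS = {
  "%f" => "%{first}", "%m" => "%{middle}", "%l" => "%{last}",
  "%n" => "%{nick}",  "%s" => "%{suffix}", "%t" => "%{title}"
}.freeze

PART_KEYS = [:title, :first, :middle, :nick, :last, :suffix].freeze

# Expands the short tokens, fills them from parts (missing parts become ""),
# and collapses any blanks left behind by empty parts.
def format_name(parts, format = "%f %l")
  template = format.gsub(/%[fmlnst]/) { |token| TOKENS[token] }
  values = PART_KEYS.each_with_object({}) { |key, h| h[key] = parts[key].to_s }
  (template % values).gsub(/\p{Blank}+/, " ").strip
end

puts format_name({ first: "Joe", last: "Smith" })
puts format_name({ first: "Joe", last: "Smith" }, "%l, %f %m")
```

Because `gsub` is used instead of `gsub!`, a frozen or shared format string can be passed in safely.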
|
data/lib/nomener/parser.rb
ADDED
|
@@ -0,0 +1,141 @@
|
|
|
1
|
+
require "nomener/name"
|
|
2
|
+
require "nomener/titles"
|
|
3
|
+
require "nomener/suffixes"
|
|
4
|
+
require "nomener/compounders"
|
|
5
|
+
require "nomener/helper"
|
|
6
|
+
|
|
7
|
+
module Nomener
|
|
8
|
+
class Parser
|
|
9
|
+
include Nomener::Titles
|
|
10
|
+
include Nomener::Suffixes
|
|
11
|
+
include Nomener::Compounders
|
|
12
|
+
|
|
13
|
+
# Public: parse a string into name parts
|
|
14
|
+
#
|
|
15
|
+
# name - a string to get the name from
|
|
16
|
+
# format - a hash of options to parse name (default {:order => :fl, :spacelimit => 0})
|
|
17
|
+
# :order - format the name. defaults to "first last" (:fl) of the available
|
|
18
|
+
# :fl - presumes the name is in the format of "first last"
|
|
19
|
+
# :lf - presumes the name is in the format of "last first"
|
|
20
|
+
# :lcf - presumes the name is in the format of "last, first"
|
|
21
|
+
# :spacelimit - the number of spaces to consider in the first name
|
|
22
|
+
#
|
|
23
|
+
# Returns a hash of name parts, or nil if the string could not be parsed
|
|
24
|
+
def self.parse(name, format = {:order => :fl, :spacelimit => 0})
|
|
25
|
+
begin
|
|
26
|
+
self.parse!(name, format)
|
|
27
|
+
rescue
|
|
28
|
+
nil
|
|
29
|
+
end
|
|
30
|
+
end
|
|
31
|
+
|
|
32
|
+
# Public: parse a string into name parts
|
|
33
|
+
#
|
|
34
|
+
# name - string to parse a name from
|
|
35
|
+
# format - hash of options to parse name. See parse()
|
|
36
|
+
#
|
|
37
|
+
# Returns a hash of name parts or nil
|
|
38
|
+
# Raises ArgumentError if 'name' is not a string or is empty
|
|
39
|
+
def self.parse!(name, format = {:order => :fl, :spacelimit => 0})
|
|
40
|
+
raise ArgumentError, 'Name to parse not provided' unless (name.kind_of?(String) && !name.empty?)
|
|
41
|
+
|
|
42
|
+
name = Nomener::Helper.reformat(name)
|
|
43
|
+
|
|
44
|
+
title = self.parse_title(name)
|
|
45
|
+
suffix = self.parse_suffix(name)
|
|
46
|
+
nick = self.parse_nick(name)
|
|
47
|
+
last = self.parse_last(name, format[:order])
|
|
48
|
+
first, middle = self.parse_first(name, format[:spacelimit])
|
|
49
|
+
|
|
50
|
+
{
|
|
51
|
+
:title => title,
|
|
52
|
+
:suffix => suffix,
|
|
53
|
+
:nick => nick,
|
|
54
|
+
:first => first,
|
|
55
|
+
:last => last,
|
|
56
|
+
:middle => middle
|
|
57
|
+
}
|
|
58
|
+
end
|
|
59
|
+
|
|
60
|
+
# Internal: pull off a title if we can
|
|
61
|
+
#
|
|
62
|
+
# nm - string of the name to parse
|
|
63
|
+
#
|
|
64
|
+
# Returns string of the title found or an empty string
|
|
65
|
+
def self.parse_title(nm)
|
|
66
|
+
title = ""
|
|
67
|
+
if m = TITLES.match(nm)
|
|
68
|
+
title = m[1].strip
|
|
69
|
+
nm.sub!(title, "").strip!
|
|
70
|
+
title.gsub!('.', '')
|
|
71
|
+
end
|
|
72
|
+
title
|
|
73
|
+
end
|
|
74
|
+
|
|
75
|
+
# Internal: pull off what suffixes we can
|
|
76
|
+
#
|
|
77
|
+
# nm - string of the name to parse
|
|
78
|
+
#
|
|
79
|
+
# Returns string of the suffixes found or an empty string
|
|
80
|
+
def self.parse_suffix(nm)
|
|
81
|
+
suffixes = []
|
|
82
|
+
suffixes = nm.scan(SUFFIXES).flatten
|
|
83
|
+
suffixes.each { |s|
|
|
84
|
+
nm.gsub!(/#{Regexp.escape(s)}/, "").strip! # escape so "." in a suffix is literal
|
|
85
|
+
s.strip!
|
|
86
|
+
}
|
|
87
|
+
suffixes.join " "
|
|
88
|
+
end
|
|
89
|
+
|
|
90
|
+
# Internal: parse nickname out of string. presuming it's in quotes
|
|
91
|
+
#
|
|
92
|
+
# nm - string of the name to parse
|
|
93
|
+
#
|
|
94
|
+
# Returns string of the nickname found or an empty string
|
|
95
|
+
def self.parse_nick(nm)
|
|
96
|
+
nicks = []
|
|
97
|
+
nicks = nm.scan(/([\(\"][\p{Alpha}\-\ ']+[\)\"])/).flatten
|
|
98
|
+
nicks.each { |n|
|
|
99
|
+
nm.gsub!(/#{Regexp.escape(n)}/, "").strip! # escape so "(" and ")" are not read as a group
|
|
100
|
+
n.gsub!(/["\(\)]/, ' ')
|
|
101
|
+
}
|
|
102
|
+
nicks.join(" ").strip
|
|
103
|
+
end
|
|
104
|
+
|
|
105
|
+
# Internal: parse last name from string
|
|
106
|
+
#
|
|
107
|
+
# nm - string to get the last name from
|
|
108
|
+
# format - symbol defaulting to "first last". See parse()
|
|
109
|
+
#
|
|
110
|
+
# Returns string of the last name found or an empty string
|
|
111
|
+
def self.parse_last(nm, format = :fl)
|
|
112
|
+
last = ''
|
|
113
|
+
if format == :fl && n = nm.match(/\p{Blank}(?<fam>#{COMPOUNDS}[\p{L}\-\']+)\z/i)
|
|
114
|
+
last = n[:fam]
|
|
115
|
+
nm.sub!(last, "").strip!
|
|
116
|
+
elsif format == :lf && n = nm.match(/\A(?<fam>#{COMPOUNDS}[\p{Alpha}\-\']+)\p{Blank}/i)
|
|
117
|
+
last = n[:fam]
|
|
118
|
+
nm.sub!(last, "").strip!
|
|
119
|
+
elsif format == :lcf && n = nm.match(/\A(?<fam>#{COMPOUNDS}[\p{Alpha}\-\'\p{Blank}]+),/i)
|
|
120
|
+
last = n[:fam]
|
|
121
|
+
nm.sub!(last, "").strip!
|
|
122
|
+
nm.sub!(',', "").strip!
|
|
123
|
+
end
|
|
124
|
+
last
|
|
125
|
+
end
|
|
126
|
+
|
|
127
|
+
# Internal: parse the first name, and middle name if any
|
|
128
|
+
#
|
|
129
|
+
# nm - the string to get the first name from
|
|
130
|
+
# namecount - the number of spaces in the first name to consider
|
|
131
|
+
#
|
|
132
|
+
# Returns an array containing the first name and middle name if any
|
|
133
|
+
def self.parse_first(nm, namecount = 0)
|
|
134
|
+
nm.tr! '.', ' '
|
|
135
|
+
first, middle = nm.split ' ', namecount
|
|
136
|
+
|
|
137
|
+
[first || "", middle || ""]
|
|
138
|
+
end
|
|
139
|
+
|
|
140
|
+
end
|
|
141
|
+
end
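When a matched fragment is interpolated back into a pattern, as `parse_suffix` and `parse_nick` do, characters such as `(`, `)` and `.` are read as regex syntax unless they are escaped. The following sketch shows nickname extraction with `Regexp.escape`; `extract_nick` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper: pull quoted or parenthesised nicknames out of a name.
# Returns [nickname, remainder].
def extract_nick(name)
  rest = name.dup
  nicks = rest.scan(/[("][\p{Alpha}\- ']+[)"]/)
  nicks.each do |raw|
    # Escape the match so "(" and ")" are treated literally, not as a group.
    rest = rest.sub(/#{Regexp.escape(raw)}/, "")
  end
  nick = nicks.map { |raw| raw.delete('"()').strip }.join(" ")
  [nick, rest.gsub(/\p{Blank}+/, " ").strip]
end

p extract_nick('Joe (Joey) Smith')
p extract_nick('William "Bill" Gates')
```

Without the escape, a fragment like `(Joey)` would be compiled as a capture group matching only `Joey`, leaving empty parentheses behind in the remainder.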
|
data/lib/nomener/suffixes.rb
ADDED
|
@@ -0,0 +1,33 @@
|
|
|
1
|
+
module Nomener
|
|
2
|
+
module Suffixes
|
|
3
|
+
|
|
4
|
+
# Internal: Regex to match suffixes or honorifics after names
|
|
5
|
+
SUFFIXES = %r!\b(?:
|
|
6
|
+
AB # Bachelor of Arts
|
|
7
|
+
| APC
|
|
8
|
+
| Attorney[\p{Blank}\-]at[\p{Blank}\-]Law\.? # Attorney at Law, Attorney-at-Law
|
|
9
|
+
| B[AS]c? # Bachelor of Arts, Bachelor of Science
|
|
10
|
+
| C\.?P\.?A\.?
|
|
11
|
+
| CHB
|
|
12
|
+
| D\.?[DMOPV]\.?[SMD]?\.? # DMD, DO, DPM, DDM, DVM
|
|
13
|
+
| DSC
|
|
14
|
+
| Esq(?:\.|uire\.?)? # Esq, Esquire
|
|
15
|
+
| FAC(?:P|S) # FACP, FACS
|
|
16
|
+
| [(?:X{1,3})(?:IX|IV|V)(?:I{0,3})]{1,}\b # roman numbers I - XXXXVIII, if they're written proper
|
|
17
|
+
| Jn?r\.?
|
|
18
|
+
| Junior
|
|
19
|
+
| LLB
|
|
20
|
+
| M\.?[BDS]\.?(?:ed)? # MB, MD, MS, MSed
|
|
21
|
+
| MPH
|
|
22
|
+
| P\.?\p{Blank}?A\.?
|
|
23
|
+
| PC
|
|
24
|
+
| Ph\.?\p{Blank}?D\.?
|
|
25
|
+
| RN
|
|
26
|
+
| SC
|
|
27
|
+
| Sn?r\.? # Snr, Sr
|
|
28
|
+
| Senior
|
|
29
|
+
| V\.?M\.?D\.?
|
|
30
|
+
)!xi
|
|
31
|
+
end
|
|
32
|
+
end
|
|
33
|
+
|
data/lib/nomener/titles.rb
ADDED
|
@@ -0,0 +1,100 @@
|
|
|
1
|
+
module Nomener
|
|
2
|
+
module Titles
|
|
3
|
+
|
|
4
|
+
# Internal: Regex for matching name prefixes such as honorifics and other formalities
|
|
5
|
+
TITLES = %r!(
|
|
6
|
+
خانم # Persian Mrs ?
|
|
7
|
+
| (?:רעב|'ר) # Yiddish Mr.
|
|
8
|
+
| አቶ # Amharic Mr.
|
|
9
|
+
| Air\p{Blank}(?:Commander|Commodore|Marshall?) # Air Commander, Commodore, Marshal
|
|
10
|
+
| Ald(?:erman|\.)?
|
|
11
|
+
| (?:Arch)?Du(?:ke|chess) # Duke, Archduke, Duchess, Archduchess
|
|
12
|
+
| Ato # Amharic Mr.
|
|
13
|
+
| Baron(?:ess)?
|
|
14
|
+
| Bishop
|
|
15
|
+
| Brig(?:adier)?
|
|
16
|
+
| Brother
|
|
17
|
+
| Capt(?:ain|\.)?
|
|
18
|
+
| Cdr\.? # Commander
|
|
19
|
+
| Chaplain
|
|
20
|
+
| Colonel
|
|
21
|
+
| Comm(?:ander|odore) # Commander, Commodore
|
|
22
|
+
| Count(?:ess)?
|
|
23
|
+
| Dame
|
|
24
|
+
| Det\.?
|
|
25
|
+
| Dhr\.?
|
|
26
|
+
| Doctor
|
|
27
|
+
| Dr\.?
|
|
28
|
+
| Dona
|
|
29
|
+
| Do[mn] # Dom, Don
|
|
30
|
+
| Erzherzog(?:in)? # Erzherzog, Erzherzogin
|
|
31
|
+
| Father
|
|
32
|
+
| Field\p{Blank}Marshall?
|
|
33
|
+
| Flt?\.?(?:\p{Blank}(?:Lt|Off)\.?) # Fl Lt, Flt Lt, Fl Off, Flt Off
|
|
34
|
+
| Flight(?:\p{Blank}(?:Lieutenant|Officer)) # Flight Lieutenant, Flight Officer
|
|
35
|
+
| Frau
|
|
36
|
+
| Fr\.?
|
|
37
|
+
| Gen(?:eral|\.)? # General
|
|
38
|
+
| H[äe]rra # Estonian, Finnish Mr
|
|
39
|
+
| Herr
|
|
40
|
+
| Hra?\.? # Finnish
|
|
41
|
+
| (?:Rt\.?|Right)?\p{Blank}?Hon\.?(?:ourable)? # Honourable, Right Honourable
|
|
42
|
+
| Insp\.?(?:ector)? # Inspector
|
|
43
|
+
| Judge
|
|
44
|
+
| Justice
|
|
45
|
+
| Khaanom # Persian Mrs
|
|
46
|
+
| Lady
|
|
47
|
+
| Lieutenant(?:\p{Blank}(?:Commander|Colonel|General))? # Lieutenant, Lieutenant Commander, Lieutenant Colonel, Lieutenant General
|
|
48
|
+
| Lt\.?(?:\p{Blank}(?:Cdr|Col|Gen)\.?)? # Lt, Lt Col, Lt Cdr, Lt Gen
|
|
49
|
+
| (?:Lt|Leut|Lieut)\.?
|
|
50
|
+
| Lord
|
|
51
|
+
| Madam(?:e)?
|
|
52
|
+
| Maid
|
|
53
|
+
| Major(?:\p{Blank}General)? # Major, Major General
|
|
54
|
+
| Maj\.?(?:\p{Blank}Gen\.?)? # Maj, Maj Gen
|
|
55
|
+
| (?:Master|Technical|Staff)?\p{Blank}?Sergeant
|
|
56
|
+
| [MTS]?Sgt\.? # Master, Staff, Technical, or just Sergeant
|
|
57
|
+
| Mast(?:er|\.)?
|
|
58
|
+
| Matron
|
|
59
|
+
| Menina
|
|
60
|
+
| Messrs
|
|
61
|
+
| Meneer
|
|
62
|
+
| Miss\.?
|
|
63
|
+
| Mister
|
|
64
|
+
| Mn[er]\.? # Mne (Mnr) Afrikaans Mr.
|
|
65
|
+
| Mons(?:ignor|\.?) # Monsignor
|
|
66
|
+
| Most\p{Blank}Rever[ea]nd
|
|
67
|
+
| Mother(?:\p{Blank}Superior)? # Mother, Mother Superior
|
|
68
|
+
| Mrs?\.?
|
|
69
|
+
| Msgr\.? # Monsignor
|
|
70
|
+
| M\/?s\.? # Ms, M/s
|
|
71
|
+
| Mt\.?\p{Blank}Revd?\.?
|
|
72
|
+
| Mx\.?
|
|
73
|
+
| (?-i:ông) # Vietnamese Mr. must be lowercase
|
|
74
|
+
| Pastor
|
|
75
|
+
| Private
|
|
76
|
+
| Prof(?:essor|\.)? # Professor, Prof
|
|
77
|
+
| Pte\.? # Private
|
|
78
|
+
| Pvt\.? # Private
|
|
79
|
+
| PFC # Private first class
|
|
80
|
+
| Rabbi
|
|
81
|
+
| Reb\.? # Yiddish Mr.
|
|
82
|
+
| Rever[ea]nd
|
|
83
|
+
| Revd?\.?
|
|
84
|
+
| Se[nñ]h?orita # senorita, señorita, senhorita
|
|
85
|
+
| Se[nñ]h?ora # senora, señora, senhora
|
|
86
|
+
| Se[nñ][hy]?or(?:\p{Blank}Dom)? # senor, señor, senhor, senyor, senor dom
|
|
87
|
+
| Sénher
|
|
88
|
+
| Seigneur
|
|
89
|
+
| Signor(?:a|e)
|
|
90
|
+
| Sig(?:a|ra)?\.?
|
|
91
|
+
| Sioro # Ido Mr.
|
|
92
|
+
| Sro\.? # Ido Mr.
|
|
93
|
+
| Sir
|
|
94
|
+
| Sister
|
|
95
|
+
| Sr(?:a|ta)?\.?
|
|
96
|
+
| V\.?\ Revd?\.?
|
|
97
|
+
| Very\ Rever[ea]nd
|
|
98
|
+
)!xi
|
|
99
|
+
end
|
|
100
|
+
end
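Inside a character class, `|` is an ordinary character rather than an alternation operator, which matters for patterns such as the "Reverend" spellings in `TITLES`. A small sketch of the difference, with `LOOSE` and `STRICT` as hypothetical constant names:

```ruby
LOOSE  = /\ARever[e|a]nd\z/i # "|" is a literal member of the class
STRICT = /\ARever[ea]nd\z/i  # only "e" or "a" is accepted

%w[Reverend Reverand Rever|nd].each do |word|
  loose  = !!(LOOSE =~ word)
  strict = !!(STRICT =~ word)
  puts "#{word}: loose=#{loose} strict=#{strict}"
end
```

Both patterns accept the two intended spellings; only the loose one also accepts the stray `Rever|nd`.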
|
data/lib/nomener.rb
ADDED
|
@@ -0,0 +1,16 @@
|
|
|
1
|
+
require "nomener/version"
|
|
2
|
+
require "nomener/name"
|
|
3
|
+
require "nomener/parser"
|
|
4
|
+
|
|
5
|
+
module Nomener
|
|
6
|
+
|
|
7
|
+
# Public: Convenience method to parse a name
|
|
8
|
+
#
|
|
9
|
+
# name - a string of a name to parse
|
|
10
|
+
#
|
|
11
|
+
# Returns a <Nomener::Name>; its parts are left unset if the name couldn't be parsed
|
|
12
|
+
def self.parse(name)
|
|
13
|
+
Name.new(name).parse
|
|
14
|
+
end
|
|
15
|
+
|
|
16
|
+
end
|