RubyGems - namae - Versions diffs - 0.1.0 - Mend

namae 0.1.0


Files changed (27)

data/.autotest +21 -0
data/.document +6 -0
data/.rspec +2 -0
data/.simplecov +2 -0
data/.travis.yml +11 -0
data/.yardopts +3 -0
data/Gemfile +25 -0
data/LICENSE +661 -0
data/README.md +133 -0
data/Rakefile +62 -0
data/cucumber.yml +1 -0
data/features/bibtex.feature +78 -0
data/features/examples.feature +24 -0
data/features/step_definitions/namae_steps.rb +22 -0
data/features/support/env.rb +19 -0
data/lib/namae.rb +5 -0
data/lib/namae/name.rb +119 -0
data/lib/namae/parser.rb +470 -0
data/lib/namae/parser.y +175 -0
data/lib/namae/utility.rb +47 -0
data/lib/namae/version.rb +10 -0
data/namae.gemspec +80 -0
data/spec/namae/name_spec.rb +65 -0
data/spec/namae/parser_spec.rb +107 -0
data/spec/namae/utility_spec.rb +21 -0
data/spec/spec_helper.rb +19 -0
metadata +147 -0

data/README.md ADDED

@@ -0,0 +1,133 @@
+Namae
+=====
+Namae is a parser for human names. It recognizes personal names of various
+cultural backgrounds and tries to split them into their component parts
+(e.g., given and family names, honorifics etc.).
+[![Build Status](https://secure.travis-ci.org/berkmancenter/namae.png)](http://travis-ci.org/berkmancenter/namae)
+Quickstart
+----------
+1. Install the namae gem (or add it to your Gemfile):
+        $ gem install namae
+2. Start parsing names! Namae expects you to pass in a string and it returns
+   a list of parsed names:
+        require 'namae'
+        names = Namae.parse 'Yukihiro "Matz" Matsumoto'
+        #-> [#<Name family="Matsumoto" given="Yukihiro" nick="Matz">]
+3. Use the name objects to access the individual parts:
+        matz = names[0]
+        matz.nick
+        #-> "Matz"
+        matz.family
+        #-> "Matsumoto"
+        matz.initials
+        #-> "Y.M."
+        matz.initials :expand => true
+        #-> "Y. Matsumoto"
+        matz.initials :dots => false
+        #-> "YM"
+Format and Examples
+-------------------
+Namae recognizes names written in two basic formats, internally
+referred to as display-order and sort-order. For example, the following
+names are written in display-order:
+    Namae.parse 'Charles Babbage'
+    #-> [#<Name family="Babbage" given="Charles">]
+    Namae.parse 'Mr. Alan M. Turing'
+    #-> [#<Name family="Turing" given="Alan M." appellation="Mr.">]
+    Namae.parse 'Yukihiro "Matz" Matsumoto'
+    #-> [#<Name family="Matsumoto" given="Yukihiro" nick="Matz">]
+    Namae.parse 'Augusta Ada King and Lord Byron'
+    #-> [#<Name family="King" given="Augusta Ada">, #<Name family="Byron" title="Lord">]
+    Namae.parse 'Sir Isaac Newton'
+    #-> [#<Name family="Newton" given="Isaac" title="Sir">]
+    Namae.parse 'Prof. Donald Ervin Knuth'
+    #-> [#<Name family="Knuth" given="Donald Ervin" title="Prof.">]
+Or in sort-order:
+    Namae.parse 'Turing, Alan M.'
+    #-> [#<Name family="Turing" given="Alan M.">]
+You can also mix sort- and display-order in the same expression:
+    Namae.parse 'Torvalds, Linus and Alan Cox'
+    #-> [#<Name family="Torvalds" given="Linus">, #<Name family="Cox" given="Alan">]
+Typically, sort-order names are easier to parse, because the syntax is less
+ambiguous. For example, multiple family names are always possible in sort-order:
+    Namae.parse 'Carreño Quiñones, María-Jose'
+    #-> [#<Name family="Carreño Quiñones" given="María-Jose">]
+Whilst in display-order, multiple family names are only supported when the
+name contains a particle or a nickname.
+Rationale
+---------
+Parsing human names is at once too easy and too hard. When working in the
+confines of a single language or culture it is often a trivial task that
+does not warrant a dedicated software package; when working across different
+cultures, languages, or scripts, however, it may quickly become unrealistic
+to devise a satisfying, one-size-fits-all solution. In languages like
+Japanese or Chinese, for instance, the issue of word segmentation alone is
+probably more difficult than name parsing itself.
+Having said that, Namae is based on the rules used by BibTeX to format names
+and can therefore be used to parse names of most languages using Latin
+script, with the long-term goal of supporting as many languages and scripts as
+possible without the need for sophisticated or large dictionary-based
+language-detection or word-segmentation features.
+For further reading, see the W3C's primer on
+[Personal Names Around the World](http://www.w3.org/International/questions/qa-personal-names).
+Development
+-----------
+The Namae source code is [hosted on GitHub](https://github.com/berkmancenter/namae).
+You can check out a copy of the latest code using Git:
+    $ git clone https://github.com/berkmancenter/namae.git
+To get started, generate the parser and run all tests:
+    $ cd namae
+    $ bundle install
+    $ rake racc
+    $ rake features
+    $ rake spec
+If you've found a bug or have a question, please open an issue on the
+[issue tracker](https://github.com/berkmancenter/namae/issues). Or, for extra
+credit, clone the Namae repository, write a failing example, fix the bug
+and submit a pull request.
+Contributors
+------------
+* [Sylvester Keil](http://sylvester.keil.or.at)
+* Dan Collis-Puro
+Copyright
+---------
+Copyright (c) 2012 President and Fellows of Harvard College.
+Please see LICENSE for further details.

data/Rakefile ADDED

@@ -0,0 +1,62 @@
+# encoding: utf-8
+require 'bundler'
+begin
+  Bundler.setup(:default, :development, :debug, :test)
+rescue Bundler::BundlerError => e
+  $stderr.puts e.message
+  $stderr.puts "Run `bundle install` to install missing gems"
+  exit e.status_code
+end
+require 'rake'
+$:.unshift(File.join(File.dirname(__FILE__), './lib'))
+require 'namae'
+begin
+  require 'jeweler'
+  Jeweler::Tasks.new do |gem|
+    gem.name = 'namae'
+    gem.version = Namae::Version::STRING.dup
+    gem.homepage = 'https://github.com/berkmancenter/namae'
+    gem.email = ['sylvester@keil.or.at', 'dan@collispuro.com']
+    gem.authors = ['Sylvester Keil', 'Dan Collis-Puro']
+    gem.license = 'AGPL'
+    gem.summary =
+      'Namae parses personal names and splits them into their component parts.'
+    gem.description = %q{
+      Namae is a parser for human names. It recognizes personal names of various
+      cultural backgrounds and tries to split them into their component parts
+      (e.g., given and family names, honorifics etc.).
+    }.gsub(/\s+/, ' ')
+  end
+  Jeweler::RubygemsDotOrgTasks.new
+rescue LoadError
+  warn 'failed to load jeweler'
+end
+desc 'Generate the name parser'
+task :racc => ['lib/namae/parser.rb']
+file 'lib/namae/parser.rb' => ['lib/namae/parser.y'] do
+  sh 'bundle exec racc -o lib/namae/parser.rb lib/namae/parser.y'
+end
+require 'rspec/core'
+require 'rspec/core/rake_task'
+RSpec::Core::RakeTask.new(:spec) do |spec|
+  spec.pattern = FileList['spec/**/*_spec.rb']
+end
+require 'cucumber/rake/task'
+Cucumber::Rake::Task.new(:features)
+task :default => :spec
+require 'yard'
+YARD::Rake::YardocTask.new

data/cucumber.yml ADDED

@@ -0,0 +1 @@
+default: --format progress --require features --color

data/features/bibtex.feature ADDED

@@ -0,0 +1,78 @@
+Feature: Parse BibTeX-style names
+	As a hacker who works with bibliographies
+	I want to be able to parse BibTeX-style names
+  Scenario Outline: Name splitting
+    When I parse the name "<name>"
+    Then the BibTeX parts should be:
+      |  first  |  von  |  last  |  jr  |
+      | <first> | <von> | <last> | <jr> |
+  @names @display
+  Scenarios: Decoret test suite (display order)
+    | name            | first      | von        | last    | jr |
+    | AA BB           | AA         |            | BB      |    |
+    | AA BB CC        | AA BB      |            | CC      |    |
+#   | AA              |            |            | AA      |    |
+    | AA bb           | AA         |            | bb      |    |
+#   | aa              |            |            | aa      |    |
+    | aa bb           |            | aa         | bb      |    |
+    | aa BB           |            | aa         | BB      |    |
+    | AA bb CC        | AA         | bb         | CC      |    |
+    | AA bb CC dd EE  | AA         | bb CC dd   | EE      |    |
+#    | AA 1B cc dd     | AA 1B      | cc         | dd      |    |
+#    | AA 1b cc dd     | AA         | 1b cc      | dd      |    |
+    | AA {b}B cc dd   | AA {b}B    | cc         | dd      |    |
+    | AA {b}b cc dd   | AA         | {b}b cc    | dd      |    |
+    | AA {B}b cc dd   | AA         | {B}b cc    | dd      |    |
+    | AA {B}B cc dd   | AA {B}B    | cc         | dd      |    |
+    | AA \BB{b} cc dd | AA \\BB{b} | cc         | dd      |    |
+    | AA \bb{b} cc dd | AA \\bb{b} | cc         | dd      |    |
+    | AA {bb} cc DD   | AA {bb}    | cc         | DD      |    |
+    | AA bb {cc} DD   | AA         | bb         | {cc} DD |    |
+    | AA {bb} CC      | AA {bb}    |            | CC      |    |
+  @names @sort
+  Scenarios: Decoret test suite (sort order)
+    | name            | first      | von        | last    | jr |
+    | bb CC, AA       | AA         | bb         | CC      |    |
+    | bb CC, aa       | aa         | bb         | CC      |    |
+    | bb CC dd EE, AA | AA         | bb CC dd   | EE      |    |
+    | bb, AA          | AA         |            | bb      |    |
+    | BB,             |            |            | BB      |    |
+    | bb CC,XX, AA    | AA         | bb         | CC      | XX |
+    | bb CC,xx, AA    | AA         | bb         | CC      | xx |
+    | BB,, AA         | AA         |            | BB      |    |
+    | CC dd BB, AA    | AA         | CC dd      | BB      |    |
+    | BB, AA          | AA         |            | BB      |    |
+  @names @sort
+  Scenarios: Long von parts
+    | name            | first      | von        | last    | jr |
+    | bb cc dd CC, AA | AA         | bb cc dd   | CC      |    |
+    | bb CC dd CC, AA | AA         | bb CC dd   | CC      |    |
+    | BB cc dd CC, AA | AA         | BB cc dd   | CC      |    |
+    | BB CC dd CC, AA | AA         | BB CC dd   | CC      |    |
+  @names
+  Scenarios: Decoret further remarks
+    | name                               | first                | von            | last                    | jr |
+    | Dominique Galouzeau de Villepin    | Dominique Galouzeau  | de             | Villepin                |    |
+    | Dominique {G}alouzeau de Villepin  | Dominique            | {G}alouzeau de | Villepin                |    |
+    | Galouzeau {de} Villepin, Dominique | Dominique            |                | Galouzeau {de} Villepin |    |
+   @names
+  Scenarios: Some actual names
+    | name                              | first                   | von            | last                           | jr  |
+    | John Paul Jones                   | John Paul               |                | Jones                          |     |
+    | Ludwig von Beethoven              | Ludwig                  | von            | Beethoven                      |     |
+    | von Beethoven, Ludwig             | Ludwig                  | von            | Beethoven                      |     |
+    | {von Beethoven}, Ludwig           | Ludwig                  |                | {von Beethoven}                |     |
+    | {{von} Beethoven}, Ludwig         | Ludwig                  |                | {{von} Beethoven}              |     |
+    | John {}Paul Jones                 | John {}Paul             |                | Jones                          |     |
+    | Ford, Jr., Henry                  | Henry                   |                | Ford                           | Jr. |
+    | Brinch Hansen, Per                | Per                     |                | Brinch Hansen                  |     |
+#    | {Barnes and Noble, Inc.}          |                         |                | {Barnes and Noble, Inc.}       |     |
+    | {Barnes and} {Noble, Inc.}        | {Barnes and}            |                | {Noble, Inc.}                  |     |
+    | {Barnes} {and} {Noble,} {Inc.}    | {Barnes} {and} {Noble,} |                | {Inc.}                         |     |
+    | Charles Louis Xavier Joseph de la Vallee Poussin | Charles Louis Xavier Joseph | de la | Vallee Poussin       |     |
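
The sort-order rows in the tables above follow a pattern that can be sketched in a few lines of Ruby. This is a simplified illustration for brace-free names only, not the racc grammar in lib/namae/parser.y. The helper name `split_sort_order` is hypothetical, and the rule it applies (the von part runs up to the last lowercase-initial word, provided at least one word is left for the last name) is an assumption read off the expected values in the tables.

```ruby
# Hypothetical helper, NOT part of Namae: splits a brace-free sort-order
# name ("von Last, First" or "von Last, Jr, First") into BibTeX parts.
def split_sort_order(name)
  # Keep trailing empty fields so that "BB," yields an empty first name.
  before, *rest = name.split(',', -1).map(&:strip)
  tokens = before.split(/\s+/)

  # The von part ends at the last lowercase-initial token, but the final
  # token is always reserved for the last name.
  von_end = tokens[0...-1].rindex { |t| t =~ /\A[[:lower:]]/ }
  von  = von_end ? tokens[0..von_end] : []
  last = von_end ? tokens[(von_end + 1)..-1] : tokens

  # Two comma-separated fields after the family part mean "Jr, First".
  jr, first = rest.length >= 2 ? [rest[0], rest[1]] : ['', rest[0].to_s]

  { :first => first, :von => von.join(' '), :last => last.join(' '), :jr => jr }
end

split_sort_order('bb CC,XX, AA').values_at(:first, :von, :last, :jr)
#=> ["AA", "bb", "CC", "XX"]
```

Names that contain braces, such as `{von Beethoven}, Ludwig`, need real tokenization and are deliberately out of scope for this sketch.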

data/features/examples.feature ADDED

@@ -0,0 +1,24 @@
+Feature: Parse the names in the Readme file
+	As a hacker who works with Namae
+	I want to be able to parse all the examples in the Readme file
+  Scenario Outline: Names Parsing
+    When I parse the name "<name>"
+    Then the parts should be:
+      |  given  |  particle  |  family  |  suffix  |  title  |  appellation  |  nick  |
+      | <given> | <particle> | <family> | <suffix> | <title> | <appellation> | <nick> |
+    @readme @display
+    Scenarios: Readme examples (display-order)
+      | name                         | given        | particle | family    | suffix | title | appellation | nick |
+      | Charles Babbage              | Charles      |          | Babbage   |        |       |             |      |
+      | Mr. Alan M. Turing           | Alan M.      |          | Turing    |        |       | Mr.         |      |
+      | Yukihiro "Matz" Matsumoto    | Yukihiro     |          | Matsumoto |        |       |             | Matz |
+      | Sir Isaac Newton             | Isaac        |          | Newton    |        | Sir   |             |      |
+      | Prof. Donald Ervin Knuth     | Donald Ervin |          | Knuth     |        | Prof. |             |      |
+      | Lord Byron                   |              |          | Byron     |        | Lord  |             |      |
+    @readme @sort
+    Scenarios: Readme examples (sort-order)
+      | name                         | given        | particle | family           | suffix | title | appellation | nick |
+      | Carreño Quiñones, María-Jose | María-Jose   |          | Carreño Quiñones |        |       |             |      |

data/features/step_definitions/namae_steps.rb ADDED

@@ -0,0 +1,22 @@
+When /^I parse the name "(.*)"$/ do |string|
+  @name = Namae.parse!(string)[0]
+end
+When /^I parse the names "(.*)"$/ do |string|
+  @names = Namae.parse!(string)
+end
+Then /^the BibTeX parts should be:$/ do |table|
+  table.hashes.each do |row|
+    @name.values_at(:given, :particle, :family, :suffix).map(&:to_s).should ==
+      row.values_at('first', 'von', 'last', 'jr')
+  end
+end
+Then /^the parts should be:$/ do |table|
+  table.hashes.each do |row|
+    @name.values_at(:given, :particle, :family, :suffix, :title, :appellation, :nick).map(&:to_s).should ==
+      row.values_at('given', 'particle', 'family', 'suffix', 'title', 'appellation', 'nick')
+  end
+end

data/features/support/env.rb ADDED

@@ -0,0 +1,19 @@
+require 'bundler'
+begin
+  Bundler.setup(:default, :development)
+rescue Bundler::BundlerError => e
+  $stderr.puts e.message
+  $stderr.puts "Run `bundle install` to install missing gems"
+  exit e.status_code
+end
+begin
+  require 'simplecov'
+rescue LoadError
+  # ignore
+end
+$LOAD_PATH.unshift(File.dirname(__FILE__) + '/../../lib')
+require 'namae'
+require 'rspec/expectations'

data/lib/namae.rb ADDED

@@ -0,0 +1,5 @@
+require 'namae/version'
+require 'namae/name'
+require 'namae/parser'
+require 'namae/utility'

data/lib/namae/name.rb ADDED

@@ -0,0 +1,119 @@
+module Namae
+  # A Name represents a single personal name, exposing its constituent
+  # parts (e.g., family name, given name etc.). Name instances are typically
+  # created and returned from {Namae.parse Namae.parse}.
+  #
+  #     name = Namae.parse('Yukihiro "Matz" Matsumoto')[0]
+  #
+  #     name.family #=> Matsumoto
+  #     name.nick #=> Matz
+  #     name.given #=> Yukihiro
+  #
+  class Name < Struct.new :family, :given, :suffix, :particle,
+    :dropping_particle, :nick, :appellation, :title
+    # rbx compatibility
+    @parts = members.map(&:to_sym).freeze
+    @defaults = {
+      :initials => {
+        :expand => false,
+        :dots => true,
+        :spaces => false
+      }
+    }
+    class << self
+      attr_reader :parts, :defaults
+    end
+    # @param attributes [Hash] the individual parts of the name
+    # @example
+    #   Name.new(:family => 'Matsumoto')
+    def initialize(attributes = {})
+      super(*attributes.values_at(*Name.parts))
+    end
+    # True if all the name components are nil.
+    def empty?
+      values.compact.empty?
+    end
+    # Merges the name with the passed-in name or hash.
+    #
+    # @param other [#each_pair] the other name or hash
+    # @return [self]
+    def merge(other)
+      raise ArgumentError, "failed to merge #{other.class} into Name" unless
+        other.respond_to?(:each_pair)
+      other.each_pair do |part, value|
+        writer = "#{part}="
+        send(writer, value) if !value.nil? && respond_to?(writer)
+      end
+      self
+    end
+    # @param options [Hash] the options to create the initials
+    #
+    # @option options [true,false] :expand (false) whether or not to expand the family name
+    # @option options [true,false] :dots (true) whether or not to print dots between the initials
+    # @option options [true,false] :spaces (false) whether or not to print spaces between the initials
+    #
+    # @return [String] the name's initials.
+    def initials(options = {})
+      options = Name.defaults[:initials].merge(options)
+      if options[:expand]
+        [initials_of(given_part, options), family].compact.join(' ')
+      else
+        initials_of([given_part, family_part].join(' '), options)
+      end
+    end
+    # @overload values_at(selector, ... )
+    #   Returns an array containing the elements in self corresponding to
+    #   the given selector(s). The selectors may be either integer indices,
+    #   ranges (functionality inherited from Struct) or symbols
+    #   identifying valid keys.
+    #
+    # @example
+    #   name.values_at(:family, :nick) #=> ['Matsumoto', 'Matz']
+    #
+    # @see Struct#values_at
+    # @return [Array] the list of values
+    def values_at(*arguments)
+      super(*arguments.flatten.map { |k| k.is_a?(Symbol) ? Name.parts.index(k) : k })
+    end
+    # Describe the contents of this name in a string.
+    def inspect
+      "#<Name #{each_pair.map { |k,v| [k,v.inspect].join('=') if v }.compact.join(' ')}>"
+    end
+    private
+    def family_part
+      [particle, family].compact.join(' ')
+    end
+    def given_part
+      [given, dropping_particle].compact.join(' ')
+    end
+    # @param name [String] a name or part of a name
+    # @return [String] the initials of the passed-in name
+    def initials_of(name, options = {})
+      i = name.gsub(/([[:upper:]])[[:lower:]]+/, options[:dots] ? '\1.' : '\1')
+      i.gsub!(/\s+/, '') unless options[:spaces]
+      i
+    end
+  end
+end
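
The behaviour of `initials`, `values_at` and `merge` defined above can be tried without installing the gem. The following is a standalone sketch, not the shipped class: `SketchName` is a hypothetical stand-in that keeps only four of the eight parts and leaves out the dropping particle, while the method bodies mirror the ones in the diff as closely as that simplification allows.

```ruby
# SketchName is a hypothetical, reduced stand-in for Namae::Name.
class SketchName < Struct.new(:family, :given, :particle, :nick)
  PARTS = members.map(&:to_sym).freeze

  # Build the struct from a hash of parts; missing parts stay nil.
  def initialize(attributes = {})
    super(*attributes.values_at(*PARTS))
  end

  # Copy every non-nil value from a name or hash; unknown keys are ignored.
  def merge(other)
    raise ArgumentError, "failed to merge #{other.class}" unless
      other.respond_to?(:each_pair)
    other.each_pair do |part, value|
      writer = "#{part}="
      send(writer, value) if !value.nil? && respond_to?(writer)
    end
    self
  end

  def initials(options = {})
    options = { :expand => false, :dots => true, :spaces => false }.merge(options)
    if options[:expand]
      [initials_of(given.to_s, options), family].compact.join(' ')
    else
      initials_of([given, particle, family].compact.join(' '), options)
    end
  end

  # Translate symbol selectors into the integer indices Struct expects.
  def values_at(*arguments)
    super(*arguments.flatten.map { |k| k.is_a?(Symbol) ? PARTS.index(k) : k })
  end

  private

  # Reduce each capitalized word to its first letter, optionally dotted.
  def initials_of(name, options = {})
    i = name.gsub(/([[:upper:]])[[:lower:]]+/, options[:dots] ? '\1.' : '\1')
    i.gsub!(/\s+/, '') unless options[:spaces]
    i
  end
end

matz = SketchName.new(:family => 'Matsumoto', :given => 'Yukihiro')
matz.merge(:nick => 'Matz', :given => nil) # nil values never overwrite

matz.initials                   #=> "Y.M."
matz.initials(:expand => true)  #=> "Y. Matsumoto"
matz.initials(:spaces => true)  #=> "Y. M."
matz.values_at(:family, :nick)  #=> ["Matsumoto", "Matz"]
```

Because `initials_of` only rewrites an uppercase letter that is followed by at least one lowercase letter, a token that is already an initial (for example `M.`) passes through unchanged.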