RubyGems - stanford-core-nlp-abstractor - Versions diffs - 0.5.3 - Mend

stanford-core-nlp-abstractor 0.5.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml +7 -0
data/LICENSE +18 -0
data/README.md +208 -0
data/bin/AnnotationBridge.java +22 -0
data/bin/bridge.jar +0 -0
data/lib/stanford-core-nlp.rb +238 -0
data/lib/stanford-core-nlp/bridge.rb +57 -0
data/lib/stanford-core-nlp/config.rb +392 -0
metadata +96 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: aaef5a50a17996fda21974a9c41edcacba2b27f1
+  data.tar.gz: a16c9702bf92e93d69e66be79d0743d1df98f020
+SHA512:
+  metadata.gz: 9e0061a31396b5564500e8264a4b68f5d57f8fb62e943838feb843018178bc6ec6047a9aca218a5727d34c6a52efa7f843ee897ec433e005ec59520a7be41b20
+  data.tar.gz: 866e1c5ab898820a181e25f495136ec4a57e8bcc6b7a0d8d870aee452c049de54b8f5a3b5338e072cc09f3e454ccedbd7e24c9a73c3f3b16c116ccd6864bca25
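`checksums.yaml` records SHA1 and SHA512 digests of the two archives inside the `.gem` file, `metadata.gz` and `data.tar.gz`. A `.gem` is a plain tar archive, and it typically stores this file gzipped as `checksums.yaml.gz`.

The block below is a minimal sketch for checking unpacked archives against the recorded digests. It assumes you have already extracted the `.gem` and decompressed `checksums.yaml`. The method name `checksum_mismatches` is hypothetical and is not a RubyGems API.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'yaml'

# Digest classes for the algorithm names used as top-level keys in checksums.yaml.
DIGESTS = { 'SHA1' => Digest::SHA1, 'SHA512' => Digest::SHA512 }.freeze

# Returns [algorithm, file name] pairs whose recorded digest does not match.
#   checksums_yaml: text of checksums.yaml (already decompressed)
#   files:          hash of file name => raw file contents
def checksum_mismatches(checksums_yaml, files)
  recorded = YAML.safe_load(checksums_yaml)
  mismatches = []
  recorded.each do |algorithm, entries|
    digest = DIGESTS[algorithm]
    next unless digest # skip algorithms this sketch does not handle
    entries.each do |name, expected|
      data = files[name]
      next if data.nil? # no contents supplied for this file
      mismatches << [algorithm, name] unless digest.hexdigest(data) == expected.to_s
    end
  end
  mismatches
end

# Usage (placeholder paths for files unpacked from the .gem):
#   checksum_mismatches(File.read('checksums.yaml'),
#                       'metadata.gz' => File.binread('metadata.gz'),
#                       'data.tar.gz' => File.binread('data.tar.gz'))
# An empty array means every recorded digest matched.
```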

data/LICENSE ADDED

@@ -0,0 +1,18 @@
+Ruby bindings for the Stanford CoreNLP package
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+This license also applies to the included Stanford CoreNLP files.
+You should have received a copy of the GNU General Public License
+along with this program.  If not, see <http://www.gnu.org/licenses/>.
+Author: Louis-Antoine Mullie (louis.mullie@gmail.com). Copyright 2012.

data/README.md ADDED

@@ -0,0 +1,208 @@
+[![Build Status](https://secure.travis-ci.org/louismullie/stanford-core-nlp.png)](http://travis-ci.org/louismullie/stanford-core-nlp)
+**About**
+This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set of natural language processing tools for tokenization, sentence segmentation, part-of-speech tagging, lemmatization, and parsing of English, French and German. The package also provides named entity recognition and coreference resolution for English.
+This gem is compatible with Ruby 1.9.2 and 1.9.3 as well as JRuby 1.7.1. It is tested on both Java 6 and Java 7.
+**Installing**
+First, install the gem: `gem install stanford-core-nlp`. Then, download the Stanford Core NLP JAR and model files. Two packages are available:
+* A [minimal package](http://louismullie.com/treat/stanford-core-nlp-minimal.zip) with the default tagger and parser models for English, French and German.
+* A [full package](http://louismullie.com/treat/stanford-core-nlp-full.zip), with all of the tagger and parser models for English, French and German, as well as named entity and coreference resolution models for English.
+Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (e.g. [...]/gems/stanford-core-nlp-0.x/bin/).
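To confirm the files landed where the gem looks for them, a hypothetical pre-flight check (not part of the gem) can list what is missing from the `bin` folder. The JAR names and model folders below are copied from `default_jars` and `Config::ModelFolders` in version 0.5.3. `bridge.jar` already ships in the gem's `bin` folder.

With the minimal package, which contains only tagger and parser models, the `classifiers/` and `dcoref/` folders may legitimately be absent.

```ruby
# Hypothetical helper, not part of the gem. Names mirror default_jars and
# Config::ModelFolders in stanford-core-nlp 0.5.3.
EXPECTED_JARS = %w[joda-time.jar xom.jar stanford-corenlp.jar jollyday.jar bridge.jar].freeze
EXPECTED_MODEL_DIRS = %w[taggers grammar classifiers dcoref].freeze

# Returns the missing entries; folders are reported with a trailing '/'.
def missing_files(bin_dir, jars: EXPECTED_JARS, folders: EXPECTED_MODEL_DIRS)
  absent_jars = jars.reject { |jar| File.file?(File.join(bin_dir, jar)) }
  absent_dirs = folders.reject { |dir| File.directory?(File.join(bin_dir, dir)) }
  absent_jars + absent_dirs.map { |dir| "#{dir}/" }
end

# Example (placeholder path):
#   puts missing_files('/path/to/gems/stanford-core-nlp-0.x/bin')
```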
+**Configuration**
+You can change several optional configuration settings. Here are some examples:
+```ruby
+# Set an alternative path to look for the JAR files
+# Default is gem's bin folder.
+StanfordCoreNLP.jar_path = '/path_to_jars/'
+# Set an alternative path to look for the model files
+# Default is gem's bin folder.
+StanfordCoreNLP.model_path = '/path_to_models/'
+# Pass some alternative arguments to the Java VM.
+# Default is ['-Xms512M', '-Xmx1024M'] (be prepared
+# to take a coffee break).
+StanfordCoreNLP.jvm_args = ['-option1', '-option2']
+# Redirect VM output to log.txt
+StanfordCoreNLP.log_file = 'log.txt'
+# Change a specific model file.
+StanfordCoreNLP.set_model('pos.model', 'english-left3words-distsim.tagger')
+```
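Judging from `set_model` and `load` in `lib/stanford-core-nlp.rb`, a model key is resolved in two steps:

1. `set_model` stores the model folder for the key's prefix followed by the file name, for example `taggers/` + file for `pos.model`.
2. `load` later prepends `model_path` by plain string concatenation.

As a consequence, a custom `model_path` should end with a slash. The standalone sketch below restates that logic under a hypothetical name. One difference is deliberate: it raises `ArgumentError` for an unknown prefix, where the gem itself would fail with a `NoMethodError` on `nil`.

```ruby
# Standalone sketch (not the gem's code) of how a model key becomes a file path
# in 0.5.3. Folder names are copied from Config::ModelFolders.
MODEL_FOLDERS = { pos: 'taggers/', parse: 'grammar/', ner: 'classifiers/', dcoref: 'dcoref/' }.freeze

def model_file_path(model_path, name, file)
  prefix = name.split('.').first.to_sym # 'pos.model' => :pos
  folder = MODEL_FOLDERS.fetch(prefix) { raise ArgumentError, "no model folder is defined for #{name}" }
  # Plain concatenation, as in the gem, so model_path needs a trailing slash.
  model_path + folder + file
end
```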
+**Using the gem**
+```ruby
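+# Optional: use the model files for a language other than English.
+# Named entity and coreference models are only provided for English,
+# so also drop :ner and :dcoref from the pipeline if you switch.
+# StanfordCoreNLP.use :french # or :german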
+text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
+   'Berlin to discuss a new austerity package. Sarkozy ' +
+   'looked pleased, but Merkel was dismayed.'
+pipeline = StanfordCoreNLP.load(:tokenize, :ssplit, :pos, :lemma, :parse, :ner, :dcoref)
+text = StanfordCoreNLP::Annotation.new(text)
+pipeline.annotate(text)
+text.get(:sentences).each do |sentence|
+  # Syntactic dependencies
+  puts sentence.get(:basic_dependencies).to_s
+  sentence.get(:tokens).each do |token|
+    # Default annotations for all tokens
+    puts token.get(:value).to_s
+    puts token.get(:original_text).to_s
+    puts token.get(:character_offset_begin).to_s
+    puts token.get(:character_offset_end).to_s
+    # POS returned by the tagger
+    puts token.get(:part_of_speech).to_s
+    # Lemma (base form of the token)
+    puts token.get(:lemma).to_s
+    # Named entity tag
+    puts token.get(:named_entity_tag).to_s
+    # Coreference
+    puts token.get(:coref_cluster_id).to_s
+    # Also of interest: coref, coref_chain,
+    # coref_cluster, coref_dest, coref_graph.
+  end
+end
+```
+> Important: You need to load the StanfordCoreNLP pipeline before using the StanfordCoreNLP::Annotation class.
+The Ruby symbol (e.g. `:named_entity_tag`) corresponding to a Java annotation class is the `snake_case` of the class name, with 'Annotation' at the end removed. For example, `NamedEntityTagAnnotation` translates to `:named_entity_tag`, `PartOfSpeechAnnotation` to `:part_of_speech`, etc.
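A pure-Ruby illustration of that naming convention follows; the helper names are hypothetical. `to_annotation_class` applies the same capitalization rule as the gem's `camel_case` for underscore-separated symbols, and `to_annotation_symbol` goes the other way.

Names containing acronyms, such as `NERIDAnnotation`, do not survive this round trip. Because `camel_case` only upcases the first letter of each segment, such annotations appear to require the original capitalization (e.g. `:NERID`).

```ruby
# Illustrative helpers, not part of the gem.

# :named_entity_tag => "NamedEntityTagAnnotation": upcase the first letter of
# each underscore-separated word, then append "Annotation".
def to_annotation_class(symbol)
  symbol.to_s.gsub(/(?:^|_)(.)/) { Regexp.last_match(1).upcase } + 'Annotation'
end

# "NamedEntityTagAnnotation" => :named_entity_tag: drop the trailing
# "Annotation", insert "_" at lower-to-upper boundaries, then downcase.
def to_annotation_symbol(class_name)
  base = class_name.sub(/Annotation\z/, '')
  base.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase.to_sym
end
```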
+Good references for annotation names are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CorefCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of the annotations the gem recognizes, see the `config.rb` file inside the gem.
+**Loading specific classes**
+You may want to load additional Java classes (including any class from the Stanford NLP packages). The gem provides an API for this:
+```ruby
+# Default base class is edu.stanford.nlp.pipeline.
+StanfordCoreNLP.load_class('PTBTokenizerAnnotator')
+puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
+  # => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>
+# Here, we specify another base class.
+StanfordCoreNLP.load_class('MaxentTagger', 'edu.stanford.nlp.tagger.maxent')
+puts StanfordCoreNLP::MaxentTagger.inspect
+  # => #<Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
+```
+**List of annotator classes**
+Here is a full list of annotator classes provided by the Stanford Core NLP package. You can load these classes individually using `StanfordCoreNLP.load_class` (see above). Once loaded, you can use them as you would from a Java program. Refer to the Java documentation for the methods provided by each of these classes.
+* PTBTokenizerAnnotator - tokenizes the text following Penn Treebank conventions.
+* WordToSentenceAnnotator - splits a sequence of words into a sequence of sentences.
+* POSTaggerAnnotator - annotates the text with part-of-speech tags.
+* MorphaAnnotator - morphological normalizer (generates lemmas).
+* NERAnnotator - annotates the text with named-entity labels.
+* NERCombinerAnnotator - combines several NER models.
+* TrueCaseAnnotator - detects the true case of words in free text.
+* ParserAnnotator - generates constituent and dependency trees.
+* NumberAnnotator - recognizes numerical entities such as numbers, money, times, and dates.
+* TimeWordAnnotator - recognizes common temporal expressions, such as "teatime".
+* QuantifiableEntityNormalizingAnnotator - normalizes the content of all numerical entities.
+* SRLAnnotator - annotates predicates and their semantic roles.
+* DeterministicCorefAnnotator - implements anaphora resolution using a deterministic model.
+* NFLAnnotator - implements entity and relation mention extraction for the NFL domain.
+**List of model files**
+Here is the list of default English models for the Stanford Core NLP pipeline. You can change these models individually using `StanfordCoreNLP.set_model` (see above).
+* 'pos.model' - 'english-left3words-distsim.tagger'
+* 'ner.model' - 'english.all.3class.distsim.crf.ser.gz'
+* 'parse.model' - 'englishPCFG.ser.gz'
+* 'dcoref.demonym' - 'demonyms.txt'
+* 'dcoref.animate' - 'animate.unigrams.txt'
+* 'dcoref.female' - 'female.unigrams.txt'
+* 'dcoref.inanimate' - 'inanimate.unigrams.txt'
+* 'dcoref.male' - 'male.unigrams.txt'
+* 'dcoref.neutral' - 'neutral.unigrams.txt'
+* 'dcoref.plural' - 'plural.unigrams.txt'
+* 'dcoref.singular' - 'singular.unigrams.txt'
+* 'dcoref.states' - 'state-abbreviations.txt'
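+* 'dcoref.extra.gender' - 'namegender.combine.txt'
+* 'dcoref.countries' - 'countries'
+* 'dcoref.states.provinces' - 'statesandprovinces'
+* 'dcoref.singleton.predictor' - 'singleton.predictor.ser'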
+**Testing**
+To run the specs for each language (after copying the JARs into the `bin` folder):
+```shell
+rake spec[english]
+rake spec[german]
+rake spec[french]
+```
+**Using the latest version of Stanford CoreNLP**
+Using the latest version of Stanford CoreNLP (version 3.5.0 as of 31/10/2014) requires some additional manual steps:
+* Download [Stanford CoreNLP version 3.5.0](http://nlp.stanford.edu/software/stanford-corenlp-full-2014-10-31.zip) from http://nlp.stanford.edu/.
+* Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (e.g. [...]/gems/stanford-core-nlp-0.x/bin/) or inside the directory location configured by setting StanfordCoreNLP.jar_path.
+* Download [the full Stanford Tagger version 3.5.0](http://nlp.stanford.edu/software/stanford-postagger-full-2014-10-26.zip) from http://nlp.stanford.edu/.
+* Make a directory named 'taggers' inside the /bin/ folder of the stanford-core-nlp gem (e.g. [...]/gems/stanford-core-nlp-0.x/bin/) or inside the directory configured by setting StanfordCoreNLP.jar_path.
+* Place the contents of the extracted archive inside the taggers directory.
+* Download [the bridge.jar file](https://github.com/louismullie/stanford-core-nlp/blob/master/bin/bridge.jar?raw=true) from https://github.com/louismullie/stanford-core-nlp.
+* Place the downloaded bridge.jar file inside the /bin/ folder of the stanford-core-nlp gem (e.g. [...]/gems/stanford-core-nlp-0.x/bin/) or inside the directory configured by setting StanfordCoreNLP.jar_path.
+* Configure your setup (for English) as follows:
+```ruby
+StanfordCoreNLP.use :english
+StanfordCoreNLP.model_files = {}
+StanfordCoreNLP.default_jars = [
+  'joda-time.jar',
+  'xom.jar',
+  'stanford-corenlp-3.5.0.jar',
+  'stanford-corenlp-3.5.0-models.jar',
+  'jollyday.jar',
+  'bridge.jar'
+]
+```
+Or configure your setup (for French) as follows:
+```ruby
+StanfordCoreNLP.use :french
+StanfordCoreNLP.model_files = {}
+StanfordCoreNLP.set_model('pos.model', 'french.tagger')
+StanfordCoreNLP.default_jars = [
+  'joda-time.jar',
+  'xom.jar',
+  'stanford-corenlp-3.5.0.jar',
+  'stanford-corenlp-3.5.0-models.jar',
+  'jollyday.jar',
+  'bridge.jar'
+]
+```
+Or configure your setup (for German) as follows:
+```ruby
+StanfordCoreNLP.use :german
+StanfordCoreNLP.model_files = {}
+StanfordCoreNLP.set_model('pos.model', 'german-fast.tagger')
+StanfordCoreNLP.default_jars = [
+  'joda-time.jar',
+  'xom.jar',
+  'stanford-corenlp-3.5.0.jar',
+  'stanford-corenlp-3.5.0-models.jar',
+  'jollyday.jar',
+  'bridge.jar'
+]
+```
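The three configurations above differ only in the language symbol and the POS tagger file. As a hedged sketch, those differences can be kept in one table. `corenlp_350_settings` and the constants are hypothetical names; the JAR and tagger file names are the ones shown above.

```ruby
# Hypothetical helper: settings for the Stanford CoreNLP 3.5.0 setups above.
CORENLP_350_JARS = %w[
  joda-time.jar xom.jar stanford-corenlp-3.5.0.jar
  stanford-corenlp-3.5.0-models.jar jollyday.jar bridge.jar
].freeze

# POS tagger file per language; nil means "keep the default" (English).
POS_TAGGERS_350 = { english: nil, french: 'french.tagger', german: 'german-fast.tagger' }.freeze

def corenlp_350_settings(language)
  unless POS_TAGGERS_350.key?(language)
    raise ArgumentError, "unsupported language: #{language.inspect}"
  end
  { language: language, pos_model: POS_TAGGERS_350[language], jars: CORENLP_350_JARS.dup }
end

# Applying the settings (requires the gem and the 3.5.0 files):
#   s = corenlp_350_settings(:german)
#   StanfordCoreNLP.use s[:language]
#   StanfordCoreNLP.model_files = {}
#   StanfordCoreNLP.set_model('pos.model', s[:pos_model]) if s[:pos_model]
#   StanfordCoreNLP.default_jars = s[:jars]
```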
+**Contributing**
+Simple.
+1. Fork the project.
+2. Send me a pull request!

data/bin/AnnotationBridge.java ADDED

@@ -0,0 +1,22 @@
+import edu.stanford.nlp.ling.CoreAnnotation;
+import edu.stanford.nlp.util.ArrayCoreMap;
+import java.util.Properties;
+import edu.stanford.nlp.pipeline.StanfordCoreNLP;
+// export JAVA_HOME='/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home'
+// javac -cp '.:stanford-corenlp.jar' AnnotationBridge.java
+// jar cf bridge.jar AnnotationBridge.class
+public class AnnotationBridge {
+    public static Object getAnnotation(Object entity, String name) throws ClassNotFoundException {
+      Class<CoreAnnotation> klass;
+      klass = (Class<CoreAnnotation>) Class.forName(name);
+      Object object = ((ArrayCoreMap) entity).get(klass);
+      return object;
+    }
+    public static Object getPipelineWithProperties(Properties properties) {
+      StanfordCoreNLP pipeline = new StanfordCoreNLP(properties);
+      return pipeline;
+    }
+}

data/bin/bridge.jar ADDED

Binary file

data/lib/stanford-core-nlp.rb ADDED

@@ -0,0 +1,238 @@
+require 'stanford-core-nlp/config'
+module StanfordCoreNLP
+  VERSION = '0.5.3'
+  require 'bind-it'
+  extend BindIt::Binding
+  # ############################ #
+  # BindIt Configuration Options #
+  # ############################ #
+  # The default path for the JAR files
+  # is the gem's bin folder.
+  self.jar_path = File.dirname(__FILE__).gsub(/\/lib\z/, '') + '/bin/'
+  # Default namespace is the Stanford pipeline namespace.
+  self.default_namespace = 'edu.stanford.nlp.pipeline'
+  # Load the JVM with a minimum heap size of 512MB,
+  # and a maximum heap size of 1024MB.
+  StanfordCoreNLP.jvm_args = ['-Xms512M', '-Xmx1024M']
+  # Turn logging off by default.
+  StanfordCoreNLP.log_file = nil
+  # Default JAR files to load.
+  StanfordCoreNLP.default_jars = [
+    'joda-time.jar',
+    'xom.jar',
+    'stanford-corenlp.jar',
+    'jollyday.jar',
+    'bridge.jar'
+  ]
+  # Default classes to load.
+  StanfordCoreNLP.default_classes = [
+    ['StanfordCoreNLP', 'edu.stanford.nlp.pipeline', 'CoreNLP'],
+    ['Annotation', 'edu.stanford.nlp.pipeline'],
+    ['Word', 'edu.stanford.nlp.ling'],
+    ['CoreLabel', 'edu.stanford.nlp.ling'],
+    ['MaxentTagger', 'edu.stanford.nlp.tagger.maxent'],
+    ['CRFClassifier', 'edu.stanford.nlp.ie.crf'],
+    ['LexicalizedParser', 'edu.stanford.nlp.parser.lexparser'],
+    ['Options', 'edu.stanford.nlp.parser.lexparser'],
+    ['Properties', 'java.util'],
+    ['ArrayList', 'java.util'],
+    ['AnnotationBridge', '']
+  ]
+  # ########################### #
+  # Stanford Core NLP bindings  #
+  # ########################### #
+  require 'stanford-core-nlp/bridge'
+  extend StanfordCoreNLP::Bridge
+  class << self
+    # The model file names for a given language.
+    attr_accessor :model_files
+    # The folder in which to look for models.
+    attr_accessor :model_path
+    # Store the language currently being used.
+    attr_accessor :language
+    # Custom properties, merged over the computed ones in get_properties.
+    attr_accessor :custom_properties
+  end
+  self.custom_properties = {}
+  # The path to the main folder containing the folders
+  # with the individual models inside. By default, this
+  # is the same as the JAR path.
+  self.model_path = self.jar_path
+  # ########################### #
+  # Public configuration params #
+  # ########################### #
+  # Use models for a given language. Language can be
+  # supplied as full-length, or ISO-639 2 or 3 letter
+  # code (e.g. :english, :eng or :en will work).
+  def self.use(language)
+    lang = nil
+    self.model_files = {}
+    Config::LanguageCodes.each do |l,codes|
+      lang = codes[2] if codes.include?(language)
+    end
+    self.language = lang
+    Config::Models.each do |n, languages|
+      models = languages[lang]
+      folder = Config::ModelFolders[n]
+      if models.is_a?(Hash)
+        n = n.to_s
+        models.each do |m, file|
+          self.model_files["#{n}.#{m}"] = folder + file
+        end
+      elsif models.is_a?(String)
+        self.model_files["#{n}.model"] = folder + models
+      end
+    end
+  end
+  # Use english by default.
+  self.use :english
+  # Set a model file.
+  def self.set_model(name, file)
+    n = name.split('.')[0].intern
+    self.model_files[name] = Config::ModelFolders[n] + file
+  end
+  # ########################### #
+  #    Public API methods       #
+  # ########################### #
+  def self.bind
+    # Take care of Windows users.
+    if self.running_on_windows?
+      self.jar_path.gsub!('/', '\\')
+      self.model_path.gsub!('/', '\\')
+    end
+    # Make the bindings.
+    super
+    # Bind annotation bridge.
+    self.default_classes.each do |info|
+      klass = const_get(info.first)
+      self.inject_get_method(klass)
+    end
+  end
+  # Load a StanfordCoreNLP pipeline with the
+  # specified JVM flags and StanfordCoreNLP
+  # properties.
+  def self.load(*annotators)
+    self.bind unless self.bound
+    # Prepend the JAR path to the model files.
+    properties = {}
+    self.model_files.each do |k,v|
+      found = false
+      annotators.each do |annotator|
+        found = true if k.index(annotator.to_s)
+        break if found
+      end
+      next unless found
+      f = self.model_path + v
+      unless File.readable?(f)
+        raise "Model file #{f} could not be found. " +
+        "You may need to download this file manually " +
+        "and/or set paths properly."
+      end
+      properties[k] = f
+    end
+    properties['annotators'] = annotators.map { |x| x.to_s }.join(', ')
+    unless self.language == :english
+      # Bug fix for French/German parsers.
+      # Otherwise throws "IllegalArgumentException:
+      # Unknown option: -retainTmpSubcategories"
+      properties['parse.flags'] = ''
+      # Bug fix for French/German parsers.
+      # Otherwise throws java.lang.NullPointerException: null.
+      properties['parse.buildgraphs'] = 'false'
+    end
+    # Bug fix for NER system. Otherwise throws:
+    # Error initializing binder 1 at edu.stanford.
+    # nlp.time.Options.<init>(Options.java:88)
+    properties['sutime.binders'] = '0'
+    # Manually include SUTime models.
+    if annotators.include?(:ner)
+      properties['sutime.rules'] =
+      self.model_path + 'sutime/defs.sutime.txt, ' +
+      self.model_path + 'sutime/english.sutime.txt'
+    end
+    props = get_properties(properties)
+    # Hack for Java7 compatibility.
+    bridge = const_get(:AnnotationBridge)
+    bridge.getPipelineWithProperties(props)
+  end
+  # Hack in order not to break backwards compatibility.
+  def self.const_missing(const)
+    if const == :Text
+      puts "WARNING: StanfordCoreNLP::Text has been deprecated. " +
+      "Please use StanfordCoreNLP::Annotation instead."
+      Annotation
+    else
+      super(const)
+    end
+  end
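+  # Note: `private` does not apply to methods defined with `def self.`,
+  # so the helpers below stay public (bridge.rb calls camel_case).
+  private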
+  # Create a java.util.Properties object from a hash.
+  def self.get_properties(properties)
+    properties = properties.merge(self.custom_properties)
+    props = Properties.new
+    properties.each do |property, value|
+      props.set_property(property.to_s, value.to_s)
+    end
+    props
+  end
+  # Get a Java ArrayList binding to pass lists
+  # of tokens to the Stanford Core NLP process.
+  def self.get_list(tokens)
+    list = StanfordCoreNLP::ArrayList.new
+    tokens.each do |t|
+      list.add(Word.new(t.to_s))
+    end
+    list
+  end
+  # Returns true if we're running on Windows.
+  def self.running_on_windows?
+    !!(RUBY_PLATFORM =~ /mswin|mingw/)
+  end
+  # Convert snake_case (or dot.separated) text to CamelCase; '/' becomes '::'.
+  def self.camel_case(s)
+    s = s.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }
+    s.gsub(/(?:^|_|\.)(.)/) { $1.upcase }
+  end
+end
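Reading `load` above, it forwards only the model entries whose key contains the name of a requested annotator. This is a substring test via `k.index`. Each forwarded entry is prefixed with `model_path`, and `load` then adds the comma-separated `annotators` property.

The sketch below restates that selection under a hypothetical name. It leaves out the readability check and the language- and SUTime-specific properties.

```ruby
# Standalone sketch of the model selection in StanfordCoreNLP.load (0.5.3).
def load_properties(model_files, annotators, model_path)
  props = {}
  model_files.each do |key, relative_path|
    # Same test as k.index(annotator.to_s): a substring match on the key.
    next unless annotators.any? { |annotator| key.include?(annotator.to_s) }
    props[key] = model_path + relative_path
  end
  props['annotators'] = annotators.map(&:to_s).join(', ')
  props
end
```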

data/lib/stanford-core-nlp/bridge.rb ADDED

@@ -0,0 +1,57 @@
+module StanfordCoreNLP::Bridge
+  def inject_get_method(klass)
+    klass.class_eval do
+      if RUBY_PLATFORM =~ /java/
+        return unless method_defined?(:get)
+        alias_method :get_without_casting, :get
+      end
+      # Dynamically defined on all proxied annotation classes.
+      # Get an annotation using the annotation bridge.
+      def get(annotation, anno_base = nil)
+        unless RUBY_PLATFORM =~ /java/
+          return unless java_methods.include?('get(Ljava.lang.Class;)')
+        end
+        anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
+        if anno_base
+          unless StanfordCoreNLP::Config::Annotations[anno_base]
+            raise "The path #{anno_base} doesn't exist."
+          end
+          anno_bases = [anno_base]
+        else
+          anno_bases = StanfordCoreNLP::Config::AnnotationsByName[anno_class]
+          raise "The annotation #{anno_class} doesn't exist." unless anno_bases
+        end
+        if anno_bases.size > 1
+          msg = "There are many different annotations bearing the name #{anno_class}." +
+          "\nPlease specify one of the following base classes as second parameter to disambiguate: "
+          msg << anno_bases.join(',')
+          raise msg
+        else
+          base_class = anno_bases[0]
+        end
+        if RUBY_PLATFORM =~ /java/
+          fqcn = "edu.stanford.#{base_class}"
+          class_path = fqcn.split(".")
+          class_name = class_path.pop
+          path = StanfordCoreNLP.camel_case(class_path.join("."))
+          jruby_class = "Java::#{path}::#{class_name}::#{anno_class}"
+          get_without_casting(Object.module_eval(jruby_class))
+        else
+          url = "edu.stanford.#{base_class}$#{anno_class}"
+          StanfordCoreNLP::AnnotationBridge.getAnnotation(self, url)
+        end
+      end
+    end
+  end
+end
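On MRI (via Rjb), the injected `get` resolves an annotation in three steps:

1. It turns the symbol into a class name.
2. It looks up that name's base class in `Config::AnnotationsByName`.
3. It passes `edu.stanford.<base>$<Name>` to `AnnotationBridge.getAnnotation`, which calls `Class.forName`.

The `$` is Java's binary-name separator for nested classes such as `CoreAnnotations.NamedEntityTagAnnotation`. Below is a standalone sketch of that resolution, using a two-entry excerpt of the table. `annotation_binary_name` is a hypothetical name.

```ruby
# Excerpt shaped like Config::AnnotationsByName (name => base classes).
BY_NAME = {
  'NamedEntityTagAnnotation' => ['nlp.ling.CoreAnnotations'],
  'CorefChainAnnotation' => ['nlp.dcoref.CorefCoreAnnotations']
}.freeze

# Map a Ruby symbol to the binary class name used with Class.forName.
def annotation_binary_name(symbol, by_name = BY_NAME)
  anno_class = symbol.to_s.gsub(/(?:^|_|\.)(.)/) { Regexp.last_match(1).upcase } + 'Annotation'
  bases = by_name[anno_class]
  raise ArgumentError, "The annotation #{anno_class} doesn't exist." unless bases
  if bases.size > 1
    raise ArgumentError, "Ambiguous #{anno_class}; pass one of: #{bases.join(', ')}"
  end
  # Nested classes use '$' in their binary name, e.g. Outer$Inner.
  "edu.stanford.#{bases.first}$#{anno_class}"
end
```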

data/lib/stanford-core-nlp/config.rb ADDED

@@ -0,0 +1,392 @@
+module StanfordCoreNLP
+  class Config
+    # A hash of language codes in humanized,
+    # 2 and 3-letter ISO639 codes.
+    LanguageCodes = {
+      :english => [:en, :eng, :english],
+      :german => [:de, :ger, :german],
+      :french => [:fr, :fre, :french]
+    }
+    # Folders inside the JAR path for the models.
+    ModelFolders = {
+      :pos => 'taggers/',
+      :parse => 'grammar/',
+      :ner => 'classifiers/',
+      :dcoref => 'dcoref/'
+    }
+    # Tag sets used by Stanford for each language.
+    TagSets = {
+      :english => :penn,
+      :german => :stutgart,
+      :french => :paris7
+    }
+    # Default models for all languages.
+    Models = {
+      :pos => {
+        :english => 'english-left3words-distsim.tagger',
+        :german => 'german-fast.tagger',
+        :french  => 'french.tagger'
+      },
+      :parse => {
+        :english => 'englishPCFG.ser.gz',
+        :german => 'germanPCFG.ser.gz',
+        :french  => 'frenchFactored.ser.gz'
+      },
+      :ner => {
+        :english => 'english.all.3class.distsim.crf.ser.gz'
+        # :german => {} # Add this at some point.
+      },
+      :dcoref => {
+        :english => {
+          'demonym' => 'demonyms.txt',
+          'animate' => 'animate.unigrams.txt',
+          'female' => 'female.unigrams.txt',
+          'inanimate' => 'inanimate.unigrams.txt',
+          'male' => 'male.unigrams.txt',
+          'neutral' => 'neutral.unigrams.txt',
+          'plural' => 'plural.unigrams.txt',
+          'singular' => 'singular.unigrams.txt',
+          'states' => 'state-abbreviations.txt',
+          'countries' => 'countries',
+          'states.provinces' => 'statesandprovinces',
+          'extra.gender' => 'namegender.combine.txt',
+          'singleton.predictor' => 'singleton.predictor.ser'
+        },
+        :german => {},
+        :french  => {}
+      }
+      # Models to add.
+      #"truecase.model" - path towards the true-casing model; default: StanfordCoreNLPModels/truecase/noUN.ser.gz
+      #"truecase.bias" - class bias of the true case model; default: INIT_UPPER:-0.7,UPPER:-0.7,O:0
+      #"truecase.mixedcasefile" - path towards the mixed case file; default: StanfordCoreNLPModels/truecase/MixDisambiguation.list
+      #"nfl.gazetteer" - path towards the gazetteer for the NFL domain
+      #"nfl.relation.model" - path towards the NFL relation extraction model
+    }
+    # List of annotations by JAVA class path.
+    Annotations = {
+      'nlp.dcoref.CoNLL2011DocumentReader' => [
+        'CorefMentionAnnotation',
+        'NamedEntityAnnotation'
+      ],
+      'nlp.ling.CoreAnnotations' => [
+        'AbbrAnnotation',
+        'AbgeneAnnotation',
+        'AbstrAnnotation',
+        'AfterAnnotation',
+        'AnswerAnnotation',
+        'AnswerObjectAnnotation',
+        'AntecedentAnnotation',
+        'ArgDescendentAnnotation',
+        'ArgumentAnnotation',
+        'BagOfWordsAnnotation',
+        'BeAnnotation',
+        'BeforeAnnotation',
+        'BeginIndexAnnotation',
+        'BestCliquesAnnotation',
+        'BestFullAnnotation',
+        'CalendarAnnotation',
+        'CategoryAnnotation',
+        'CategoryFunctionalTagAnnotation',
+        'CharacterOffsetBeginAnnotation',
+        'CharacterOffsetEndAnnotation',
+        'CharAnnotation',
+        'ChineseCharAnnotation',
+        'ChineseIsSegmentedAnnotation',
+        'ChineseOrigSegAnnotation',
+        'ChineseSegAnnotation',
+        'ChunkAnnotation',
+        'CoarseTagAnnotation',
+        'CommonWordsAnnotation',
+        'CoNLLDepAnnotation',
+        'CoNLLDepParentIndexAnnotation',
+        'CoNLLDepTypeAnnotation',
+        'CoNLLPredicateAnnotation',
+        'CoNLLSRLAnnotation',
+        'ContextsAnnotation',
+        'CopyAnnotation',
+        'CostMagnificationAnnotation',
+        'CovertIDAnnotation',
+        'D2_LBeginAnnotation',
+        'D2_LEndAnnotation',
+        'D2_LMiddleAnnotation',
+        'DayAnnotation',
+        'DependentsAnnotation',
+        'DictAnnotation',
+        'DistSimAnnotation',
+        'DoAnnotation',
+        'DocDateAnnotation',
+        'DocIDAnnotation',
+        'DomainAnnotation',
+        'EndIndexAnnotation',
+        'EntityClassAnnotation',
+        'EntityRuleAnnotation',
+        'EntityTypeAnnotation',
+        'FeaturesAnnotation',
+        'FemaleGazAnnotation',
+        'FirstChildAnnotation',
+        'ForcedSentenceEndAnnotation',
+        'FreqAnnotation',
+        'GazAnnotation',
+        'GazetteerAnnotation',
+        'GenericTokensAnnotation',
+        'GeniaAnnotation',
+        'GoldAnswerAnnotation',
+        'GovernorAnnotation',
+        'GrandparentAnnotation',
+        'HaveAnnotation',
+        'HeadWordStringAnnotation',
+        'HeightAnnotation',
+        'IDAnnotation',
+        'IDFAnnotation',
+        'INAnnotation',
+        'IndexAnnotation',
+        'InterpretationAnnotation',
+        'IsDateRangeAnnotation',
+        'IsURLAnnotation',
+        'LabelAnnotation',
+        'LastGazAnnotation',
+        'LastTaggedAnnotation',
+        'LBeginAnnotation',
+        'LeftChildrenNodeAnnotation',
+        'LeftTermAnnotation',
+        'LemmaAnnotation',
+        'LEndAnnotation',
+        'LengthAnnotation',
+        'LMiddleAnnotation',
+        'MaleGazAnnotation',
+        'MarkingAnnotation',
+        'MonthAnnotation',
+        'MorphoCaseAnnotation',
+        'MorphoGenAnnotation',
+        'MorphoNumAnnotation',
+        'MorphoPersAnnotation',
+        'NamedEntityTagAnnotation',
+        'NeighborsAnnotation',
+        'NERIDAnnotation',
+        'NormalizedNamedEntityTagAnnotation',
+        'NotAnnotation',
+        'NumericCompositeObjectAnnotation',
+        'NumericCompositeTypeAnnotation',
+        'NumericCompositeValueAnnotation',
+        'NumericObjectAnnotation',
+        'NumericTypeAnnotation',
+        'NumericValueAnnotation',
+        'NumerizedTokensAnnotation',
+        'NumTxtSentencesAnnotation',
+        'OriginalAnswerAnnotation',
+        'OriginalCharAnnotation',
+        'OriginalTextAnnotation',
+        'ParagraphAnnotation',
+        'ParagraphsAnnotation',
+        'ParaPositionAnnotation',
+        'ParentAnnotation',
+        'PartOfSpeechAnnotation',
+        'PercentAnnotation',
+        'PhraseWordsAnnotation',
+        'PhraseWordsTagAnnotation',
+        'PolarityAnnotation',
+        'PositionAnnotation',
+        'PossibleAnswersAnnotation',
+        'PredictedAnswerAnnotation',
+        'PrevChildAnnotation',
+        'PriorAnnotation',
+        'ProjectedCategoryAnnotation',
+        'ProtoAnnotation',
+        'RoleAnnotation',
+        'SectionAnnotation',
+        'SemanticHeadTagAnnotation',
+        'SemanticHeadWordAnnotation',
+        'SemanticTagAnnotation',
+        'SemanticWordAnnotation',
+        'SentenceIDAnnotation',
+        'SentenceIndexAnnotation',
+        'SentencePositionAnnotation',
+        'SentencesAnnotation',
+        'ShapeAnnotation',
+        'SpaceBeforeAnnotation',
+        'SpanAnnotation',
+        'SpeakerAnnotation',
+        'SRL_ID',
+        'SRLIDAnnotation',
+        'SRLInstancesAnnotation',
+        'StackedNamedEntityTagAnnotation',
+        'StateAnnotation',
+        'StemAnnotation',
+        'SubcategorizationAnnotation',
+        'TagLabelAnnotation',
+        'TextAnnotation',
+        'TokenBeginAnnotation',
+        'TokenEndAnnotation',
+        'TokensAnnotation',
+        'TopicAnnotation',
+        'TrueCaseAnnotation',
+        'TrueCaseTextAnnotation',
+        'TrueTagAnnotation',
+        'UBlockAnnotation',
+        'UnaryAnnotation',
+        'UnknownAnnotation',
+        'UtteranceAnnotation',
+        'UTypeAnnotation',
+        'ValueAnnotation',
+        'VerbSenseAnnotation',
+        'WebAnnotation',
+        'WordFormAnnotation',
+        'WordnetSynAnnotation',
+        'WordPositionAnnotation',
+        'WordSenseAnnotation',
+        'XmlContextAnnotation',
+        'XmlElementAnnotation',
+        'YearAnnotation'
+      ],
+      'nlp.dcoref.CorefCoreAnnotations' => [
+        'CorefAnnotation',
+        'CorefChainAnnotation',
+        'CorefClusterAnnotation',
+        'CorefClusterIdAnnotation',
+        'CorefDestAnnotation',
+        'CorefGraphAnnotation'
+      ],
+      'nlp.ling.CoreLabel' => [
+        'GenericAnnotation'
+      ],
+      'nlp.trees.EnglishGrammaticalRelations' => [
+        'AbbreviationModifierGRAnnotation',
+        'AdjectivalComplementGRAnnotation',
+        'AdjectivalModifierGRAnnotation',
+        'AdvClauseModifierGRAnnotation',
+        'AdverbialModifierGRAnnotation',
+        'AgentGRAnnotation',
+        'AppositionalModifierGRAnnotation',
+        'ArgumentGRAnnotation',
+        'AttributiveGRAnnotation',
+        'AuxModifierGRAnnotation',
+        'AuxPassiveGRAnnotation',
+        'ClausalComplementGRAnnotation',
+        'ClausalPassiveSubjectGRAnnotation',
+        'ClausalSubjectGRAnnotation',
+        'ComplementGRAnnotation',
+        'ComplementizerGRAnnotation',
+        'ConjunctGRAnnotation',
+        'ControllingSubjectGRAnnotation',
+        'CoordinationGRAnnotation',
+        'CopulaGRAnnotation',
+        'DeterminerGRAnnotation',
+        'DirectObjectGRAnnotation',
+        'ExpletiveGRAnnotation',
+        'IndirectObjectGRAnnotation',
+        'InfinitivalModifierGRAnnotation',
+        'MarkerGRAnnotation',
+        'ModifierGRAnnotation',
+        'MultiWordExpressionGRAnnotation',
+        'NegationModifierGRAnnotation',
+        'NominalPassiveSubjectGRAnnotation',
+        'NominalSubjectGRAnnotation',
+        'NounCompoundModifierGRAnnotation',
+        'NpAdverbialModifierGRAnnotation',
+        'NumberModifierGRAnnotation',
+        'NumericModifierGRAnnotation',
+        'ObjectGRAnnotation',
+        'ParataxisGRAnnotation',
+        'ParticipialModifierGRAnnotation',
+        'PhrasalVerbParticleGRAnnotation',
+        'PossessionModifierGRAnnotation',
+        'PossessiveModifierGRAnnotation',
+        'PreconjunctGRAnnotation',
+        'PredeterminerGRAnnotation',
+        'PredicateGRAnnotation',
+        'PrepositionalComplementGRAnnotation',
+        'PrepositionalModifierGRAnnotation',
+        'PrepositionalObjectGRAnnotation',
+        'PunctuationGRAnnotation',
+        'PurposeClauseModifierGRAnnotation',
+        'QuantifierModifierGRAnnotation',
+        'ReferentGRAnnotation',
+        'RelativeClauseModifierGRAnnotation',
+        'RelativeGRAnnotation',
+        'SemanticDependentGRAnnotation',
+        'SubjectGRAnnotation',
+        'TemporalModifierGRAnnotation',
+        'XClausalComplementGRAnnotation'
+      ],
+      'nlp.trees.GrammaticalRelation' => [
+        'DependentGRAnnotation',
+        'GovernorGRAnnotation',
+        'GrammaticalRelationAnnotation',
+        'KillGRAnnotation',
+        'Language',
+        'RootGRAnnotation'
+      ],
+      'nlp.ie.machinereading.structure.MachineReadingAnnotations' => [
+        'DependencyAnnotation',
+        'DocumentDirectoryAnnotation',
+        'DocumentIdAnnotation',
+        'EntityMentionsAnnotation',
+        'EventMentionsAnnotation',
+        'GenderAnnotation',
+        'RelationMentionsAnnotation',
+        'TriggerAnnotation'
+      ],
+      'nlp.parser.lexparser.ParserAnnotations' => [
+        'ConstraintAnnotation'
+      ],
+      'nlp.semgraph.SemanticGraphCoreAnnotations' => [
+        'BasicDependenciesAnnotation',
+        'CollapsedCCProcessedDependenciesAnnotation',
+        'CollapsedDependenciesAnnotation'
+      ],
+      'nlp.time.TimeAnnotations' => [
+        'TimexAnnotation',
+        'TimexAnnotations'
+      ],
+      'nlp.time.TimeExpression' => [
+        'Annotation',
+        'ChildrenAnnotation',
+        'TimeIndexAnnotation'
+      ],
+      'nlp.trees.TreeCoreAnnotations' => [
+        'TreeHeadTagAnnotation',
+        'TreeHeadWordAnnotation',
+        'TreeAnnotation'
+      ]
+    }
+    # Invert the table: map each annotation name to the list of class paths that declare it.
+    annotations_by_name = {}
+    Annotations.each do |base_class, annotation_classes|
+      annotation_classes.each do |annotation_class|
+        annotations_by_name[annotation_class] ||= []
+        annotations_by_name[annotation_class] << base_class
+      end
+    end
+    # Hash of annotation name => [class paths].
+    AnnotationsByName = annotations_by_name
+  end
+end
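The inversion at the end of `config.rb` turns the `class path => [annotation names]` table into a `name => [class paths]` lookup. The standalone sketch below reproduces that step outside the gem. It is an illustration under stated assumptions, not the gem's code: the table contents and the helper name `index_by_name` are hypothetical, and `each_with_object` stands in for the explicit nested loop.

```ruby
# Toy table in the same shape as Annotations: Java class path => annotation names.
# The package and annotation names below are hypothetical, for illustration only.
ANNOTATIONS = {
  'pkg.Alpha' => ['FooAnnotation', 'BarAnnotation'],
  'pkg.Beta'  => ['BarAnnotation']
}.freeze

# Hypothetical helper: invert the table so each annotation name maps to every
# class path that declares it. An array is kept per name because the same
# name may appear under more than one class.
def index_by_name(table)
  table.each_with_object({}) do |(base_class, names), index|
    names.each { |name| (index[name] ||= []) << base_class }
  end
end

BY_NAME = index_by_name(ANNOTATIONS)

p BY_NAME['BarAnnotation'] # prints ["pkg.Alpha", "pkg.Beta"]
p BY_NAME['FooAnnotation'] # prints ["pkg.Alpha"]
```

Because Ruby hashes preserve insertion order, the class paths for each name come out in the order the table lists them, which keeps lookups deterministic.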

metadata ADDED

@@ -0,0 +1,96 @@
+--- !ruby/object:Gem::Specification
+name: stanford-core-nlp-abstractor
+version: !ruby/object:Gem::Version
+  version: 0.5.3
+platform: ruby
+authors:
+- Louis Mullie
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2015-02-23 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: bind-it
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.2.7
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.2.7
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description: " High-level Ruby bindings to the Stanford CoreNLP package, a set of natural
+  language processing\ntools that provides tokenization, part-of-speech tagging and
+  parsing for several languages, as well as named entity\nrecognition and coreference
+  resolution for English. "
+email:
+- louis.mullie@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- LICENSE
+- README.md
+- bin/AnnotationBridge.java
+- bin/bridge.jar
+- lib/stanford-core-nlp.rb
+- lib/stanford-core-nlp/bridge.rb
+- lib/stanford-core-nlp/config.rb
+homepage: https://github.com/louismullie/stanford-core-nlp
+licenses: []
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.4.3
+signing_key:
+specification_version: 4
+summary: Ruby bindings to the Stanford Core NLP tools.
+test_files: []
+has_rdoc:
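The gemspec pins the runtime dependency `bind-it` with the pessimistic operator `"~> 0.2.7"` and leaves the development dependencies (`rspec`, `rake`) at `">= 0"`. The sketch below checks what those constraints accept using RubyGems' own `Gem::Requirement` and `Gem::Version` classes; the version numbers tested are arbitrary examples, not claims about which `bind-it` releases exist.

```ruby
require 'rubygems'

# Runtime constraint from the metadata: bind-it "~> 0.2.7",
# i.e. at least 0.2.7 but below 0.3.
bind_it = Gem::Requirement.new('~> 0.2.7')

# Development constraint from the metadata: ">= 0", satisfied by any version.
any = Gem::Requirement.new('>= 0')

# Arbitrary candidate versions used only to exercise the constraint.
['0.2.6', '0.2.7', '0.2.9', '0.3.0'].each do |v|
  puts "bind-it #{v}: #{bind_it.satisfied_by?(Gem::Version.new(v))}"
end
```

In other words, the gem accepts patch-level updates of `bind-it` within the 0.2 series but rejects 0.3.0 and later.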