stanford-core-nlp 0.2.1 → 0.3.0
- data/README.md +139 -0
- data/lib/stanford-core-nlp.rb +56 -172
- data/lib/stanford-core-nlp/{java_wrapper.rb → bridge.rb} +9 -19
- metadata +13 -9
- data/README.markdown +0 -100
- data/lib/stanford-core-nlp/jar_loader.rb +0 -55
data/README.md
ADDED
@@ -0,0 +1,139 @@
[![Build Status](https://secure.travis-ci.org/louismullie/stanford-core-nlp.png)](http://travis-ci.org/louismullie/stanford-core-nlp)

**About**

This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set of natural language processing tools for tokenization, part-of-speech tagging, lemmatization, and parsing of several languages, as well as named entity recognition and coreference resolution in English. This gem is compatible with Ruby 1.9.2 and above.

**Installing**

First, install the gem: `gem install stanford-core-nlp`. Then, download the Stanford Core NLP JAR and model files. Three different packages are available:

* A [minimal package for English](http://louismullie.com/treat/stanford-core-nlp-minimal.zip) with one tagger model and one parser model for English.
* A [full package for English](http://louismullie.com/treat/stanford-core-nlp-english.zip), with all tagger and parser models for English, plus the coreference resolution and named entity recognition models.
* A [full package for all languages](http://louismullie.com/treat/stanford-core-nlp-all.zip), including tagger and parser models for English, French, German, Arabic and Chinese.

Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (e.g. [...]/gems/stanford-core-nlp-0.x/bin/).

**Configuration**

After installing and requiring the gem (`require 'stanford-core-nlp'`), you may want to set some optional configuration options. Here are some examples:

```ruby
# Set an alternative path to look for the JAR files.
# Default is the gem's bin folder.
StanfordCoreNLP.jar_path = '/path_to_jars/'

# Set an alternative path to look for the model files.
# Default is the gem's bin folder.
StanfordCoreNLP.model_path = '/path_to_models/'

# Pass some alternative arguments to the Java VM.
# Default is ['-Xms512M', '-Xmx1024M'] (be prepared
# to take a coffee break).
StanfordCoreNLP.jvm_args = ['-option1', '-option2']

# Redirect VM output to log.txt.
StanfordCoreNLP.log_file = 'log.txt'

# Use the model files for a different language than English.
StanfordCoreNLP.use(:french)

# Change a specific model file.
StanfordCoreNLP.set_model('pos.model', 'english-left3words-distsim.tagger')
```

**Using the gem**

```ruby
text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
       'Berlin to discuss a new austerity package. Sarkozy ' +
       'looked pleased, but Merkel was dismayed.'

pipeline = StanfordCoreNLP.load(:tokenize, :ssplit, :pos, :lemma, :parse, :ner, :dcoref)
text = StanfordCoreNLP::Text.new(text)
pipeline.annotate(text)

text.get(:sentences).each do |sentence|
  # Syntactic dependencies
  puts sentence.get(:basic_dependencies).to_s
  sentence.get(:tokens).each do |token|
    # Default annotations for all tokens
    puts token.get(:value).to_s
    puts token.get(:original_text).to_s
    puts token.get(:character_offset_begin).to_s
    puts token.get(:character_offset_end).to_s
    # POS returned by the tagger
    puts token.get(:part_of_speech).to_s
    # Lemma (base form of the token)
    puts token.get(:lemma).to_s
    # Named entity tag
    puts token.get(:named_entity_tag).to_s
    # Coreference
    puts token.get(:coref_cluster_id).to_s
    # Also of interest: coref, coref_chain, coref_cluster, coref_dest, coref_graph.
  end
end
```

> Important: You need to load the StanfordCoreNLP pipeline before using the StanfordCoreNLP::Text class.

A good reference for the names of annotations is the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CoreCorefAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the 'config.rb' file inside the gem. The Ruby symbol (e.g. `:named_entity_tag`) corresponding to a Java annotation class follows a simple un-camel-casing convention, with 'Annotation' at the end removed. For example, the annotation `NamedEntityTagAnnotation` translates to `:named_entity_tag`, `PartOfSpeechAnnotation` to `:part_of_speech`, etc.
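
To make the convention concrete, here is a small illustrative Ruby sketch (not part of the gem; the helper names are hypothetical) that converts between a Java annotation class name and the corresponding Ruby symbol:

```ruby
# Hypothetical helpers illustrating the un-camel-casing convention described above.
def annotation_to_symbol(class_name)
  name = class_name.sub(/Annotation$/, '')                # drop the trailing 'Annotation'
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase.to_sym  # un-camel-case the rest
end

def symbol_to_annotation(symbol)
  symbol.to_s.split('_').map(&:capitalize).join + 'Annotation'
end

annotation_to_symbol('NamedEntityTagAnnotation') # => :named_entity_tag
symbol_to_annotation(:part_of_speech)            # => "PartOfSpeechAnnotation"
```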

**Loading specific classes**

You may also want to load your own classes from the Stanford NLP package to perform more specific tasks. The gem provides an API to do this:

```ruby
# Default base class is edu.stanford.nlp.pipeline.
StanfordCoreNLP.load_class('PTBTokenizerAnnotator')
puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
# => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>

# Here, we specify another base class.
StanfordCoreNLP.load_class('MaxentTagger', 'edu.stanford.nlp.tagger')
puts StanfordCoreNLP::MaxentTagger.inspect
# => <Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
```

**List of annotator classes**

Here is a full list of annotator classes provided by the Stanford Core NLP package. You can load these classes individually using `StanfordCoreNLP.load_class` (see above). Once this is done, you can use them as you would from a Java program. Refer to the Java documentation for a list of functions provided by each of these classes.

* PTBTokenizerAnnotator - tokenizes the text following Penn Treebank conventions.
* WordToSentenceAnnotator - splits a sequence of words into a sequence of sentences.
* POSTaggerAnnotator - annotates the text with part-of-speech tags.
* MorphaAnnotator - morphological normalizer (generates lemmas).
* NERAnnotator - annotates the text with named-entity labels.
* NERCombinerAnnotator - combines several NER models.
* TrueCaseAnnotator - detects the true case of words in free text.
* ParserAnnotator - generates constituent and dependency trees.
* NumberAnnotator - recognizes numerical entities such as numbers, money, times, and dates.
* TimeWordAnnotator - recognizes common temporal expressions, such as "teatime".
* QuantifiableEntityNormalizingAnnotator - normalizes the content of all numerical entities.
* SRLAnnotator - annotates predicates and their semantic roles.
* DeterministicCorefAnnotator - implements anaphora resolution using a deterministic model.
* NFLAnnotator - implements entity and relation mention extraction for the NFL domain.
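
For instance, any of the annotators listed above can be loaded through `StanfordCoreNLP.load_class`; a minimal sketch, following the inspect examples shown earlier (the exact output string is assumed):

```ruby
# Load one of the annotators listed above under the default
# edu.stanford.nlp.pipeline namespace, then inspect the proxied class.
StanfordCoreNLP.load_class('TrueCaseAnnotator')
puts StanfordCoreNLP::TrueCaseAnnotator.inspect
# => #<Rjb::Edu_stanford_nlp_pipeline_TrueCaseAnnotator> (assumed output)
```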

**List of model files**

Here is a full list of the default models for the Stanford Core NLP pipeline. You can change these models individually using `StanfordCoreNLP.set_model` (see above).

* 'pos.model' - 'english-left3words-distsim.tagger'
* 'ner.model.3class' - 'all.3class.distsim.crf.ser.gz'
* 'ner.model.7class' - 'muc.7class.distsim.crf.ser.gz'
* 'ner.model.MISCclass' - 'conll.4class.distsim.crf.ser.gz'
* 'parser.model' - 'englishPCFG.ser.gz'
* 'dcoref.demonym' - 'demonyms.txt'
* 'dcoref.animate' - 'animate.unigrams.txt'
* 'dcoref.female' - 'female.unigrams.txt'
* 'dcoref.inanimate' - 'inanimate.unigrams.txt'
* 'dcoref.male' - 'male.unigrams.txt'
* 'dcoref.neutral' - 'neutral.unigrams.txt'
* 'dcoref.plural' - 'plural.unigrams.txt'
* 'dcoref.singular' - 'singular.unigrams.txt'
* 'dcoref.states' - 'state-abbreviations.txt'
* 'dcoref.extra.gender' - 'namegender.combine.txt'
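
A brief sketch of overriding one of these defaults before loading a pipeline (the file name below is a placeholder; whatever you pass must exist under `StanfordCoreNLP.model_path`):

```ruby
# Swap the default POS tagger model for a custom one, then load a pipeline.
# 'my-custom.tagger' is a hypothetical placeholder file name.
StanfordCoreNLP.set_model('pos.model', 'my-custom.tagger')
pipeline = StanfordCoreNLP.load(:tokenize, :ssplit, :pos)
```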

**Contributing**

Feel free to fork the project and send me a pull request!
data/lib/stanford-core-nlp.rb
CHANGED
@@ -1,58 +1,62 @@
 module StanfordCoreNLP
 
-  VERSION = '0.2.1'
+  VERSION = '0.3.0'
+
+  require 'bind-it'
+  extend BindIt::Binding
+
+  # ############################ #
+  # BindIt Configuration Options #
+  # ############################ #
+
+  # The path in which to look for the Stanford JAR files,
+  # with a trailing slash.
+  self.jar_path = File.dirname(__FILE__) + '/../bin/'
+
+  # Load the JVM with a minimum heap size of 512MB,
+  # and a maximum heap size of 1024MB.
+  self.jvm_args = ['-Xms512M', '-Xmx1024M']
+
+  # Turn logging off by default.
+  self.log_file = nil
+
+  # Default JAR files to load.
+  self.default_jars = [
+    'joda-time.jar',
+    'xom.jar',
+    'stanford-corenlp.jar',
+    'bridge.jar'
+  ]
+
+  # Default classes to load.
+  self.default_classes = [
+    ['StanfordCoreNLP', 'edu.stanford.nlp.pipeline', 'CoreNLP'],
+    ['Annotation', 'edu.stanford.nlp.pipeline', 'Text'],
+    ['Word', 'edu.stanford.nlp.ling'],
+    ['MaxentTagger', 'edu.stanford.nlp.tagger.maxent'],
+    ['CRFClassifier', 'edu.stanford.nlp.ie.crf'],
+    ['Properties', 'java.util'],
+    ['ArrayList', 'java.util'],
+    ['AnnotationBridge', '']
+  ]
+
+  # Default namespace is the Stanford pipeline namespace.
+  self.default_namespace = 'edu.stanford.nlp.pipeline'
 
-  require 'stanford-core-nlp/jar_loader'
-  require 'stanford-core-nlp/java_wrapper'
   require 'stanford-core-nlp/config'
-
+  require 'stanford-core-nlp/bridge'
+
   class << self
-    # The
-    # with a trailing slash.
-    #
-    # The structure of the JAR folder must be as follows:
-    #
-    # Files:
-    #
-    # /stanford-core-nlp.jar
-    # /joda-time.jar
-    # /xom.jar
-    # /bridge.jar*
-    #
-    # Folders:
-    #
-    # /classifiers  # Models for the NER system.
-    # /dcoref       # Models for the coreference resolver.
-    # /taggers      # Models for the POS tagger.
-    # /grammar      # Models for the parser.
-    #
-    # *The file bridge.jar is a thin JAVA wrapper over the
-    # Stanford Core NLP get() function, which allows to
-    # retrieve annotations using static classes as names.
-    # This works around one of the lacunae of Rjb.
-    attr_accessor :jar_path
-    # The path to the main folder containing the folders
-    # with the individual models inside. By default, this
-    # is the same as the JAR path.
-    attr_accessor :model_path
-    # The flags for starting the JVM machine. The parser
-    # and named entity recognizer are very memory consuming.
-    attr_accessor :jvm_args
-    # A file to redirect JVM output to.
-    attr_accessor :log_file
-    # The model files for a given language.
+    # The model file names for a given language.
     attr_accessor :model_files
+    # The folder in which to look for models.
+    attr_accessor :model_path
   end
-
-  # The
-
-  #
+
+  # The path to the main folder containing the folders
+  # with the individual models inside. By default, this
+  # is the same as the JAR path.
   self.model_path = self.jar_path
-  # Load the JVM with a minimum heap size of 512MB and a
-  # maximum heap size of 1024MB.
-  self.jvm_args = ['-Xms512M', '-Xmx1024M']
-  # Turn logging off by default.
-  self.log_file = nil
 
   # Use models for a given language. Language can be
   # supplied as full-length, or ISO-639 2 or 3 letter
@@ -83,49 +87,20 @@ module StanfordCoreNLP
   # Use english by default.
   self.use(:english)
 
-  # Set a model file.
-  #
-  # 'pos.model' => 'english-left3words-distsim.tagger',
-  # 'ner.model.3class' => 'all.3class.distsim.crf.ser.gz',
-  # 'ner.model.7class' => 'muc.7class.distsim.crf.ser.gz',
-  # 'ner.model.MISCclass' => 'conll.4class.distsim.crf.ser.gz',
-  # 'parser.model' => 'englishPCFG.ser.gz',
-  # 'dcoref.demonym' => 'demonyms.txt',
-  # 'dcoref.animate' => 'animate.unigrams.txt',
-  # 'dcoref.female' => 'female.unigrams.txt',
-  # 'dcoref.inanimate' => 'inanimate.unigrams.txt',
-  # 'dcoref.male' => 'male.unigrams.txt',
-  # 'dcoref.neutral' => 'neutral.unigrams.txt',
-  # 'dcoref.plural' => 'plural.unigrams.txt',
-  # 'dcoref.singular' => 'singular.unigrams.txt',
-  # 'dcoref.states' => 'state-abbreviations.txt',
-  # 'dcoref.extra.gender' => 'namegender.combine.txt'
-  #
+  # Set a model file.
   def self.set_model(name, file)
     n = name.split('.')[0].intern
     self.model_files[name] =
       Config::ModelFolders[n] + file
   end
 
-  # Whether the classes are initialized or not.
-  @@initialized = false
-
-  # Load the JARs, create the classes.
-  def self.init
-    unless @@initialized
-      self.load_jars
-      self.load_default_classes
-    end
-    @@initialized = true
-  end
-
   # Load a StanfordCoreNLP pipeline with the
   # specified JVM flags and StanfordCoreNLP
   # properties.
   def self.load(*annotators)
-
-
-
+
+    # Make the bindings.
+    self.bind
     # Prepend the JAR path to the model files.
     properties = {}
     self.model_files.each do |k,v|
@@ -135,15 +110,12 @@ module StanfordCoreNLP
         break if found
       end
       next unless found
-
       f = self.model_path + v
-
       unless File.readable?(f)
         raise "Model file #{f} could not be found. " +
           "You may need to download this file manually "+
           " and/or set paths properly."
       end
-
       properties[k] = f
     end
 
@@ -152,81 +124,7 @@ module StanfordCoreNLP
     CoreNLP.new(get_properties(properties))
   end
 
-  #
-  # the program always loads the same models when
-  # you make new pipelines and request the annotator
-  # again, ignoring the changes in models.
-  #
-  # This function kills the JVM and reloads everything
-  # if you need to create a new pipeline with different
-  # models for the same annotators.
-  #def self.reload
-  #  raise 'Not implemented.'
-  #end
-
-  # Load the jars.
-  def self.load_jars
-    JarLoader.log(self.log_file)
-    JarLoader.jvm_args = self.jvm_args
-    JarLoader.jar_path = self.jar_path
-    JarLoader.load('joda-time.jar')
-    JarLoader.load('xom.jar')
-    JarLoader.load('stanford-corenlp.jar')
-    JarLoader.load('bridge.jar')
-  end
-
-  # Create the Ruby classes corresponding to the StanfordNLP
-  # core classes.
-  def self.load_default_classes
-
-    const_set(:CoreNLP,
-      Rjb::import('edu.stanford.nlp.pipeline.StanfordCoreNLP')
-    )
-
-    self.load_klass 'Annotation'
-    self.load_klass 'Word', 'edu.stanford.nlp.ling'
-
-    self.load_klass 'MaxentTagger', 'edu.stanford.nlp.tagger.maxent'
-
-    self.load_klass 'CRFClassifier', 'edu.stanford.nlp.ie.crf'
-
-    self.load_klass 'Properties', 'java.util'
-    self.load_klass 'ArrayList', 'java.util'
-
-    self.load_klass 'AnnotationBridge', ''
-
-    const_set(:Text, Annotation)
-
-  end
-
-  # Load a class (e.g. PTBTokenizerAnnotator) in a specific
-  # class path (default is 'edu.stanford.nlp.pipeline').
-  # The class is then accessible under the StanfordCoreNLP
-  # namespace, e.g. StanfordCoreNLP::PTBTokenizerAnnotator.
-  #
-  # List of annotators:
-  #
-  # - PTBTokenizingAnnotator - tokenizes the text following Penn Treebank conventions.
-  # - WordToSentenceAnnotator - splits a sequence of words into a sequence of sentences.
-  # - POSTaggerAnnotator - annotates the text with part-of-speech tags.
-  # - MorphaAnnotator - morphological normalizer (generates lemmas).
-  # - NERAnnotator - annotates the text with named-entity labels.
-  # - NERCombinerAnnotator - combines several NER models (use this instead of NERAnnotator!).
-  # - TrueCaseAnnotator - detects the true case of words in free text (useful for all upper or lower case text).
-  # - ParserAnnotator - generates constituent and dependency trees.
-  # - NumberAnnotator - recognizes numerical entities such as numbers, money, times, and dates.
-  # - TimeWordAnnotator - recognizes common temporal expressions, such as "teatime".
-  # - QuantifiableEntityNormalizingAnnotator - normalizes the content of all numerical entities.
-  # - SRLAnnotator - annotates predicates and their semantic roles.
-  # - CorefAnnotator - implements pronominal anaphora resolution using a statistical model (deprecated!).
-  # - DeterministicCorefAnnotator - implements anaphora resolution using a deterministic model (newer model, use this!).
-  # - NFLAnnotator - implements entity and relation mention extraction for the NFL domain.
-  def self.load_class(klass, base = 'edu.stanford.nlp.pipeline')
-    self.init unless @@initialized
-    self.load_klass(klass, base)
-  end
-
-  # HCreate a java.util.Properties object from a hash.
+  # Create a java.util.Properties object from a hash.
   def self.get_properties(properties)
     props = Properties.new
     properties.each do |property, value|
@@ -245,18 +143,4 @@ module StanfordCoreNLP
     list
   end
 
-  # Under_case -> CamelCase.
-  def self.camel_case(text)
-    text.to_s.gsub(/^[a-z]|_[a-z]/) do |a|
-      a.upcase
-    end.gsub('_', '')
-  end
-
-  private
-  def self.load_klass(klass, base = 'edu.stanford.nlp.pipeline')
-    base += '.' unless base == ''
-    const_set(klass.intern,
-      Rjb::import("#{base}#{klass}"))
-  end
-
 end
data/lib/stanford-core-nlp/{java_wrapper.rb → bridge.rb}
CHANGED
@@ -1,23 +1,9 @@
 module StanfordCoreNLP
 
-  # Modify the Rjb JavaProxy class to add our
+  # Modify the Rjb JavaProxy class to add our
+  # own methods to every Java object.
   Rjb::Rjb_JavaProxy.class_eval do
 
-    # Dynamically defined on all proxied Java objects.
-    # Shorthand for to_string defined by Java classes.
-    def to_s; to_string; end
-
-    # Dynamically defined on all proxied Java iterators.
-    # Provide Ruby-style iterators to wrap Java iterators.
-    def each
-      if !java_methods.include?('iterator()')
-        raise 'This object cannot be iterated.'
-      else
-        i = self.iterator
-        while i.has_next; yield i.next; end
-      end
-    end
-
     # Dynamically defined on all proxied annotation classes.
     # Get an annotation using the annotation bridge.
     def get(annotation, anno_base = nil)
@@ -26,15 +12,19 @@ module StanfordCoreNLP
       else
         anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
         if anno_base
-
+          unless StanfordNLP::Config::Annotations[anno_base]
+            raise "The path #{anno_base} doesn't exist."
+          end
           anno_bases = [anno_base]
         else
           anno_bases = StanfordCoreNLP::Config::AnnotationsByName[anno_class]
           raise "The annotation #{anno_class} doesn't exist." unless anno_bases
         end
         if anno_bases.size > 1
-          msg = "There are many different annotations
-
+          msg = "There are many different annotations " +
+            "bearing the name #{anno_class}. \nPlease specify " +
+            "one of the following base classes as second " +
+            "parameter to disambiguate: "
           msg << anno_bases.join(',')
           raise msg
         else
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: stanford-core-nlp
 version: !ruby/object:Gem::Version
-  version: 0.2.1
+  version: 0.3.0
 prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-
+date: 2012-04-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
-  name:
-  requirement:
+  name: bind-it
+  requirement: !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,7 +21,12 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements:
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
 description: ! " High-level Ruby bindings to the Stanford CoreNLP package, a set natural
   language processing \ntools that provides tokenization, part-of-speech tagging and
   parsing for several languages, as well as named entity \nrecognition and coreference
@@ -32,12 +37,11 @@ executables: []
 extensions: []
 extra_rdoc_files: []
 files:
+- lib/stanford-core-nlp/bridge.rb
 - lib/stanford-core-nlp/config.rb
-- lib/stanford-core-nlp/jar_loader.rb
-- lib/stanford-core-nlp/java_wrapper.rb
 - lib/stanford-core-nlp.rb
 - bin/bridge.jar
-- README.markdown
+- README.md
 - LICENSE
 homepage: https://github.com/louismullie/stanford-core-nlp
 licenses: []
@@ -59,7 +63,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 1.8.
+rubygems_version: 1.8.21
 signing_key:
 specification_version: 3
 summary: Ruby bindings to the Stanford Core NLP tools.
data/README.markdown
DELETED
@@ -1,100 +0,0 @@
[![Build Status](https://secure.travis-ci.org/louismullie/stanford-core-nlp.png)](http://travis-ci.org/louismullie/stanford-core-nlp)

**About**

This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set natural language processing tools that provides tokenization, part-of-speech tagging, lemmatization, and parsing for several languages, as well as named entity recognition and coreference resolution for English. This gem is compatible with Ruby 1.9.2 and above.

**Installing**

Firs, install the gem: `gem install stanford-core-nlp`. Then, download the Stanford Core NLP JAR and model files. Three different packages are available:

* A [minimal package for English](http://louismullie.com/stanford-core-nlp-minimal.zip) with one tagger model and one parser model for English.
* A [full package for English](http://louismullie.com/stanford-core-nlp-english.zip), with all tagger and parser models for English, plus the coreference resolution and named entity recognition models.
* A [full package for all languages](http://louismullie.com/stanford-core-nlp-all.zip), including tagger and parser models for English, French, German, Arabic and Chinese.

Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (e.g. /usr/local/lib/ruby/gems/1.X.x/gems/stanford-core-nlp-0.x/bin/).

**Configuration**

After installing and requiring the gem (`require 'stanford-core-nlp'`), you may want to set some optional configuration options. Here are some examples:

```ruby
# Set an alternative path to look for the JAR files
# Default is gem's bin folder.
StanfordCoreNLP.jar_path = '/path_to_jars/'

# Set an alternative path to look for the model files
# Default is gem's bin folder.
StanfordCoreNLP.model_path = '/path_to_models/'

# Pass some alternative arguments to the Java VM.
# Default is ['-Xms512M', '-Xmx1024M'] (be prepared
# to take a coffee break).
StanfordCoreNLP.jvm_args = ['-option1', '-option2']

# Redirect VM output to log.txt
StanfordCoreNLP.log_file = 'log.txt'

# Use the model files for a different language than English.
StanfordCoreNLP.use(:french)

# Change a specific model file.
StanfordCoreNLP.set_model('pos.model', 'english-left3words-distsim.tagger')
```

**Using the gem**

```ruby
text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
       'Berlin to discuss a new austerity package. Sarkozy ' +
       'looked pleased, but Merkel was dismayed.'

pipeline = StanfordCoreNLP.load(:tokenize, :ssplit, :pos, :lemma, :parse, :ner, :dcoref)
text = StanfordCoreNLP::Text.new(text)
pipeline.annotate(text)

text.get(:sentences).each do |sentence|
  # Syntatical dependencies
  puts sentence.get(:basic_dependencies).to_s
  sentence.get(:tokens).each do |token|
    # Default annotations for all tokens
    puts token.get(:value).to_s
    puts token.get(:original_text).to_s
    puts token.get(:character_offset_begin).to_s
    puts token.get(:character_offset_end).to_s
    # POS returned by the tagger
    puts token.get(:part_of_speech).to_s
    # Lemma (base form of the token)
    puts token.get(:lemma).to_s
    # Named entity tag
    puts token.get(:named_entity_tag).to_s
    # Coreference
    puts token.get(:coref_cluster_id).to_s
    # Also of interest: coref, coref_chain, coref_cluster, coref_dest, coref_graph.
  end
end
```

> Note: You need to load the StanfordCoreNLP pipeline before using the StanfordCoreNLP::Text class.

A good reference for names of annotations are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CoreCorefAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the 'config.rb' file inside the gem. The Ruby symbol (e.g. :named_entity_tag) corresponding to a Java annotation class follows the simple un-camel-casing convention, with 'Annotation' at the end removed. For example, the annotation NamedEntityTagAnnotation translates to :named_entity_tag, PartOfSpeechAnnotation to :part_of_speech, etc.

**Loading specific classes**

You may also want to load your own classes from the Stanford NLP to do more specific tasks. The gem provides an API to do this:

```ruby
# Default base class is edu.stanford.nlp.pipeline.
StanfordCoreNLP.load_class('PTBTokenizerAnnotator')
puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
# => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>

# Here, we specify another base class.
StanfordCoreNLP.load_class('MaxentTagger', 'edu.stanford.nlp.tagger')
puts StanfordCoreNLP::MaxentTagger.inspect
# => <Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
```

**Contributing**

Feel free to fork the project and send me a pull request!
data/lib/stanford-core-nlp/jar_loader.rb
DELETED
@@ -1,55 +0,0 @@
module StanfordCoreNLP
  class JarLoader

    require 'rjb'

    # Configuration options.
    class << self
      # An array of flags to pass to the JVM machine.
      attr_accessor :jvm_args
      attr_accessor :jar_path
      attr_accessor :log_file
    end

    # An array of string flags to supply to the JVM, e.g. ['-Xms512M', '-Xmx1024M']
    self.jvm_args = []
    # The path in which to look for Jars.
    self.jar_path = ''
    # By default, disable logging.
    self.log_file = nil

    # Load Rjb and create Java VM.
    def self.rjb_initialize
      return if ::Rjb::loaded?
      ::Rjb::load(nil, self.jvm_args)
      set_java_logging if self.log_file
    end

    # Enable logging.
    def self.log(file = 'log.txt')
      self.log_file = file
    end

    # Redirect the output of the JVM to supplied log file.
    def self.set_java_logging
      const_set(:System, Rjb::import('java.lang.System'))
      const_set(:PrintStream, Rjb::import('java.io.PrintStream'))
      const_set(:File2, Rjb::import('java.io.File'))
      ps = PrintStream.new(File2.new(self.log_file))
      ps.write(::Time.now.strftime("[%m/%d/%Y at %I:%M%p]\n\n"))
      System.setOut(ps)
      System.setErr(ps)
    end

    # Load a jar.
    def self.load(jar)
      self.rjb_initialize
      jar = self.jar_path + jar
      if !::File.readable?(jar)
        raise "Could not find JAR file (looking in #{jar})."
      end
      ::Rjb::add_jar(jar)
    end

  end
end