RubyGems - stanford-core-nlp - Versions diffs - 0.2.1 → 0.3.0 - Mend

stanford-core-nlp 0.2.1 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

data/README.md +139 -0
data/lib/stanford-core-nlp.rb +56 -172
data/lib/stanford-core-nlp/{java_wrapper.rb → bridge.rb} +9 -19
metadata +13 -9
data/README.markdown +0 -100
data/lib/stanford-core-nlp/jar_loader.rb +0 -55

data/README.md ADDED

@@ -0,0 +1,139 @@
+[![Build Status](https://secure.travis-ci.org/louismullie/stanford-core-nlp.png)](http://travis-ci.org/louismullie/stanford-core-nlp)
+**About**
+This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set natural language processing tools for tokenization, part-of-speech tagging, lemmatization, and parsing of several languages, as well as named entity recognition and coreference resolution in English. This gem is compatible with Ruby 1.9.2 and above.
+**Installing**
+First, install the gem: `gem install stanford-core-nlp`. Then, download the Stanford Core NLP JAR and model files. Three different packages are available:
+* A [minimal package for English](http://louismullie.com/treat/stanford-core-nlp-minimal.zip) with one tagger model  and one parser model for English.
+* A [full package for English](http://louismullie.com/treat/stanford-core-nlp-english.zip), with all tagger and parser models for English, plus the coreference resolution and named entity recognition models.
+* A [full package for all languages](http://louismullie.com/treat/stanford-core-nlp-all.zip), including tagger and parser models for English, French, German, Arabic and Chinese.
+Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (e.g. [...]/gems/stanford-core-nlp-0.x/bin/).
+**Configuration**
+After installing and requiring the gem (`require 'stanford-core-nlp'`), you may want to set some optional configuration options. Here are some examples:
+```ruby
+# Set an alternative path to look for the JAR files
+# Default is gem's bin folder.
+StanfordCoreNLP.jar_path = '/path_to_jars/'
+# Set an alternative path to look for the model files
+# Default is gem's bin folder.
+StanfordCoreNLP.model_path = '/path_to_models/'
+# Pass some alternative arguments to the Java VM.
+# Default is ['-Xms512M', '-Xmx1024M'] (be prepared
+# to take a coffee break).
+StanfordCoreNLP.jvm_args = ['-option1', '-option2']
+# Redirect VM output to log.txt
+StanfordCoreNLP.log_file = 'log.txt'
+# Use the model files for a different language than English.
+StanfordCoreNLP.use(:french)
+# Change a specific model file.
+StanfordCoreNLP.set_model('pos.model', 'english-left3words-distsim.tagger')
+```
+**Using the gem**
+```ruby
+text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
+   'Berlin to discuss a new austerity package. Sarkozy ' +
+   'looked pleased, but Merkel was dismayed.'
+pipeline =  StanfordCoreNLP.load(:tokenize, :ssplit, :pos, :lemma, :parse, :ner, :dcoref)
+text = StanfordCoreNLP::Text.new(text)
+pipeline.annotate(text)
+text.get(:sentences).each do |sentence|
+  # Syntatical dependencies
+  puts sentence.get(:basic_dependencies).to_s
+  sentence.get(:tokens).each do |token|
+    # Default annotations for all tokens
+    puts token.get(:value).to_s
+    puts token.get(:original_text).to_s
+    puts token.get(:character_offset_begin).to_s
+    puts token.get(:character_offset_end).to_s
+    # POS returned by the tagger
+    puts token.get(:part_of_speech).to_s
+    # Lemma (base form of the token)
+    puts token.get(:lemma).to_s
+    # Named entity tag
+    puts token.get(:named_entity_tag).to_s
+    # Coreference
+   puts token.get(:coref_cluster_id).to_s
+    # Also of interest: coref, coref_chain, coref_cluster, coref_dest, coref_graph.
+  end
+end
+```
+> Important: You need to load the StanfordCoreNLP pipeline before using the StanfordCoreNLP::Text class.
+A good reference for names of annotations are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CoreCorefAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the 'config.rb' file inside the gem. The Ruby symbol (e.g. `:named_entity_tag`) corresponding to a Java annotation class follows the simple un-camel-casing convention, with 'Annotation' at the end removed. For example, the annotation `NamedEntityTagAnnotation` translates to `:named_entity_tag`, `PartOfSpeechAnnotation` to `:part_of_speech`, etc.
+**Loading specific classes**
+You may also want to load your own classes from the Stanford NLP to do more specific tasks. The gem provides an API to do this:
+```ruby
+# Default base class is edu.stanford.nlp.pipeline.
+StanfordCoreNLP.load_class('PTBTokenizerAnnotator')
+puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
+  # => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>
+# Here, we specify another base class.
+StanfordCoreNLP.load_class('MaxentTagger', 'edu.stanford.nlp.tagger')
+puts StanfordCoreNLP::MaxentTagger.inspect
+  # => <Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
+```
+**List of annotator classes**
+Here is a full list of annotator classes provided by the Stanford Core NLP package. You can load these classes individually using `StanfordCoreNLP.load_class` (see above). Once this is done, you can use them like you would from a Java program. Refer to the Java documentation for a list of functions provided by each of these classes.
+* PTBTokenizerAnnotator - tokenizes the text following Penn Treebank conventions.
+* WordToSentenceAnnotator - splits a sequence of words into a sequence of sentences.
+* POSTaggerAnnotator - annotates the text with part-of-speech tags.
+* MorphaAnnotator - morphological normalizer (generates lemmas).
+* NERAnnotator - annotates the text with named-entity labels.
+* NERCombinerAnnotator - combines several NER models.
+* TrueCaseAnnotator - detects the true case of words in free text.
+* ParserAnnotator - generates constituent and dependency trees.
+* NumberAnnotator - recognizes numerical entities such as numbers, money, times, and dates.
+* TimeWordAnnotator - recognizes common temporal expressions, such as "teatime".
+* QuantifiableEntityNormalizingAnnotator - normalizes the content of all numerical entities.
+* SRLAnnotator - annotates predicates and their semantic roles.
+* DeterministicCorefAnnotator - implements anaphora resolution using a deterministic model.
+* NFLAnnotator - implements entity and relation mention extraction for the NFL domain.
+**List of model files**
+Here is a full list of the default models for the Stanford Core NLP pipeline. You can change these models individually using `StanfordCoreNLP.set_model` (see above).
+* 'pos.model' - 'english-left3words-distsim.tagger'
+* 'ner.model.3class' - 'all.3class.distsim.crf.ser.gz'
+* 'ner.model.7class' - 'muc.7class.distsim.crf.ser.gz'
+* 'ner.model.MISCclass' -- 'conll.4class.distsim.crf.ser.gz'
+* 'parser.model' - 'englishPCFG.ser.gz'
+* 'dcoref.demonym' - 'demonyms.txt'
+* 'dcoref.animate' - 'animate.unigrams.txt'
+* 'dcoref.female' - 'female.unigrams.txt'
+* 'dcoref.inanimate' - 'inanimate.unigrams.txt'
+* 'dcoref.male' - 'male.unigrams.txt'
+* 'dcoref.neutral' - 'neutral.unigrams.txt'
+* 'dcoref.plural' - 'plural.unigrams.txt'
+* 'dcoref.singular' - 'singular.unigrams.txt'
+* 'dcoref.states' - 'state-abbreviations.txt'
+* 'dcoref.extra.gender' - 'namegender.combine.txt'
+**Contributing**
+Feel free to fork the project and send me a pull request!
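
The README added above describes how a Ruby symbol such as `:named_entity_tag` maps to a Java annotation class name. The following is a minimal sketch of that convention in plain Ruby. The helper names `annotation_class_name` and `annotation_symbol` are hypothetical and are not part of the gem; the forward conversion reuses the regular expression from the `camel_case` method that appears in the 0.2.1 source.

```ruby
# Illustrative helpers only; these names are assumptions, not the gem's API.

# :named_entity_tag -> "NamedEntityTagAnnotation"
def annotation_class_name(symbol)
  # Upcase the first letter and every letter that follows an underscore,
  # then drop the underscores and append the 'Annotation' suffix.
  camel = symbol.to_s.gsub(/^[a-z]|_[a-z]/) { |m| m.upcase }.delete('_')
  "#{camel}Annotation"
end

# "NamedEntityTagAnnotation" -> :named_entity_tag
def annotation_symbol(class_name)
  class_name.sub(/Annotation\z/, '')            # strip the suffix
            .gsub(/([a-z\d])([A-Z])/, '\1_\2')  # insert '_' at case boundaries
            .downcase
            .to_sym
end

puts annotation_class_name(:part_of_speech)
puts annotation_symbol('NamedEntityTagAnnotation').inspect
```

The reverse direction is only reliable for simple names; class names containing runs of capitals would need extra handling.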

data/lib/stanford-core-nlp.rb CHANGED

@@ -1,58 +1,62 @@
 module StanfordCoreNLP
-  VERSION = '0.2.1'
+  VERSION = '0.3.0'
+  require 'bind-it'
+  extend BindIt::Binding
+  # ############################ #
+  # BindIt Configuration Options #
+  # ############################ #
+  # The path in which to look for the Stanford JAR files,
+  # with a trailing slash.
+  self.jar_path = File.dirname(__FILE__) + '/../bin/'
+  # Load the JVM with a minimum heap size of 512MB,
+  # and a maximum heap size of 1024MB.
+  self.jvm_args = ['-Xms512M', '-Xmx1024M']
+  # Turn logging off by default.
+  self.log_file = nil
+  # Default JAR files to load.
+  self.default_jars = [
+    'joda-time.jar',
+    'xom.jar',
+    'stanford-corenlp.jar',
+    'bridge.jar'
+  ]
+  # Default classes to load.
+  self.default_classes = [
+    ['StanfordCoreNLP', 'edu.stanford.nlp.pipeline', 'CoreNLP'],
+    ['Annotation', 'edu.stanford.nlp.pipeline', 'Text'],
+    ['Word', 'edu.stanford.nlp.ling'],
+    ['MaxentTagger', 'edu.stanford.nlp.tagger.maxent'],
+    ['CRFClassifier', 'edu.stanford.nlp.ie.crf'],
+    ['Properties', 'java.util'],
+    ['ArrayList', 'java.util'],
+    ['AnnotationBridge', '']
+  ]
+  # Default namespace is the Stanford pipeline namespace.
+  self.default_namespace = 'edu.stanford.nlp.pipeline'
-  require 'stanford-core-nlp/jar_loader'
-  require 'stanford-core-nlp/java_wrapper'
   require 'stanford-core-nlp/config'
+  require 'stanford-core-nlp/bridge'
   class << self
-    # The path in which to look for the Stanford JAR files,
-    # with a trailing slash.
-    #
-    # The structure of the JAR folder must be as follows:
-    #
-    # Files:
-    #
-    #  /stanford-core-nlp.jar
-    #  /joda-time.jar
-    #  /xom.jar
-    #  /bridge.jar*
-    #
-    # Folders:
-    #
-    #  /classifiers         # Models for the NER system.
-    #  /dcoref              # Models for the coreference resolver.
-    #  /taggers             # Models for the POS tagger.
-    #  /grammar             # Models for the parser.
-    #
-    # *The file bridge.jar is a thin JAVA wrapper over the
-    # Stanford Core NLP get() function, which allows to
-    # retrieve annotations using static classes as names.
-    # This works around one of the lacunae of Rjb.
-    attr_accessor :jar_path
-    # The path to the main folder containing the folders
-    # with the individual models inside. By default, this
-    # is the same as the JAR path.
-    attr_accessor :model_path
-    # The flags for starting the JVM machine. The parser
-    # and named entity recognizer are very memory consuming.
-    attr_accessor :jvm_args
-    # A file to redirect JVM output to.
-    attr_accessor :log_file
-    # The model files for a given language.
+    # The model file names for a given language.
     attr_accessor :model_files
+    # The folder in which to look for models.
+    attr_accessor :model_path
   end
-  # The default JAR path is the gem's bin folder.
-  self.jar_path = File.dirname(__FILE__) + '/../bin/'
-  # The default model path is the same as the JAR path.
+  # The path to the main folder containing the folders
+  # with the individual models inside. By default, this
+  # is the same as the JAR path.
   self.model_path = self.jar_path
-  # Load the JVM with a minimum heap size of 512MB and a
-  # maximum heap size of 1024MB.
-  self.jvm_args = ['-Xms512M', '-Xmx1024M']
-  # Turn logging off by default.
-  self.log_file = nil
   # Use models for a given language. Language can be
   # supplied as full-length, or ISO-639 2 or 3 letter
@@ -83,49 +87,20 @@ module StanfordCoreNLP
   # Use english by default.
   self.use(:english)
-  # Set a model file. Here are the default models for English:
-  #
-  #    'pos.model' => 'english-left3words-distsim.tagger',
-  #    'ner.model.3class' => 'all.3class.distsim.crf.ser.gz',
-  #    'ner.model.7class' => 'muc.7class.distsim.crf.ser.gz',
-  #    'ner.model.MISCclass' => 'conll.4class.distsim.crf.ser.gz',
-  #    'parser.model' => 'englishPCFG.ser.gz',
-  #    'dcoref.demonym' => 'demonyms.txt',
-  #    'dcoref.animate' => 'animate.unigrams.txt',
-  #    'dcoref.female' => 'female.unigrams.txt',
-  #    'dcoref.inanimate' => 'inanimate.unigrams.txt',
-  #    'dcoref.male' => 'male.unigrams.txt',
-  #    'dcoref.neutral' => 'neutral.unigrams.txt',
-  #    'dcoref.plural' => 'plural.unigrams.txt',
-  #    'dcoref.singular' => 'singular.unigrams.txt',
-  #    'dcoref.states' => 'state-abbreviations.txt',
-  #    'dcoref.extra.gender' => 'namegender.combine.txt'
-  #
+  # Set a model file.
   def self.set_model(name, file)
     n = name.split('.')[0].intern
     self.model_files[name] =
     Config::ModelFolders[n] + file
   end
-  # Whether the classes are initialized or not.
-  @@initialized = false
-  # Load the JARs, create the classes.
-  def self.init
-    unless @@initialized
-      self.load_jars
-      self.load_default_classes
-    end
-    @@initialized = true
-  end
   # Load a StanfordCoreNLP pipeline with the
   # specified JVM flags and StanfordCoreNLP
   # properties.
   def self.load(*annotators)
-    self.init unless @@initialized
+    # Make the bindings.
+    self.bind
     # Prepend the JAR path to the model files.
     properties = {}
     self.model_files.each do |k,v|
@@ -135,15 +110,12 @@ module StanfordCoreNLP
         break if found
       end
       next unless found
       f = self.model_path + v
       unless File.readable?(f)
         raise "Model file #{f} could not be found. " +
         "You may need to download this file manually "+
         " and/or set paths properly."
       end
       properties[k] = f
     end
@@ -152,81 +124,7 @@ module StanfordCoreNLP
     CoreNLP.new(get_properties(properties))
   end
-  # Once it loads a specific annotator model once,
-  # the program always loads the same models when
-  # you make new pipelines and request the annotator
-  # again, ignoring the changes in models.
-  #
-  # This function kills the JVM and reloads everything
-  # if you need to create a new pipeline with different
-  # models for the same annotators.
-  #def self.reload
-  #  raise 'Not implemented.'
-  #end
-  # Load the jars.
-  def self.load_jars
-    JarLoader.log(self.log_file)
-    JarLoader.jvm_args = self.jvm_args
-    JarLoader.jar_path = self.jar_path
-    JarLoader.load('joda-time.jar')
-    JarLoader.load('xom.jar')
-    JarLoader.load('stanford-corenlp.jar')
-    JarLoader.load('bridge.jar')
-  end
-  # Create the Ruby classes corresponding to the StanfordNLP
-  # core classes.
-  def self.load_default_classes
-    const_set(:CoreNLP,
-    Rjb::import('edu.stanford.nlp.pipeline.StanfordCoreNLP')
-    )
-    self.load_klass 'Annotation'
-    self.load_klass 'Word', 'edu.stanford.nlp.ling'
-    self.load_klass 'MaxentTagger', 'edu.stanford.nlp.tagger.maxent'
-    self.load_klass 'CRFClassifier', 'edu.stanford.nlp.ie.crf'
-    self.load_klass 'Properties', 'java.util'
-    self.load_klass 'ArrayList', 'java.util'
-    self.load_klass 'AnnotationBridge', ''
-    const_set(:Text, Annotation)
-  end
-  # Load a class (e.g. PTBTokenizerAnnotator) in a specific
-  # class path (default is 'edu.stanford.nlp.pipeline').
-  # The class is then accessible under the StanfordCoreNLP
-  # namespace, e.g. StanfordCoreNLP::PTBTokenizerAnnotator.
-  #
-  # List of annotators:
-  #
-  #  - PTBTokenizingAnnotator - tokenizes the text following Penn Treebank conventions.
-  #  - WordToSentenceAnnotator - splits a sequence of words into a sequence of sentences.
-  #  - POSTaggerAnnotator - annotates the text with part-of-speech tags.
-  #  - MorphaAnnotator - morphological normalizer (generates lemmas).
-  #  - NERAnnotator - annotates the text with named-entity labels.
-  #  - NERCombinerAnnotator - combines several NER models (use this instead of NERAnnotator!).
-  #  - TrueCaseAnnotator - detects the true case of words in free text (useful for all upper or lower case text).
-  #  - ParserAnnotator - generates constituent and dependency trees.
-  #  - NumberAnnotator - recognizes numerical entities such as numbers, money, times, and dates.
-  #  - TimeWordAnnotator - recognizes common temporal expressions, such as "teatime".
-  #  - QuantifiableEntityNormalizingAnnotator - normalizes the content of all numerical entities.
-  #  - SRLAnnotator - annotates predicates and their semantic roles.
-  #  - CorefAnnotator - implements pronominal anaphora resolution using a statistical model (deprecated!).
-  #  - DeterministicCorefAnnotator - implements anaphora resolution using a deterministic model (newer model, use this!).
-  #  - NFLAnnotator - implements entity and relation mention extraction for the NFL domain.
-  def self.load_class(klass, base = 'edu.stanford.nlp.pipeline')
-    self.init unless @@initialized
-    self.load_klass(klass, base)
-  end
-  # HCreate a java.util.Properties object from a hash.
+  # Create a java.util.Properties object from a hash.
   def self.get_properties(properties)
     props = Properties.new
     properties.each do |property, value|
@@ -245,18 +143,4 @@ module StanfordCoreNLP
     list
   end
-  # Under_case -> CamelCase.
-  def self.camel_case(text)
-    text.to_s.gsub(/^[a-z]|_[a-z]/) do |a|
-      a.upcase
-    end.gsub('_', '')
-  end
-  private
-  def self.load_klass(klass, base = 'edu.stanford.nlp.pipeline')
-    base += '.' unless base == ''
-    const_set(klass.intern,
-    Rjb::import("#{base}#{klass}"))
-  end
 end
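
In this file the imperative `load_klass` and `const_set` calls of 0.2.1 give way to a declarative `default_classes` list. Comparing the two versions suggests that each entry reads as Java class name, Java namespace, and an optional Ruby-side constant name. The sketch below spells out that reading in plain Ruby. `resolve_binding` is a hypothetical helper, and how BindIt actually consumes these entries is not visible in this diff.

```ruby
# A subset of the entries shown in the 0.3.0 default_classes list.
DEFAULT_CLASSES = [
  ['StanfordCoreNLP', 'edu.stanford.nlp.pipeline', 'CoreNLP'],
  ['Annotation', 'edu.stanford.nlp.pipeline', 'Text'],
  ['Word', 'edu.stanford.nlp.ling'],
  ['AnnotationBridge', '']
].freeze

# Hypothetical helper: returns [ruby_constant, qualified_java_name].
def resolve_binding(entry)
  java_name, namespace, ruby_name = entry
  # An empty namespace means the class lives in the default Java package.
  qualified = namespace.to_s.empty? ? java_name : "#{namespace}.#{java_name}"
  # Assumption: without an alias, the Ruby constant reuses the Java name.
  [(ruby_name || java_name).to_sym, qualified]
end

DEFAULT_CLASSES.each do |entry|
  const, java = resolve_binding(entry)
  puts "StanfordCoreNLP::#{const} -> #{java}"
end
```

Under this reading, the first entry corresponds to the removed `const_set(:CoreNLP, Rjb::import('edu.stanford.nlp.pipeline.StanfordCoreNLP'))` call. Whether an aliased class such as `Annotation` also remains reachable under its Java name cannot be determined from the diff alone.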

data/lib/stanford-core-nlp/{java_wrapper.rb → bridge.rb} RENAMED

@@ -1,23 +1,9 @@
 module StanfordCoreNLP
-  # Modify the Rjb JavaProxy class to add our own methods to every Java object.
+  # Modify the Rjb JavaProxy class to add our
+  # own methods to every Java object.
   Rjb::Rjb_JavaProxy.class_eval do
-    # Dynamically defined on all proxied Java objects.
-    # Shorthand for to_string defined by Java classes.
-    def to_s; to_string; end
-    # Dynamically defined on all proxied Java iterators.
-    # Provide Ruby-style iterators to wrap Java iterators.
-    def each
-      if !java_methods.include?('iterator()')
-        raise 'This object cannot be iterated.'
-      else
-        i = self.iterator
-        while i.has_next; yield i.next; end
-      end
-    end
     # Dynamically defined on all proxied annotation classes.
     # Get an annotation using the annotation bridge.
     def get(annotation, anno_base = nil)
@@ -26,15 +12,19 @@ module StanfordCoreNLP
       else
         anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
         if anno_base
-          raise "The path #{anno_base} doesn't exist." unless StanfordNLP::Config::Annotations[anno_base]
+          unless StanfordNLP::Config::Annotations[anno_base]
+            raise "The path #{anno_base} doesn't exist."
+          end
           anno_bases = [anno_base]
         else
           anno_bases = StanfordCoreNLP::Config::AnnotationsByName[anno_class]
           raise "The annotation #{anno_class} doesn't exist." unless anno_bases
         end
         if anno_bases.size > 1
-          msg = "There are many different annotations bearing the name #{anno_class}. "
-          msg << "Please specify one of the following base classes as second parameter to disambiguate: "
+          msg = "There are many different annotations " +
+          "bearing the name #{anno_class}. \nPlease specify " +
+          "one of the following base classes as second " +
+          "parameter to disambiguate: "
           msg << anno_bases.join(',')
           raise msg
         else
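
The `get` method in this hunk turns a symbol into an annotation class name, looks up the base classes that define it, and refuses to guess when more than one matches. The following is a simplified sketch of that lookup in plain Ruby. `ANNOTATIONS_BY_NAME` is a hypothetical stand-in for `StanfordCoreNLP::Config::AnnotationsByName`, whose real contents live in `config.rb` and are not shown here. The sketch also validates an explicit base against the same table, whereas the gem consults a separate `Config::Annotations` table.

```ruby
# Hypothetical lookup table; the 'Demo' entry is invented to show ambiguity.
ANNOTATIONS_BY_NAME = {
  'LemmaAnnotation' => ['edu.stanford.nlp.ling.CoreAnnotations'],
  'DemoAnnotation'  => ['example.first.Base', 'example.second.Base']
}.freeze

# Returns [base_class, annotation_class] or raises ArgumentError.
def resolve_annotation(symbol, base = nil)
  camel = symbol.to_s.gsub(/^[a-z]|_[a-z]/) { |m| m.upcase }.delete('_')
  anno_class = "#{camel}Annotation"

  bases = ANNOTATIONS_BY_NAME[anno_class]
  raise ArgumentError, "The annotation #{anno_class} doesn't exist." unless bases

  if base
    # Simplification: check the base against the entries for this name.
    raise ArgumentError, "The path #{base} doesn't exist." unless bases.include?(base)
    return [base, anno_class]
  end

  if bases.size > 1
    raise ArgumentError,
          "There are many different annotations bearing the name " \
          "#{anno_class}. Please specify one of: #{bases.join(',')}"
  end

  [bases.first, anno_class]
end

puts resolve_annotation(:lemma).inspect
puts resolve_annotation(:demo, 'example.second.Base').inspect
```

One detail worth checking against the released gem: the guard in this hunk refers to `StanfordNLP::Config::Annotations`, while every other reference uses the `StanfordCoreNLP` module.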

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: stanford-core-nlp
 version: !ruby/object:Gem::Version
-  version: 0.2.1
+  version: 0.3.0
   prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-03-06 00:00:00.000000000 Z
+date: 2012-04-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
-  name: rjb
-  requirement: &70138662664620 !ruby/object:Gem::Requirement
+  name: bind-it
+  requirement: !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,7 +21,12 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *70138662664620
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
 description: ! " High-level Ruby bindings to the Stanford CoreNLP package, a set natural
   language processing \ntools that provides tokenization, part-of-speech tagging and
   parsing for several languages, as well as named entity \nrecognition and coreference
@@ -32,12 +37,11 @@ executables: []
 extensions: []
 extra_rdoc_files: []
 files:
+- lib/stanford-core-nlp/bridge.rb
 - lib/stanford-core-nlp/config.rb
-- lib/stanford-core-nlp/jar_loader.rb
-- lib/stanford-core-nlp/java_wrapper.rb
 - lib/stanford-core-nlp.rb
 - bin/bridge.jar
-- README.markdown
+- README.md
 - LICENSE
 homepage: https://github.com/louismullie/stanford-core-nlp
 licenses: []
@@ -59,7 +63,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 1.8.15
+rubygems_version: 1.8.21
 signing_key:
 specification_version: 3
 summary: Ruby bindings to the Stanford Core NLP tools.

data/README.markdown DELETED

@@ -1,100 +0,0 @@
-[![Build Status](https://secure.travis-ci.org/louismullie/stanford-core-nlp.png)](http://travis-ci.org/louismullie/stanford-core-nlp)
-**About**
-This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set natural language processing tools that provides tokenization, part-of-speech tagging, lemmatization, and parsing for several languages, as well as named entity recognition and coreference resolution for English. This gem is compatible with Ruby 1.9.2 and above.
-**Installing**
-Firs, install the gem: `gem install stanford-core-nlp`. Then, download the Stanford Core NLP JAR and model files. Three different packages are available:
-* A [minimal package for English](http://louismullie.com/stanford-core-nlp-minimal.zip) with one tagger model and one parser model for English.
-* A [full package for English](http://louismullie.com/stanford-core-nlp-english.zip), with all tagger and parser models for English, plus the coreference resolution and named entity recognition models.
-* A [full package for all languages](http://louismullie.com/stanford-core-nlp-all.zip), including tagger and parser models for English, French, German, Arabic and Chinese.
-Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (e.g. /usr/local/lib/ruby/gems/1.X.x/gems/stanford-core-nlp-0.x/bin/).
-**Configuration**
-After installing and requiring the gem (`require 'stanford-core-nlp'`), you may want to set some optional configuration options. Here are some examples:
-```ruby
-# Set an alternative path to look for the JAR files
-# Default is gem's bin folder.
-StanfordCoreNLP.jar_path = '/path_to_jars/'
-# Set an alternative path to look for the model files
-# Default is gem's bin folder.
-StanfordCoreNLP.model_path = '/path_to_models/'
-# Pass some alternative arguments to the Java VM.
-# Default is ['-Xms512M', '-Xmx1024M'] (be prepared
-# to take a coffee break).
-StanfordCoreNLP.jvm_args = ['-option1', '-option2']
-# Redirect VM output to log.txt
-StanfordCoreNLP.log_file = 'log.txt'
-# Use the model files for a different language than English.
-StanfordCoreNLP.use(:french)
-# Change a specific model file.
-StanfordCoreNLP.set_model('pos.model', 'english-left3words-distsim.tagger')
-```
-**Using the gem**
-```ruby
-text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
-   'Berlin to discuss a new austerity package. Sarkozy ' +
-   'looked pleased, but Merkel was dismayed.'
-pipeline =  StanfordCoreNLP.load(:tokenize, :ssplit, :pos, :lemma, :parse, :ner, :dcoref)
-text = StanfordCoreNLP::Text.new(text)
-pipeline.annotate(text)
-text.get(:sentences).each do |sentence|
-  # Syntatical dependencies
-  puts sentence.get(:basic_dependencies).to_s
-  sentence.get(:tokens).each do |token|
-    # Default annotations for all tokens
-    puts token.get(:value).to_s
-    puts token.get(:original_text).to_s
-    puts token.get(:character_offset_begin).to_s
-    puts token.get(:character_offset_end).to_s
-    # POS returned by the tagger
-    puts token.get(:part_of_speech).to_s
-    # Lemma (base form of the token)
-    puts token.get(:lemma).to_s
-    # Named entity tag
-    puts token.get(:named_entity_tag).to_s
-    # Coreference
-   puts token.get(:coref_cluster_id).to_s
-    # Also of interest: coref, coref_chain, coref_cluster, coref_dest, coref_graph.
-  end
-end
-```
-> Note: You need to load the StanfordCoreNLP pipeline before using the StanfordCoreNLP::Text class.
-A good reference for names of annotations are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CoreCorefAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the 'config.rb' file inside the gem. The Ruby symbol (e.g. :named_entity_tag) corresponding to a Java annotation class follows the simple un-camel-casing convention, with 'Annotation' at the end removed. For example, the annotation NamedEntityTagAnnotation translates to :named_entity_tag, PartOfSpeechAnnotation to :part_of_speech, etc.
-**Loading specific classes**
-You may also want to load your own classes from the Stanford NLP to do more specific tasks. The gem provides an API to do this:
-```ruby
-# Default base class is edu.stanford.nlp.pipeline.
-StanfordCoreNLP.load_class('PTBTokenizerAnnotator')
-puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
-  # => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>
-# Here, we specify another base class.
-StanfordCoreNLP.load_class('MaxentTagger', 'edu.stanford.nlp.tagger')
-puts StanfordCoreNLP::MaxentTagger.inspect
-  # => <Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
-```
-**Contributing**
-Feel free to fork the project and send me a pull request!

data/lib/stanford-core-nlp/jar_loader.rb DELETED

@@ -1,55 +0,0 @@
-module StanfordCoreNLP
-  class JarLoader
-    require 'rjb'
-    # Configuration options.
-    class << self
-      # An array of flags to pass to the JVM machine.
-      attr_accessor :jvm_args
-      attr_accessor :jar_path
-      attr_accessor :log_file
-    end
-    # An array of string flags to supply to the JVM, e.g. ['-Xms512M', '-Xmx1024M']
-    self.jvm_args = []
-    # The path in which to look for Jars.
-    self.jar_path = ''
-    # By default, disable logging.
-    self.log_file = nil
-    # Load Rjb and create Java VM.
-    def self.rjb_initialize
-      return if ::Rjb::loaded?
-      ::Rjb::load(nil, self.jvm_args)
-      set_java_logging if self.log_file
-    end
-    # Enable logging.
-    def self.log(file = 'log.txt')
-      self.log_file = file
-    end
-    # Redirect the output of the JVM to supplied log file.
-    def self.set_java_logging
-      const_set(:System, Rjb::import('java.lang.System'))
-      const_set(:PrintStream, Rjb::import('java.io.PrintStream'))
-      const_set(:File2, Rjb::import('java.io.File'))
-      ps = PrintStream.new(File2.new(self.log_file))
-      ps.write(::Time.now.strftime("[%m/%d/%Y at %I:%M%p]\n\n"))
-      System.setOut(ps)
-      System.setErr(ps)
-    end
-    # Load a jar.
-    def self.load(jar)
-      self.rjb_initialize
-      jar = self.jar_path + jar
-      if !::File.readable?(jar)
-        raise "Could not find JAR file (looking in #{jar})."
-      end
-      ::Rjb::add_jar(jar)
-    end
-  end
-end
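
The deleted `JarLoader.load` prefixed each JAR name with the configured path and raised if the file was not readable before handing it to Rjb. The following is a minimal sketch of just that path check, with the Rjb call left out so it runs without a JVM. `resolve_jars` is a hypothetical helper, and it uses `File.join` rather than the string concatenation in the original, so a trailing slash on the path is optional here.

```ruby
require 'tmpdir'

# Hypothetical helper: returns the full paths of all JARs, or raises
# with the same message the deleted loader used when one is missing.
def resolve_jars(jar_path, jars)
  jars.map do |jar|
    full = File.join(jar_path, jar)
    unless File.readable?(full)
      raise "Could not find JAR file (looking in #{full})."
    end
    full
  end
end

# Demonstration against a temporary directory holding one empty file.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'xom.jar'), '')
  puts resolve_jars(dir, ['xom.jar']).inspect
  begin
    resolve_jars(dir, ['xom.jar', 'joda-time.jar'])
  rescue RuntimeError => e
    puts e.message
  end
end
```

In 0.3.0 the same list of file names moves into `default_jars`, and the loading itself is presumably delegated to the `bind-it` dependency.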