stanford-core-nlp 0.1.7 → 0.2.0 (RubyGems)

Files changed (4)

data/README.markdown +13 -5
data/lib/stanford-core-nlp.rb +84 -44
data/lib/stanford-core-nlp/java_wrapper.rb +5 -4
metadata +4 -4

data/README.markdown CHANGED

@@ -1,16 +1,22 @@
+[![Build Status](https://secure.travis-ci.org/louismullie/stanford-core-nlp.png)](http://travis-ci.org/louismullie/stanford-core-nlp)
 **About**
-This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set natural language processing tools that provides tokenization, part-of-speech tagging, lemmatization, and parsing for five languages (English, French, German, Arabic and Chinese), as well as named entity recognition and coreference resolution for English.
+This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set of natural language processing tools that provides tokenization, part-of-speech tagging, lemmatization, and parsing for several languages, as well as named entity recognition and coreference resolution for English. This gem is compatible with Ruby 1.9.2 and above.
 **Installing**
-1. Install the gem: `gem install stanford-core-nlp`.
+First, install the gem: `gem install stanford-core-nlp`. Then, download the Stanford Core NLP JAR and model files. Three different packages are available:
+* A [minimal package for English](http://louismullie.com/stanford-core-nlp-minimal.zip) with one tagger model and one parser model for English.
+* A [full package for English](http://louismullie.com/stanford-core-nlp-english.zip), with all tagger and parser models for English, plus the coreference resolution and named entity recognition models.
+* A [full package for all languages](http://louismullie.com/stanford-core-nlp-all.zip), including tagger and parser models for English, French, German, Arabic and Chinese.
-2. Download the Stanford Core NLP JAR and model files. Two package are available with the necessary files: a package for [English only](http://louismullie.com/stanford-core-nlp-english.zip), or a package with models for [all languages](http://louismullie.com/stanford-core-nlp-all.zip). Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (typically this is /usr/local/lib/ruby/gems/1.9.1/gems/stanford-core-nlp-0.x/bin/).
+Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (e.g. /usr/local/lib/ruby/gems/1.X.x/gems/stanford-core-nlp-0.x/bin/).
 **Configuration**
-After installing and requiring the gem (`require 'stanford-core-nlp'`), you may want to set some configuration options (this, however, is not necessary). Here are some examples:
+After installing and requiring the gem (`require 'stanford-core-nlp'`), you may want to set some optional configuration options. Here are some examples:
 ```ruby
 # Set an alternative path to look for the JAR files
@@ -19,7 +25,7 @@ StanfordCoreNLP.jar_path = '/path_to_jars/'
 # Set an alternative path to look for the model files
 # Default is gem's bin folder.
-StanfordCoreNLP.jar_path = '/path_to_models/'
+StanfordCoreNLP.model_path = '/path_to_models/'
 # Pass some alternative arguments to the Java VM.
 # Default is ['-Xms512M', '-Xmx1024M'] (be prepared
@@ -48,6 +54,8 @@ text = StanfordCoreNLP::Text.new(text)
 pipeline.annotate(text)
 text.get(:sentences).each do |sentence|
+  # Syntactic dependencies
+  puts sentence.get(:basic_dependencies).to_s
   sentence.get(:tokens).each do |token|
     # Default annotations for all tokens
     puts token.get(:value).to_s

data/lib/stanford-core-nlp.rb CHANGED

@@ -1,7 +1,7 @@
 module StanfordCoreNLP
-  VERSION = '0.1.7'
+  VERSION = '0.2.0'
   require 'stanford-core-nlp/jar_loader'
   require 'stanford-core-nlp/java_wrapper'
   require 'stanford-core-nlp/config'
@@ -11,31 +11,31 @@ module StanfordCoreNLP
     # with a trailing slash.
     #
     # The structure of the JAR folder must be as follows:
-    #
+    #
     # Files:
-    #
+    #
     #  /stanford-core-nlp.jar
     #  /joda-time.jar
-    #  /xom.jar
+    #  /xom.jar
     #  /bridge.jar*
-    #
+    #
     # Folders:
     #
     #  /classifiers         # Models for the NER system.
     #  /dcoref              # Models for the coreference resolver.
     #  /taggers             # Models for the POS tagger.
     #  /grammar             # Models for the parser.
-    #
+    #
     # *The file bridge.jar is a thin JAVA wrapper over the
-    # Stanford Core NLP get() function, which allows to
+    # Stanford Core NLP get() function, which allows to
     # retrieve annotations using static classes as names.
     # This works around one of the lacunae of Rjb.
     attr_accessor :jar_path
-    # The path to the main folder containing the folders
+    # The path to the main folder containing the folders
     # with the individual models inside. By default, this
     # is the same as the JAR path.
     attr_accessor :model_path
-    # The flags for starting the JVM machine. The parser
+    # The flags for starting the JVM machine. The parser
     # and named entity recognizer are very memory consuming.
     attr_accessor :jvm_args
     # A file to redirect JVM output to.
@@ -54,8 +54,8 @@ module StanfordCoreNLP
   # Turn logging off by default.
   self.log_file = nil
-  # Use models for a given language. Language can be
-  # supplied as full-length, or ISO-639 2 or 3 letter
+  # Use models for a given language. Language can be
+  # supplied as full-length, or ISO-639 2 or 3 letter
   # code (e.g. :english, :eng or :en will work).
   def self.use(language)
     lang = nil
@@ -70,19 +70,19 @@ module StanfordCoreNLP
         n = n.to_s
         n += '.model' if n == 'ner'
         models.each do |m, file|
-          self.model_files["#{n}.#{m}"] =
+          self.model_files["#{n}.#{m}"] =
           folder + file
         end
       elsif models.is_a?(String)
-        self.model_files["#{n}.model"] =
+        self.model_files["#{n}.model"] =
         folder + models
       end
     end
   end
   # Use english by default.
-  self.use(:english)
+  self.use(:english)
   # Set a model file. Here are the default models for English:
   #
   #    'pos.model' => 'english-left3words-distsim.tagger',
@@ -103,32 +103,40 @@ module StanfordCoreNLP
   #
   def self.set_model(name, file)
     n = name.split('.')[0].intern
-    self.model_files[name] =
+    self.model_files[name] =
     Config::ModelFolders[n] + file
   end
   # Whether the classes are initialized or not.
   @@initialized = false
-  # Whether the JAR files are loaded or not.
-  @@loaded = false
   # Load the JARs, create the classes.
   def self.init
-    self.load_jars unless @@loaded
-    self.create_classes
+    unless @@initialized
+      self.load_jars
+      self.load_default_classes
+    end
     @@initialized = true
   end
-  # Load a StanfordCoreNLP pipeline with the
-  # specified JVM flags and StanfordCoreNLP
+  # Load a StanfordCoreNLP pipeline with the
+  # specified JVM flags and StanfordCoreNLP
   # properties.
   def self.load(*annotators)
-    JarLoader.log(self.log_file)
     self.init unless @@initialized
     # Prepend the JAR path to the model files.
     properties = {}
-    self.model_files.each do |k,v|
-      f = self.model_path + v
+    self.model_files.each do |k,v|
+      found = false
+      annotators.each do |annotator|
+        found = true if k.index(annotator.to_s)
+        break if found
+      end
+      next unless found
+      f = self.model_path + v
+      puts f
       unless File.readable?(f)
         raise "Model file #{f} could not be found. " +
         "You may need to download this file manually "+
@@ -137,14 +145,15 @@ module StanfordCoreNLP
         properties[k] = f
       end
     end
     properties['annotators'] =
     annotators.map { |x| x.to_s }.join(', ')
     CoreNLP.new(get_properties(properties))
   end
-  # Once it loads a specific annotator model once,
-  # the program always loads the same models when
-  # you make new pipelines and request the annotator
+  # Once it loads a specific annotator model once,
+  # the program always loads the same models when
+  # you make new pipelines and request the annotator
   # again, ignoring the changes in models.
   #
   # This function kills the JVM and reloads everything
@@ -153,26 +162,40 @@ module StanfordCoreNLP
   #def self.reload
   #  raise 'Not implemented.'
   #end
   # Load the jars.
   def self.load_jars
+    JarLoader.log(self.log_file)
     JarLoader.jvm_args = self.jvm_args
     JarLoader.jar_path = self.jar_path
     JarLoader.load('joda-time.jar')
     JarLoader.load('xom.jar')
     JarLoader.load('stanford-corenlp.jar')
     JarLoader.load('bridge.jar')
-    @@loaded = true
   end
   # Create the Ruby classes corresponding to the StanfordNLP
   # core classes.
-  def self.create_classes
-    const_set(:CoreNLP, Rjb::import('edu.stanford.nlp.pipeline.StanfordCoreNLP'))
-    const_set(:Annotation, Rjb::import('edu.stanford.nlp.pipeline.Annotation'))
-    const_set(:Text, Annotation) # A more intuitive alias.
-    const_set(:Properties, Rjb::import('java.util.Properties'))
-    const_set(:AnnotationBridge, Rjb::import('AnnotationBridge'))
+  def self.load_default_classes
+    const_set(:CoreNLP,
+    Rjb::import('edu.stanford.nlp.pipeline.StanfordCoreNLP')
+    )
+    self.load_klass 'Annotation'
+    self.load_klass 'Word', 'edu.stanford.nlp.ling'
+    self.load_klass 'MaxentTagger', 'edu.stanford.nlp.tagger.maxent'
+    self.load_klass 'CRFClassifier', 'edu.stanford.nlp.ie.crf'
+    self.load_klass 'Properties', 'java.util'
+    self.load_klass 'ArrayList', 'java.util'
+    self.load_klass 'AnnotationBridge', ''
+    const_set(:Text, Annotation)
   end
   # Load a class (e.g. PTBTokenizerAnnotator) in a specific
@@ -198,12 +221,10 @@ module StanfordCoreNLP
   #  - DeterministicCorefAnnotator - implements anaphora resolution using a deterministic model (newer model, use this!).
   #  - NFLAnnotator - implements entity and relation mention extraction for the NFL domain.
   def self.load_class(klass, base = 'edu.stanford.nlp.pipeline')
-    self.load_jars unless @@loaded
-    const_set(klass.intern, Rjb::import("#{base}.#{klass}"))
+    self.init unless @@initialized
+    self.load_klass(klass, base)
   end
-# Private helper functions.
-  private
   # Create a java.util.Properties object from a hash.
   def self.get_properties(properties)
     props = Properties.new
@@ -213,9 +234,28 @@ module StanfordCoreNLP
     props
   end
+  # Get a Java ArrayList binding to pass lists
+  # of tokens to the Stanford Core NLP process.
+  def self.get_list(tokens)
+    list = StanfordCoreNLP::ArrayList.new
+    tokens.each do |t|
+      list.add(StanfordCoreNLP::Word.new(t.to_s))
+    end
+    list
+  end
   # Under_case -> CamelCase.
   def self.camel_case(text)
-    text.to_s.gsub(/^[a-z]|_[a-z]/) { |a| a.upcase }.gsub('_', '')
+    text.to_s.gsub(/^[a-z]|_[a-z]/) do |a|
+      a.upcase
+    end.gsub('_', '')
+  end
+  private
+  def self.load_klass(klass, base = 'edu.stanford.nlp.pipeline')
+    base += '.' unless base == ''
+    const_set(klass.intern,
+    Rjb::import("#{base}#{klass}"))
   end
 end
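
Note on the `load` change above: in 0.2.0 a model file is only checked and passed to the pipeline when its property key contains the name of one of the requested annotators. The following is a minimal pure-Ruby sketch of that selection step, written without Rjb or a JVM. The helper names `select_models` and `build_properties`, the `MODEL_FILES` table, the `parse.model` and `ner.model.example` keys, the `example-*` file names and the folder prefixes are illustrative assumptions, not part of the gem. The sketch also skips the `File.readable?` check that the real method performs.

```ruby
# Hypothetical stand-in for StanfordCoreNLP.model_files.
# Only 'english-left3words-distsim.tagger' appears in the diff; the other
# entries and the folder prefixes are placeholders for illustration.
MODEL_FILES = {
  'pos.model'         => 'taggers/english-left3words-distsim.tagger',
  'parse.model'       => 'grammar/example-parser-model',
  'ner.model.example' => 'classifiers/example-ner-model'
}

# Keep only the model entries whose key contains a requested annotator
# name. This mirrors the substring test `k.index(annotator.to_s)`.
def select_models(model_files, annotators)
  model_files.select do |key, _file|
    annotators.any? { |annotator| key.include?(annotator.to_s) }
  end
end

# Build the plain hash that would later be turned into java.util.Properties:
# selected model paths prefixed with the model path, plus the annotator list.
def build_properties(model_path, model_files, annotators)
  properties = {}
  select_models(model_files, annotators).each do |key, file|
    properties[key] = model_path + file
  end
  properties['annotators'] = annotators.map { |a| a.to_s }.join(', ')
  properties
end

props = build_properties('/path_to_models/', MODEL_FILES,
                         [:tokenize, :ssplit, :pos])
puts props.inspect
```

Because the test is a substring match rather than a comparison on the key prefix, a short annotator name also selects any key that merely contains it somewhere. Whether that matters depends on the keys actually defined in the gem's configuration.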

data/lib/stanford-core-nlp/java_wrapper.rb CHANGED

@@ -22,14 +22,14 @@ module StanfordCoreNLP
     # Get an annotation using the annotation bridge.
     def get(annotation, anno_base = nil)
       if !java_methods.include?('get(Ljava.lang.Class;)')
-        raise'No annotation can be retrieved on this object.'
+        raise 'No annotation can be retrieved on this object.'
       else
         anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
         if anno_base
-          raise "The path #{anno_base} doesn't exist." unless Annotations[anno_base]
+          raise "The path #{anno_base} doesn't exist." unless StanfordCoreNLP::Config::Annotations[anno_base]
           anno_bases = [anno_base]
         else
-          anno_bases = Config::AnnotationsByName[anno_class]
+          anno_bases = StanfordCoreNLP::Config::AnnotationsByName[anno_class]
           raise "The annotation #{anno_class} doesn't exist." unless anno_bases
         end
         if anno_bases.size > 1
@@ -41,9 +41,10 @@ module StanfordCoreNLP
           base_class = anno_bases[0]
         end
         url = "edu.stanford.#{base_class}$#{anno_class}"
-        AnnotationBridge.getAnnotation(self, url)
+        StanfordCoreNLP::AnnotationBridge.getAnnotation(self, url)
       end
     end
   end
 end
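
Note on `get` above: the method turns a Ruby symbol such as `:basic_dependencies` into the name of a nested Java annotation class before handing it to the bridge. A minimal sketch of that name construction follows, runnable without Rjb. `camel_case` is copied from the diff of `stanford-core-nlp.rb`. The helper `annotation_url` and the base class string `nlp.ling.CoreAnnotations` are assumptions made for illustration; in the gem the base comes from `Config::AnnotationsByName`, whose contents are not shown in this diff.

```ruby
# Under_case -> CamelCase, as defined in lib/stanford-core-nlp.rb.
def camel_case(text)
  text.to_s.gsub(/^[a-z]|_[a-z]/) do |a|
    a.upcase
  end.gsub('_', '')
end

# Hypothetical helper: build the string passed to the annotation bridge.
# The "$" separates an outer Java class from its nested class.
def annotation_url(annotation, base_class)
  anno_class = "#{camel_case(annotation)}Annotation"
  "edu.stanford.#{base_class}$#{anno_class}"
end

# Assumed base class, for illustration only.
puts annotation_url(:tokens, 'nlp.ling.CoreAnnotations')
```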

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: stanford-core-nlp
 version: !ruby/object:Gem::Version
-  version: 0.1.7
+  version: 0.2.0
   prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-02-22 00:00:00.000000000 Z
+date: 2012-03-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rjb
-  requirement: &70107443631860 !ruby/object:Gem::Requirement
+  requirement: &70252071542960 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,7 +21,7 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *70107443631860
+  version_requirements: *70252071542960
 description: ! " High-level Ruby bindings to the Stanford CoreNLP package, a set natural
   language processing \ntools that provides tokenization, part-of-speech tagging and
   parsing for several languages, as well as named entity \nrecognition and coreference