RubyGems: stanfordparser 1.1.0 → 1.2.0

Files changed (4)

data/README CHANGED

@@ -4,17 +4,21 @@ This module is a wrapper for the {Stanford Natural Language Parser}[http://nlp.s
 The Stanford Natural Language Parser is a Java implementation of a probabilistic PCFG and dependency parser for English, German, Chinese, and Arabic.  This module provides a thin wrapper around the Java code to make it accessible from Ruby.
-= Installation
+= Installation and Configuration
 To run this module you must install the following additional software
 * The {Stanford Natural Language Parser}[http://nlp.stanford.edu/downloads/lex-parser.shtml]
 * The {Ruby Java Bridge}[http://rjb.rubyforge.org/] gem.
-This module expects the parser to be installed in the <tt>/usr/local/stanford-parser/current</tt> directory.  This is the directory that contains the <tt>stanford-parser.jar</tt> file.  An alternate directory can be specified with a <tt>/etc/ruby-stanford-parser.yaml</tt> configuration file.  This file is in the Ruby YAML[http://www.ruby-doc.org/core/classes/YAML.html] format, and contains a single <tt>root</tt> value, for example:
+Note that the Stanford Parser is not a Ruby application and is therefore not a Ruby gem and must be manually installed.
-	root: /usr/local/stanford-parser/other/location
+This module expects the parser to be installed in the <tt>/usr/local/stanford-parser/current</tt> directory.  This is the directory that contains the <tt>stanford-parser.jar</tt> file.  When the module is loaded, it adds this directory to the Java classpath and launches the Java VM with the arguments <tt>-server -Xmx150m</tt>.
+These defaults can be overridden by creating a configuration file in <tt>/etc/ruby_stanford_parser.yaml</tt>.  This file is in the Ruby YAML[http://www.ruby-doc.org/core/classes/YAML.html] format, and may contain two values: <tt>root</tt> and <tt>jvmargs</tt>. For example, the file might look like the following:
+	root: /usr/local/stanford-parser/other/location
+	jvmargs: -Xmx100m -verbose
 =Usage
@@ -36,9 +40,9 @@ Use the StanfordParser::LexicalizedParser class to parse sentences.
 Use the StanfordParser::DocumentPreprocessor class to tokenize text and files into words or sentences.
 	irb(main):004:0> preproc = StanfordParser::DocumentPreprocessor.new
-	irb(main):008:0> puts preproc.getSentencesFromString("This is a sentence.  So is this.")
-	This is a sentence .
-	So is this .
+	irb(main):008:0> puts preproc.getSentencesFromString("This is a sentence.  So is this.")
+	This is a sentence .
+	So is this .
 For complete details about the use of these classes, see the documentation on the Stanford Natural Language Parser website.
@@ -46,7 +50,8 @@ For complete details about the use of these classes, see the documentation on th
 = History
 1.0.0:: Initial release
-1.1.0:: Make module initialization function private
+1.1.0:: Make module initialization function private.  Add example code.
+1.2.0:: Read Java VM arguments from the configuration file.  Add Word class.
 = Copyright
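
The configuration behaviour described in the README hunk above can be sketched in isolation. This is a minimal illustration, not the gem's code: `resolve_settings` is a hypothetical helper, it reads YAML from a string instead of from the file in <tt>/etc</tt>, and it never starts a Java VM. It assumes only what the diff shows: the defaults, the two keys <tt>root</tt> and <tt>jvmargs</tt>, and the whitespace split of <tt>jvmargs</tt>.

```ruby
require "yaml"

# Hypothetical helper (not part of stanfordparser): overlay the values
# found in a YAML document on the defaults named in the README.
def resolve_settings(yaml_text)
  root = "/usr/local/stanford-parser/current"
  jvmargs = ["-server", "-Xmx150m"]
  configuration = YAML.load(yaml_text)
  if configuration.is_a?(Hash)
    # A key that is present but empty (for example "root:") loads as nil,
    # so nil values fall back to the defaults.
    root = configuration["root"] unless configuration["root"].nil?
    # jvmargs is one string in the file; split it into separate arguments.
    jvmargs = configuration["jvmargs"].split unless configuration["jvmargs"].nil?
  end
  { :root => root, :jvmargs => jvmargs }
end

example = "root: /usr/local/stanford-parser/other/location\n" \
          "jvmargs: -Xmx100m -verbose\n"
settings = resolve_settings(example)
puts settings[:root]
puts settings[:jvmargs].inspect
```

Note that a <tt>jvmargs</tt> entry replaces the default argument list rather than extending it, so <tt>-server</tt> has to be repeated in the file if it is still wanted.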

data/lib/stanfordparser.rb CHANGED

@@ -104,8 +104,9 @@ module Rjb
       end # if
     end # wrap_java_object
-    # By default, all RJB classes other than <tt>java.util.ArrayList</tt> go
-    # in a generic wrapper.  Derived classes may change this behavior.
+    # By default, all RJB classes other than <tt>java.util.ArrayList</tt> and
+    # <tt>java.util.HashSet</tt> go in a generic wrapper.  Derived classes may
+    # change this behavior.
     def wrap_rjb_object(object)
       JavaObjectWrapper.new(object)
     end
@@ -131,24 +132,36 @@ end # Rjb
 # Parser}[http://nlp.stanford.edu/downloads/lex-parser.shtml].
 module StanfordParser
-  VERSION = "1.1.0"
-  # This function is executed once when the module is loaded.  It adds the
-  # Stanford parser jarfile to the JVM classpath and return the root of the
-  # parser installation.  The root of the installation may be written in a
-  # YAML file in <tt>/etc/ruby_stanford_parser.yaml</tt>.  If this file is not
-  # present, the default root <tt>/usr/local/stanford-parser/current</tt> is
-  # used.
+  VERSION = "1.2.0"
+  # Path to an English PCFG model that comes with the Stanford Parser.  The
+  # location is relative to the parser root directory.  This is a valid value
+  # for the <em>grammar</em> parameter of the LexicalizedParser constructor.
+  ENGLISH_PCFG_MODEL = "$(ROOT)/englishPCFG.ser.gz"
+  # This function is executed once when the module is loaded.  It initializes
+  # the Java virtual machine in which the Stanford parser will run.  By
+  # default, it adds the parser installation root
+  # <tt>/usr/local/stanford-parser/current</tt> to the Java classpath and
+  # launches the VM with the arguments <tt>-server -Xmx150m</tt>.  Different
+  # values may be specified with the <tt>/etc/ruby-stanford-parser.yaml</tt>
+  # configuration file.
+  #
+  # This function returns the path of the parser installation root.
   def StanfordParser.initialize_on_load
     root = Pathname.new("/usr/local/stanford-parser/current")
+    jvmargs = ["-server", "-Xmx150m"]
     config = Pathname.new("/etc/ruby-stanford-parser.yaml")
     if config.file?
       configuration = open(config) {|f| YAML.load(f)}
-      if configuration.key?("root")
+      if configuration.key?("root") and not configuration["root"].nil?
         root = Pathname.new(configuration["root"])
       end
+      if configuration.key?("jvmargs") and not configuration["jvmargs"].nil?
+        jvmargs = configuration["jvmargs"].split
+      end
     end
-    Rjb::load(classpath = (root + "stanford-parser.jar").to_s)
+    Rjb::load(classpath = (root + "stanford-parser.jar").to_s, jvmargs)
     root
   end
@@ -189,13 +202,13 @@ module StanfordParser
     # Create the parser given a grammar and options.  The <em>grammar</em>
     # argument is a path to a grammar file.  This path may contain the string
     # <tt>$(ROOT)</tt>, which will be replaced with the root directory of the
-    # Stanford Parser. By default, an English grammar is loaded.
+    # Stanford Parser. By default, an English PCFG grammar is loaded.
     #
     # The <em>options</em> argument is a list of string arguments as they
     # would appear on a command line.  See the documentaion of
     # <tt>edu.stanford.nlp.parser.lexparser.Options.setOptions</tt> for more
     # details.
-    def initialize(grammar = "$(ROOT)/englishPCFG.ser.gz", options = [])
+    def initialize(grammar = ENGLISH_PCFG_MODEL, options = [])
       @grammar = Pathname.new(grammar.gsub(/\$\(ROOT\)/, ROOT))
       super("edu.stanford.nlp.parser.lexparser.LexicalizedParser", @grammar.to_s)
       @java_object.setOptionFlags(options)
@@ -207,10 +220,9 @@ module StanfordParser
   end # LexicalizedParser
-  # A parse tree that supports preorder enumeration via the Enumerable mixin.
-  #
-  # This is a wrapper for the
-  # <tt>edu.stanford.nlp.trees.Tree</tt> objects.
+  # This is a wrapper for
+  # <tt>edu.stanford.nlp.trees.Tree</tt> objects.  It customizes
+  # stringification.
   class Tree < JavaObjectWrapper
     def initialize(obj = "edu.stanford.nlp.trees.Tree")
       super(obj)
@@ -230,6 +242,26 @@ module StanfordParser
   end # Tree
+  # This is a wrapper for
+  # <tt>edu.stanford.nlp.ling.Word</tt> objects.  It customizes
+  # stringification and adds an equivalence operator.
+  class Word < JavaObjectWrapper
+    def initialize(obj = "edu.stanford.nlp.ling.Word", *args)
+      super(obj, *args)
+    end
+    # See the word values.
+    def inspect
+      to_s
+    end
+    # Equivalence is defined relative to the word value.
+    def ==(other)
+      word == other
+    end
+  end # Word
   # Tokenizes documents into words and sentences.
   #
   # This is a wrapper for the
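
The <tt>$(ROOT)</tt> placeholder used by <tt>ENGLISH_PCFG_MODEL</tt> can be illustrated without the parser installed. The sketch below is an assumption-laden stand-in: `expand_grammar_path` is a hypothetical helper, and the root is passed in explicitly rather than taken from <tt>StanfordParser::ROOT</tt>. It also converts the root with <tt>to_s</tt> and uses the block form of <tt>gsub</tt>, which differs from the gem's own call but avoids relying on implicit Pathname-to-String conversion.

```ruby
require "pathname"

# Hypothetical helper (not part of stanfordparser): replace the literal
# "$(ROOT)" placeholder in a grammar path with the installation root.
def expand_grammar_path(grammar, root)
  # A String pattern is matched literally, so no regexp escaping is needed.
  Pathname.new(grammar.gsub("$(ROOT)") { root.to_s })
end

root = Pathname.new("/usr/local/stanford-parser/current")
puts expand_grammar_path("$(ROOT)/englishPCFG.ser.gz", root)
# A path without the placeholder is returned unchanged.
puts expand_grammar_path("/opt/models/custom.ser.gz", root)
```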

data/test/test_stanfordparser.rb CHANGED

@@ -101,3 +101,14 @@ class DocumentPreprocessorTestCase < Test::Unit::TestCase
     assert_equal @preproc.map, []
   end
 end # DocumentPreprocessorTestCase
+class MiscPreprocessorTestCase < Test::Unit::TestCase
+  def test_model_location
+    assert_equal "$(ROOT)/englishPCFG.ser.gz", StanfordParser::ENGLISH_PCFG_MODEL
+  end
+  def test_word
+    assert StanfordParser::Word.new("edu.stanford.nlp.ling.Word", "dog") ==  "dog"
+  end
+end # MiscPreprocessorTestCase
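
The equivalence operator exercised by <tt>test_word</tt> compares the word value with the right-hand operand. Its consequences can be explored with a pure-Ruby stand-in that needs neither RJB nor Java. `WordLike` is hypothetical and only mirrors the <tt>inspect</tt> and <tt>==</tt> definitions shown in the diff; the real <tt>StanfordParser::Word</tt> delegates to a Java object and may behave differently in cases not covered here.

```ruby
# Hypothetical stand-in for StanfordParser::Word, holding the word value
# in a plain Ruby String instead of a wrapped Java object.
class WordLike
  attr_reader :word

  def initialize(word)
    @word = word
  end

  def to_s
    word
  end

  # Show the word value, as in the diff.
  def inspect
    to_s
  end

  # Equivalence is defined relative to the word value, as in the diff.
  def ==(other)
    word == other
  end
end

dog = WordLike.new("dog")
puts dog == "dog"                 # prints "true"
puts dog == "cat"                 # prints "false"
puts "dog" == dog                 # prints "false": String#== is used here
puts dog == WordLike.new("dog")   # prints "false": the right side is not a String
```

In this stand-in the comparison is asymmetric and two instances with the same word do not compare equal, which is why the test places the wrapper on the left and a String on the right.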

metadata CHANGED

@@ -3,8 +3,8 @@ rubygems_version: 0.9.2
 specification_version: 1
 name: stanfordparser
 version: !ruby/object:Gem::Version
-  version: 1.1.0
-date: 2007-11-05 00:00:00 -08:00
+  version: 1.2.0
+date: 2007-12-18 00:00:00 -08:00
 summary: Ruby wrapper for the Stanford Natural Language Parser
 require_paths:
 - lib