stanfordparser 1.1.0 → 1.2.0

Files changed (4)

data/README CHANGED

@@ -4,17 +4,21 @@ This module is a wrapper for the {Stanford Natural Language Parser}[http://nlp.s
 The Stanford Natural Language Parser is a Java implementation of a probabilistic PCFG and dependency parser for English, German, Chinese, and Arabic.  This module provides a thin wrapper around the Java code to make it accessible from Ruby.
-= Installation
+= Installation and Configuration
 To run this module you must install the following additional software
 * The {Stanford Natural Language Parser}[http://nlp.stanford.edu/downloads/lex-parser.shtml]
 * The {Ruby Java Bridge}[http://rjb.rubyforge.org/] gem.
-This module expects the parser to be installed in the <tt>/usr/local/stanford-parser/current</tt> directory.  This is the directory that contains the <tt>stanford-parser.jar</tt> file.  An alternate directory can be specified with a <tt>/etc/ruby-stanford-parser.yaml</tt> configuration file.  This file is in the Ruby YAML[http://www.ruby-doc.org/core/classes/YAML.html] format, and contains a single <tt>root</tt> value, for example:
+Note that the Stanford Parser is not a Ruby application, so it is not distributed as a Ruby gem and must be installed manually.
-	root: /usr/local/stanford-parser/other/location
+This module expects the parser to be installed in the <tt>/usr/local/stanford-parser/current</tt> directory.  This is the directory that contains the <tt>stanford-parser.jar</tt> file.  When the module is loaded, it adds this directory to the Java classpath and launches the Java VM with the arguments <tt>-server -Xmx150m</tt>.
+These defaults can be overridden by creating a configuration file in <tt>/etc/ruby-stanford-parser.yaml</tt>.  This file is in the Ruby YAML[http://www.ruby-doc.org/core/classes/YAML.html] format, and may contain two values: <tt>root</tt> and <tt>jvmargs</tt>. For example, the file might look like the following:
+	root: /usr/local/stanford-parser/other/location
+	jvmargs: -Xmx100m -verbose
 =Usage
@@ -36,9 +40,9 @@ Use the StanfordParser::LexicalizedParser class to parse sentences.
 Use the StanfordParser::DocumentPreprocessor class to tokenize text and files into words or sentences.
 	irb(main):004:0> preproc = StanfordParser::DocumentPreprocessor.new
-	irb(main):008:0> puts preproc.getSentencesFromString("This is a sentence.  So is this.")
-	This is a sentence .
-	So is this .
+	irb(main):008:0> puts preproc.getSentencesFromString("This is a sentence.  So is this.")
+	This is a sentence .
+	So is this .
 For complete details about the use of these classes, see the documentation on the Stanford Natural Language Parser website.
@@ -46,7 +50,8 @@ For complete details about the use of these classes, see the documentation on th
 = History
 1.0.0:: Initial release
-1.1.0:: Make module initialization function private
+1.1.0:: Make module initialization function private.  Add example code.
+1.2.0:: Read Java VM arguments from the configuration file.  Add Word class.
 = Copyright
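
The configuration rules described in the README changes above can be sketched in plain Ruby. This is a minimal illustration, not the gem's code: <tt>resolve_settings</tt> is a hypothetical helper that takes the YAML text as a string instead of reading a file under <tt>/etc</tt>, and it leaves out the <tt>Rjb::load</tt> call so that it runs without a JVM.

```ruby
require "yaml"
require "pathname"

# Hypothetical helper (not part of the gem): resolve the parser root and the
# JVM arguments from the text of a YAML configuration file.  Keys that are
# missing or empty fall back to the defaults stated in the README.
def resolve_settings(yaml_text)
  default_root = "/usr/local/stanford-parser/current"
  default_jvmargs = ["-server", "-Xmx150m"]

  # An empty document does not load as a Hash, so substitute an empty one.
  configuration = YAML.load(yaml_text) || {}

  root = configuration["root"] || default_root
  jvmargs = configuration["jvmargs"] ? configuration["jvmargs"].split : default_jvmargs
  [Pathname.new(root), jvmargs]
end

root, jvmargs = resolve_settings(<<~CONFIG)
  root: /usr/local/stanford-parser/other/location
  jvmargs: -Xmx100m -verbose
CONFIG

puts root
puts jvmargs.inspect
```

Under these assumptions the <tt>jvmargs</tt> string is split on whitespace into separate arguments, and a key that is present but empty is treated the same as a missing key.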

data/lib/stanfordparser.rb CHANGED

@@ -104,8 +104,9 @@ module Rjb
       end # if
     end # wrap_java_object
-    # By default, all RJB classes other than <tt>java.util.ArrayList</tt> go
-    # in a generic wrapper.  Derived classes may change this behavior.
+    # By default, all RJB classes other than <tt>java.util.ArrayList</tt> and
+    # <tt>java.util.HashSet</tt> go in a generic wrapper.  Derived classes may
+    # change this behavior.
     def wrap_rjb_object(object)
       JavaObjectWrapper.new(object)
     end
@@ -131,24 +132,36 @@ end # Rjb
 # Parser}[http://nlp.stanford.edu/downloads/lex-parser.shtml].
 module StanfordParser
-  VERSION = "1.1.0"
-  # This function is executed once when the module is loaded.  It adds the
-  # Stanford parser jarfile to the JVM classpath and return the root of the
-  # parser installation.  The root of the installation may be written in a
-  # YAML file in <tt>/etc/ruby_stanford_parser.yaml</tt>.  If this file is not
-  # present, the default root <tt>/usr/local/stanford-parser/current</tt> is
-  # used.
+  VERSION = "1.2.0"
+  # Path to an English PCFG model that comes with the Stanford Parser.  The
+  # location is relative to the parser root directory.  This is a valid value
+  # for the <em>grammar</em> parameter of the LexicalizedParser constructor.
+  ENGLISH_PCFG_MODEL = "$(ROOT)/englishPCFG.ser.gz"
+  # This function is executed once when the module is loaded.  It initializes
+  # the Java virtual machine in which the Stanford parser will run.  By
+  # default, it adds the parser installation root
+  # <tt>/usr/local/stanford-parser/current</tt> to the Java classpath and
+  # launches the VM with the arguments <tt>-server -Xmx150m</tt>.  Different
+  # values may be specified with the <tt>/etc/ruby-stanford-parser.yaml</tt>
+  # configuration file.
+  #
+  # This function returns the path of the parser installation root.
   def StanfordParser.initialize_on_load
     root = Pathname.new("/usr/local/stanford-parser/current")
+    jvmargs = ["-server", "-Xmx150m"]
     config = Pathname.new("/etc/ruby-stanford-parser.yaml")
     if config.file?
       configuration = open(config) {|f| YAML.load(f)}
-      if configuration.key?("root")
+      if configuration.key?("root") and not configuration["root"].nil?
         root = Pathname.new(configuration["root"])
       end
+      if configuration.key?("jvmargs") and not configuration["jvmargs"].nil?
+        jvmargs = configuration["jvmargs"].split
+      end
     end
-    Rjb::load(classpath = (root + "stanford-parser.jar").to_s)
+    Rjb::load(classpath = (root + "stanford-parser.jar").to_s, jvmargs)
     root
   end
@@ -189,13 +202,13 @@ module StanfordParser
     # Create the parser given a grammar and options.  The <em>grammar</em>
     # argument is a path to a grammar file.  This path may contain the string
     # <tt>$(ROOT)</tt>, which will be replaced with the root directory of the
-    # Stanford Parser. By default, an English grammar is loaded.
+    # Stanford Parser. By default, an English PCFG grammar is loaded.
     #
     # The <em>options</em> argument is a list of string arguments as they
     # would appear on a command line.  See the documentation of
     # <tt>edu.stanford.nlp.parser.lexparser.Options.setOptions</tt> for more
     # details.
-    def initialize(grammar = "$(ROOT)/englishPCFG.ser.gz", options = [])
+    def initialize(grammar = ENGLISH_PCFG_MODEL, options = [])
       @grammar = Pathname.new(grammar.gsub(/\$\(ROOT\)/, ROOT))
       super("edu.stanford.nlp.parser.lexparser.LexicalizedParser", @grammar.to_s)
       @java_object.setOptionFlags(options)
@@ -207,10 +220,9 @@ module StanfordParser
   end # LexicalizedParser
-  # A parse tree that supports preorder enumeration via the Enumerable mixin.
-  #
-  # This is a wrapper for the
-  # <tt>edu.stanford.nlp.trees.Tree</tt> objects.
+  # This is a wrapper for
+  # <tt>edu.stanford.nlp.trees.Tree</tt> objects.  It customizes
+  # stringification.
   class Tree < JavaObjectWrapper
     def initialize(obj = "edu.stanford.nlp.trees.Tree")
       super(obj)
@@ -230,6 +242,26 @@ module StanfordParser
   end # Tree
+  # This is a wrapper for
+  # <tt>edu.stanford.nlp.ling.Word</tt> objects.  It customizes
+  # stringification and adds an equivalence operator.
+  class Word < JavaObjectWrapper
+    def initialize(obj = "edu.stanford.nlp.ling.Word", *args)
+      super(obj, *args)
+    end
+    # See the word values.
+    def inspect
+      to_s
+    end
+    # Equivalence is defined relative to the word value.
+    def ==(other)
+      word == other
+    end
+  end # Word
   # Tokenizes documents into words and sentences.
   #
   # This is a wrapper for the
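
The <tt>$(ROOT)</tt> placeholder handling in the LexicalizedParser constructor can be tried in isolation. The sketch below assumes a hypothetical helper, <tt>expand_grammar_path</tt>, that is not part of the gem; it reuses the same regular expression as the constructor and calls <tt>to_s</tt> on the root so that either a String or a Pathname can be passed.

```ruby
require "pathname"

# Hypothetical helper (not part of the gem): replace the $(ROOT) placeholder
# in a grammar path with the parser installation root.
def expand_grammar_path(grammar, root)
  Pathname.new(grammar.gsub(/\$\(ROOT\)/, root.to_s))
end

# Same value as StanfordParser::ENGLISH_PCFG_MODEL in the diff above.
model = "$(ROOT)/englishPCFG.ser.gz"

puts expand_grammar_path(model, Pathname.new("/usr/local/stanford-parser/current"))
```

A grammar path that does not contain the placeholder passes through unchanged, so an absolute path to a grammar file outside the parser root should also work with this scheme.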

data/test/test_stanfordparser.rb CHANGED

@@ -101,3 +101,14 @@ class DocumentPreprocessorTestCase < Test::Unit::TestCase
     assert_equal @preproc.map, []
   end
 end # DocumentPreprocessorTestCase
+class MiscPreprocessorTestCase < Test::Unit::TestCase
+  def test_model_location
+    assert_equal "$(ROOT)/englishPCFG.ser.gz", StanfordParser::ENGLISH_PCFG_MODEL
+  end
+  def test_word
+    assert StanfordParser::Word.new("edu.stanford.nlp.ling.Word", "dog") ==  "dog"
+  end
+end # MiscPreprocessorTestCase
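
The equivalence rule exercised by <tt>test_word</tt> can be explored without a JVM. The class below is a plain-Ruby stand-in, assumed purely for illustration: <tt>WordSketch</tt> is a hypothetical name, and it stores the word value directly instead of wrapping a Java object through RJB. Only the body of <tt>==</tt> is taken from the diff.

```ruby
# Plain-Ruby stand-in for StanfordParser::Word (an assumption for
# illustration; the real class wraps a Java object through RJB).
class WordSketch
  attr_reader :word

  def initialize(word)
    @word = word
  end

  # Same rule as in the diff: compare the word value with the other operand.
  def ==(other)
    word == other
  end
end

dog = WordSketch.new("dog")

puts dog == "dog"                  # the wrapper is the receiver, so the override runs
puts "dog" == dog                  # a String is the receiver, so String#== runs
puts dog == WordSketch.new("dog")  # compares a String value with a wrapper object
```

In this stand-in only the first comparison is true. The operator is not symmetric, and two wrappers holding the same word do not compare as equal, so comparisons are safest written with the wrapper on the left and a plain string on the right, as <tt>test_word</tt> does.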

metadata CHANGED

@@ -3,8 +3,8 @@ rubygems_version: 0.9.2
 specification_version: 1
 name: stanfordparser
 version: !ruby/object:Gem::Version
-  version: 1.1.0
-date: 2007-11-05 00:00:00 -08:00
+  version: 1.2.0
+date: 2007-12-18 00:00:00 -08:00
 summary: Ruby wrapper for the Stanford Natural Language Parser
 require_paths:
 - lib