stanfordparser 2.0.0 → 2.1.0
- data/README +3 -3
- data/lib/stanfordparser.rb +19 -10
- metadata +2 -2
data/README
CHANGED
@@ -9,9 +9,9 @@ The Stanford Natural Language Parser is a Java implementation of a probabilistic
 
 In addition to the Ruby gems it requires, to run this module you must manually install the {Stanford Natural Language Parser}[http://nlp.stanford.edu/downloads/lex-parser.shtml].
 
-This module expects the parser to be installed in the <tt>/usr/local/stanford-parser/current</tt> directory. This is the directory that contains the <tt>stanford-parser.jar</tt> file. When the module is loaded, it adds this directory to the Java classpath and launches the Java VM with the arguments <tt>-server -Xmx150m</tt>.
+This module expects the parser to be installed in the <tt>/usr/local/stanford-parser/current</tt> directory on UNIX platforms and in the <tt>C:\stanford-parser\current</tt> directory on Windows platforms. This is the directory that contains the <tt>stanford-parser.jar</tt> file. When the module is loaded, it adds this directory to the Java classpath and launches the Java VM with the arguments <tt>-server -Xmx150m</tt>.
 
-These defaults can be overridden by creating
+These defaults can be overridden by creating the configuration file <tt>/etc/ruby_stanford_parser.yaml</tt> on UNIX platforms and <tt>C:\stanford-parser\ruby-stanford-parser.yaml</tt> on Windows platforms. This file is in the Ruby YAML[http://ruby-doc.org/stdlib/libdoc/yaml/rdoc/index.html] format, and may contain two values: <tt>root</tt> and <tt>jvmargs</tt>. For example, the file might look like the following:
 
 root: /usr/local/stanford-parser/other/location
 jvmargs: -Xmx100m -verbose
@@ -108,7 +108,7 @@ Unlike their parents StanfordParser::DocumentPreprocessor and StanfordParser::Le
 1.1.0:: Make module initialization function private. Add example code.
 1.2.0:: Read Java VM arguments from the configuration file. Add Word class.
 2.0.0:: Add support for standoff parsing. Change the way Rjb::JavaObjectWrapper wraps returned values: see wrap_java_object for details. Rjb::JavaObjectWrapper supports static members. Minor changes to stanford-sentence-parser script.
-
+2.1.0:: Different default paths for Windows machines; Minor changes to StandoffToken definition
 
 = Copyright
 
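The README text in this diff describes platform-dependent defaults that a YAML file can override. A minimal sketch of that behavior follows, assuming the override logic works roughly as described. The helper names `default_paths` and `apply_overrides` are hypothetical and are not part of the gem, and splitting `jvmargs` on whitespace is an assumption about how the string is turned into VM arguments.

```ruby
require "pathname"
require "yaml"

# Hypothetical helper (not in the gem): pick the default parser root and
# configuration file for a platform string, using the same test as the diff.
def default_paths(platform = RUBY_PLATFORM)
  if platform =~ /(win|w)32$/
    [Pathname.new("C:\\stanford-parser\\current"),
     Pathname.new("C:\\stanford-parser\\ruby-stanford-parser.yaml")]
  else
    [Pathname.new("/usr/local/stanford-parser/current"),
     Pathname.new("/etc/ruby-stanford-parser.yaml")]
  end
end

# Hypothetical helper: apply the optional "root" and "jvmargs" keys from
# YAML text on top of the defaults. Missing or nil keys leave defaults alone.
def apply_overrides(root, jvmargs, yaml_text)
  configuration = YAML.load(yaml_text) || {}
  root = Pathname.new(configuration["root"]) unless configuration["root"].nil?
  # Assumption: the jvmargs string is split on whitespace.
  jvmargs = configuration["jvmargs"].split unless configuration["jvmargs"].nil?
  [root, jvmargs]
end

root, _config = default_paths("i386-mswin32")
root, jvmargs = apply_overrides(root, ["-server", "-Xmx150m"], <<~YAML)
  root: /usr/local/stanford-parser/other/location
  jvmargs: -Xmx100m -verbose
YAML

puts root
puts jvmargs.inspect
```

Passing the platform string as an argument keeps the sketch testable on any machine; the module itself reads `RUBY_PLATFORM` directly.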
data/lib/stanfordparser.rb
CHANGED
@@ -34,7 +34,7 @@ require "java_object.rb"
 # Parser}[http://nlp.stanford.edu/downloads/lex-parser.shtml].
 module StanfordParser
 
-  VERSION = "2.
+  VERSION = "2.1.0"
 
   # The default sentence segmenter and tokenizer. This is an English-language
   # tokenizer with support for Penn Treebank markup.
@@ -47,17 +47,28 @@ module StanfordParser
 
   # This function is executed once when the module is loaded. It initializes
   # the Java virtual machine in which the Stanford parser will run. By
-  # default, it adds the parser installation root
-  # <tt>/usr/local/stanford-parser/current</tt> to the Java classpath and
+  # default, it adds the parser installation root to the Java classpath and
   # launches the VM with the arguments <tt>-server -Xmx150m</tt>. Different
-  # values may be specified with the <tt
+  # values may be specified with the <tt>ruby-stanford-parser.yaml</tt>
   # configuration file.
   #
+  # This function determines which operating system we are running on and sets
+  # default pathnames accordingly:
+  #
+  # UNIX:: /usr/local/stanford-parser/current, /etc/ruby-stanford-parser.yaml
+  # Windows:: C:\stanford-parser\current,
+  # C:\stanford-parser\ruby-stanford-parser.yaml
+  #
   # This function returns the path of the parser installation root.
   def StanfordParser.initialize_on_load
-
+    if RUBY_PLATFORM =~ /(win|w)32$/
+      root = Pathname.new("C:\\stanford-parser\\current")
+      config = Pathname.new("C:\\stanford-parser\\ruby-stanford-parser.yaml")
+    else
+      root = Pathname.new("/usr/local/stanford-parser/current")
+      config = Pathname.new("/etc/ruby-stanford-parser.yaml")
+    end
     jvmargs = ["-server", "-Xmx150m"]
-    config = Pathname.new("/etc/ruby-stanford-parser.yaml")
     if config.file?
       configuration = open(config) {|f| YAML.load(f)}
       if configuration.key?("root") and not configuration["root"].nil?
@@ -243,14 +254,12 @@ module StanfordParser
     end
   end # DocumentPreprocessor
 
-  StandoffToken = Struct.new(:current, :word, :before, :after,
-                             :begin_position, :end_position)
-
   # A text token that contains raw and normalized token identity (.e.g "(" and
   # "-LRB-"), an offset span, and the characters immediately preceding and
   # following the token. Given a list of these objects it is possible to
   # recreate the text from which they came verbatim.
-  class StandoffToken
+  class StandoffToken < Struct.new(:current, :word, :before, :after,
+                                   :begin_position, :end_position)
     def to_s
       "#{current} [#{begin_position},#{end_position}]"
     end
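The last hunk replaces a `Struct.new` constant that was later reopened with a single class that subclasses an anonymous `Struct`. The sketch below reproduces the 2.1.0 shape in isolation, outside the `StanfordParser` module, to show how such a class behaves. The field values are illustrative only; which of `current` and `word` holds the raw form is an assumption here.

```ruby
# Standalone copy of the 2.1.0 definition shape: the class inherits its
# accessors and constructor from an anonymous Struct.
class StandoffToken < Struct.new(:current, :word, :before, :after,
                                 :begin_position, :end_position)
  def to_s
    "#{current} [#{begin_position},#{end_position}]"
  end
end

# Illustrative values, not taken from the parser's real output.
token = StandoffToken.new("(", "-LRB-", " ", "", 10, 11)

puts token.to_s                 # uses the custom to_s defined above
puts token.word                 # accessor generated by Struct
puts StandoffToken.ancestors.include?(Struct)
```

One practical effect of this form is that the comment block, the field list, and the methods sit in one definition, and methods defined in the class body can call `super` to reach the Struct-generated versions.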
metadata
CHANGED
@@ -3,8 +3,8 @@ rubygems_version: 0.9.2
 specification_version: 1
 name: stanfordparser
 version: !ruby/object:Gem::Version
-  version: 2.
-date: 2008-06-
+  version: 2.1.0
+date: 2008-06-23 00:00:00 -07:00
 summary: Ruby wrapper for the Stanford Natural Language Parser
 require_paths:
 - lib