stanford-core-nlp 0.3.5 → 0.4.0
- data/README.md +10 -11
- data/lib/stanford-core-nlp.rb +100 -56
- data/lib/stanford-core-nlp/config.rb +7 -73
- data/lib/stanford-core-nlp/jruby_bridge.rb +41 -0
- data/lib/stanford-core-nlp/rjb_bridge.rb +42 -0
- metadata +22 -20
- data/lib/stanford-core-nlp/bridge.rb +0 -40
data/README.md
CHANGED
@@ -1,18 +1,17 @@
|
|
1
1
|
**About**
|
2
2
|
|
3
|
-
This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set natural language processing tools for tokenization, part-of-speech tagging, lemmatization, and parsing of
|
3
|
+
This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set of natural language processing tools for tokenization, sentence segmentation, part-of-speech tagging, lemmatization, and parsing of English, French and German. The package also provides named entity recognition and coreference resolution for English. This gem is compatible with JRuby 1.6.4 and above, as well as Ruby 1.9.2 and 1.9.3 (through Rjb).
|
4
4
|
|
5
|
-
If you are looking for
|
5
|
+
This gem only provides a thin wrapper over the Stanford Core NLP API. If you are looking for a Ruby natural language processing framework, have a look at [Treat](https://github.com/louismullie/treat).
|
6
6
|
|
7
7
|
**Installing**
|
8
8
|
|
9
|
-
_Note:
|
9
|
+
_Note: If you are running on MRI, this gem will use the Ruby-Java Bridge (Rjb), which currently does not support Java 7. Therefore, if you have installed Java 7, you should set your JAVA_HOME to point to your old Java 6 install before installing Rjb; for example, `export "JAVA_HOME=/usr/lib/jvm/java-6-openjdk/"`._
|
10
10
|
|
11
11
|
First, install the gem: `gem install stanford-core-nlp`. Then, download the Stanford Core NLP JAR and model files. Three different packages are available:
|
12
12
|
|
13
|
-
* A [minimal package
|
14
|
-
* A [full package
|
15
|
-
* A [full package for all languages](http://louismullie.com/treat/stanford-core-nlp-all.zip), including tagger and parser models for English, French, German, Arabic and Chinese.
|
13
|
+
* A [minimal package](http://louismullie.com/treat/stanford-core-nlp-minimal.zip) with the default tagger and parser models for English, French and German.
|
14
|
+
* A [full package](http://louismullie.com/treat/stanford-core-nlp-all.zip), with all of the tagger and parser models for English, French and German, as well as named entity and coreference resolution models for English.
|
16
15
|
|
17
16
|
Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (e.g. [...]/gems/stanford-core-nlp-0.x/bin/).
|
18
17
|
|
@@ -38,7 +37,7 @@ StanfordCoreNLP.jvm_args = ['-option1', '-option2']
|
|
38
37
|
StanfordCoreNLP.log_file = 'log.txt'
|
39
38
|
|
40
39
|
# Use the model files for a different language than English.
|
41
|
-
StanfordCoreNLP.use(:french)
|
40
|
+
StanfordCoreNLP.use(:french) # or :german
|
42
41
|
|
43
42
|
# Change a specific model file.
|
44
43
|
StanfordCoreNLP.set_model('pos.model', 'english-left3words-distsim.tagger')
|
@@ -52,7 +51,7 @@ text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
|
|
52
51
|
'looked pleased, but Merkel was dismayed.'
|
53
52
|
|
54
53
|
pipeline = StanfordCoreNLP.load(:tokenize, :ssplit, :pos, :lemma, :parse, :ner, :dcoref)
|
55
|
-
text = StanfordCoreNLP::
|
54
|
+
text = StanfordCoreNLP::Annotation.new(text)
|
56
55
|
pipeline.annotate(text)
|
57
56
|
|
58
57
|
text.get(:sentences).each do |sentence|
|
@@ -71,13 +70,13 @@ text.get(:sentences).each do |sentence|
|
|
71
70
|
# Named entity tag
|
72
71
|
puts token.get(:named_entity_tag).to_s
|
73
72
|
# Coreference
|
74
|
-
|
73
|
+
puts token.get(:coref_cluster_id).to_s
|
75
74
|
# Also of interest: coref, coref_chain, coref_cluster, coref_dest, coref_graph.
|
76
75
|
end
|
77
76
|
end
|
78
77
|
```
|
79
78
|
|
80
|
-
> Important: You need to load the StanfordCoreNLP pipeline before using the StanfordCoreNLP::
|
79
|
+
> Important: You need to load the StanfordCoreNLP pipeline before using the StanfordCoreNLP::Annotation class.
|
81
80
|
|
82
81
|
Good references for annotation names are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CorefCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the 'config.rb' file inside the gem. The Ruby symbol (e.g. `:named_entity_tag`) corresponding to a Java annotation class follows a simple un-camel-casing convention, with the trailing 'Annotation' removed. For example, the annotation `NamedEntityTagAnnotation` translates to `:named_entity_tag`, `PartOfSpeechAnnotation` to `:part_of_speech`, etc.
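The symbol-to-class naming convention can be sketched in plain Ruby. This is only an illustration: `annotation_class_name` is a hypothetical helper, not part of the gem's API, although its regular expression mirrors the gem's own `camel_case` method.

```ruby
# Hypothetical helper, for illustration only: maps a Ruby symbol to the
# name of the Java annotation class it stands for.
def annotation_class_name(symbol)
  # Upcase the first letter and every letter following '_' or '.'.
  camel = symbol.to_s.gsub(/(?:^|_|\.)(.)/) { $1.upcase }
  "#{camel}Annotation"
end

puts annotation_class_name(:named_entity_tag) # NamedEntityTagAnnotation
puts annotation_class_name(:part_of_speech)   # PartOfSpeechAnnotation
```

The gem performs this conversion internally, so callers only ever pass the symbol form to `get`.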
|
83
82
|
|
@@ -124,7 +123,7 @@ Here is a full list of the default models for the Stanford Core NLP pipeline. Yo
|
|
124
123
|
* 'ner.model.3class' - 'all.3class.distsim.crf.ser.gz'
|
125
124
|
* 'ner.model.7class' - 'muc.7class.distsim.crf.ser.gz'
|
126
125
|
* 'ner.model.MISCclass' -- 'conll.4class.distsim.crf.ser.gz'
|
127
|
-
* '
|
126
|
+
* 'parse.model' - 'englishPCFG.ser.gz'
|
128
127
|
* 'dcoref.demonym' - 'demonyms.txt'
|
129
128
|
* 'dcoref.animate' - 'animate.unigrams.txt'
|
130
129
|
* 'dcoref.female' - 'female.unigrams.txt'
|
data/lib/stanford-core-nlp.rb
CHANGED
@@ -1,57 +1,56 @@
|
|
1
|
+
require 'stanford-core-nlp/config'
|
2
|
+
|
1
3
|
module StanfordCoreNLP
|
2
4
|
|
3
|
-
VERSION = '0.
|
5
|
+
VERSION = '0.4.0'
|
4
6
|
|
5
7
|
require 'bind-it'
|
6
8
|
extend BindIt::Binding
|
7
|
-
|
9
|
+
|
8
10
|
# ############################ #
|
9
11
|
# BindIt Configuration Options #
|
10
12
|
# ############################ #
|
11
|
-
|
12
|
-
# The default path for the JAR files
|
13
|
+
|
14
|
+
# The default path for the JAR files
|
13
15
|
# is the gem's bin folder.
|
14
|
-
self.jar_path = File.dirname(__FILE__).
|
15
|
-
|
16
|
-
|
16
|
+
self.jar_path = File.dirname(__FILE__).gsub(/\/lib\z/, '') + '/bins/'
|
17
|
+
|
18
|
+
# Default namespace is the Stanford pipeline namespace.
|
19
|
+
self.default_namespace = 'edu.stanford.nlp.pipeline'
|
20
|
+
|
17
21
|
# Load the JVM with a minimum heap size of 512MB,
|
18
22
|
# and a maximum heap size of 1024MB.
|
19
|
-
|
20
|
-
|
23
|
+
StanfordCoreNLP.jvm_args = ['-Xms512M', '-Xmx1024M']
|
24
|
+
|
21
25
|
# Turn logging off by default.
|
22
|
-
|
23
|
-
|
26
|
+
StanfordCoreNLP.log_file = nil
|
27
|
+
|
24
28
|
# Default JAR files to load.
|
25
|
-
|
26
|
-
'joda-time.jar',
|
27
|
-
'xom.jar',
|
29
|
+
StanfordCoreNLP.default_jars = [
|
30
|
+
'joda-time.jar',
|
31
|
+
'xom.jar',
|
28
32
|
'stanford-parser.jar',
|
29
|
-
'stanford-corenlp.jar',
|
33
|
+
'stanford-corenlp.jar',
|
34
|
+
'stanford-segmenter.jar',
|
30
35
|
'bridge.jar'
|
31
36
|
]
|
32
|
-
|
37
|
+
|
33
38
|
# Default classes to load.
|
34
|
-
|
39
|
+
StanfordCoreNLP.default_classes = [
|
35
40
|
['StanfordCoreNLP', 'edu.stanford.nlp.pipeline', 'CoreNLP'],
|
36
|
-
['Annotation', 'edu.stanford.nlp.pipeline'
|
41
|
+
['Annotation', 'edu.stanford.nlp.pipeline'],
|
37
42
|
['Word', 'edu.stanford.nlp.ling'],
|
43
|
+
['CoreLabel', 'edu.stanford.nlp.ling'],
|
38
44
|
['MaxentTagger', 'edu.stanford.nlp.tagger.maxent'],
|
39
45
|
['CRFClassifier', 'edu.stanford.nlp.ie.crf'],
|
40
46
|
['Properties', 'java.util'],
|
41
|
-
['ArrayList', 'java.util']
|
42
|
-
['AnnotationBridge', '']
|
47
|
+
['ArrayList', 'java.util']
|
43
48
|
]
|
44
|
-
|
45
|
-
# Default namespace is the Stanford pipeline namespace.
|
46
|
-
self.default_namespace = 'edu.stanford.nlp.pipeline'
|
47
|
-
|
49
|
+
|
48
50
|
# ########################### #
|
49
51
|
# Stanford Core NLP bindings #
|
50
52
|
# ########################### #
|
51
|
-
|
52
|
-
require 'stanford-core-nlp/config'
|
53
|
-
require 'stanford-core-nlp/bridge'
|
54
|
-
|
53
|
+
|
55
54
|
class << self
|
56
55
|
# The model file names for a given language.
|
57
56
|
attr_accessor :model_files
|
@@ -60,12 +59,28 @@ module StanfordCoreNLP
|
|
60
59
|
# Store the language currently being used.
|
61
60
|
attr_accessor :language
|
62
61
|
end
|
63
|
-
|
62
|
+
|
64
63
|
# The path to the main folder containing the folders
|
65
64
|
# with the individual models inside. By default, this
|
66
65
|
# is the same as the JAR path.
|
67
66
|
self.model_path = self.jar_path
|
68
67
|
|
68
|
+
# ########################### #
|
69
|
+
# Annotation bridge (Rjb/Jrb) #
|
70
|
+
# ########################### #
|
71
|
+
|
72
|
+
if RUBY_PLATFORM =~ /java/
|
73
|
+
require 'stanford-core-nlp/jruby_bridge'
|
74
|
+
extend StanfordCoreNLP::JrubyBridge
|
75
|
+
else
|
76
|
+
require 'stanford-core-nlp/rjb_bridge'
|
77
|
+
extend StanfordCoreNLP::RjbBridge
|
78
|
+
end
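The platform check in this hunk can be tried in isolation. The sketch below assumes a hypothetical helper, `bridge_for`, that returns only the name of the bridge module the check would select; it does not load either bridge.

```ruby
# Hypothetical helper: names the bridge module chosen for a given
# RUBY_PLATFORM string. JRuby reports a platform containing "java".
def bridge_for(platform)
  platform =~ /java/ ? :JrubyBridge : :RjbBridge
end

puts bridge_for('java')         # JrubyBridge
puts bridge_for('x86_64-linux') # RjbBridge
puts bridge_for(RUBY_PLATFORM)  # depends on the running interpreter
```

Keeping the decision in one place lets both bridges expose the same `inject_get_method` entry point to the rest of the module.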
|
79
|
+
|
80
|
+
# ########################### #
|
81
|
+
# Public configuration params #
|
82
|
+
# ########################### #
|
83
|
+
|
69
84
|
# Use models for a given language. Language can be
|
70
85
|
# supplied as full-length, or ISO-639 2 or 3 letter
|
71
86
|
# code (e.g. :english, :eng or :en will work).
|
@@ -83,40 +98,47 @@ module StanfordCoreNLP
|
|
83
98
|
n = n.to_s
|
84
99
|
n += '.model' if n == 'ner'
|
85
100
|
models.each do |m, file|
|
86
|
-
self.model_files["#{n}.#{m}"] =
|
87
|
-
folder + file
|
101
|
+
self.model_files["#{n}.#{m}"] = folder + file
|
88
102
|
end
|
89
103
|
elsif models.is_a?(String)
|
90
|
-
self.model_files["#{n}.model"] =
|
91
|
-
folder + models
|
104
|
+
self.model_files["#{n}.model"] = folder + models
|
92
105
|
end
|
93
106
|
end
|
94
107
|
end
|
95
108
|
|
96
109
|
# Use English by default.
|
97
110
|
self.use :english
|
98
|
-
|
99
|
-
# Set a model file.
|
111
|
+
|
112
|
+
# Set a model file.
|
100
113
|
def self.set_model(name, file)
|
101
114
|
n = name.split('.')[0].intern
|
102
|
-
self.model_files[name] =
|
103
|
-
Config::ModelFolders[n] + file
|
115
|
+
self.model_files[name] = Config::ModelFolders[n] + file
|
104
116
|
end
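How `set_model` derives the stored path can be sketched without loading the gem. `MODEL_FOLDERS` copies the folder names from `config.rb` in this release, and `model_path_for` is a hypothetical stand-in for the method body. Unlike the original, it uses `fetch`, so an unknown annotator prefix raises a `KeyError`.

```ruby
# Folder names as listed in the gem's config.rb for this release.
MODEL_FOLDERS = {
  pos: 'taggers/',
  parse: 'grammar/',
  ner: 'classifiers/',
  dcoref: 'dcoref/'
}

# Hypothetical stand-in for the body of StanfordCoreNLP.set_model:
# the part of the property name before the first dot picks the folder.
def model_path_for(name, file)
  annotator = name.split('.')[0].intern
  MODEL_FOLDERS.fetch(annotator) + file
end

puts model_path_for('pos.model', 'english-left3words-distsim.tagger')
puts model_path_for('ner.model.3class', 'all.3class.distsim.crf.ser.gz')
```

The resulting relative path is later prefixed with `model_path` when the pipeline is loaded.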
|
105
117
|
|
118
|
+
# ########################### #
|
119
|
+
# Public API methods #
|
120
|
+
# ########################### #
|
121
|
+
|
106
122
|
# Load a StanfordCoreNLP pipeline with the
|
107
123
|
# specified JVM flags and StanfordCoreNLP
|
108
124
|
# properties.
|
109
125
|
def self.load(*annotators)
|
110
|
-
|
126
|
+
|
111
127
|
# Take care of Windows users.
|
112
128
|
if self.running_on_windows?
|
113
129
|
self.jar_path.gsub!('/', '\\')
|
114
130
|
self.model_path.gsub!('/', '\\')
|
115
131
|
end
|
116
|
-
|
132
|
+
|
117
133
|
# Make the bindings.
|
118
134
|
self.bind
|
119
|
-
|
135
|
+
|
136
|
+
# Bind annotation bridge.
|
137
|
+
self.default_classes.each do |info|
|
138
|
+
klass = const_get(info.first)
|
139
|
+
self.inject_get_method(klass)
|
140
|
+
end
|
141
|
+
|
120
142
|
# Prepend the JAR path to the model files.
|
121
143
|
properties = {}
|
122
144
|
self.model_files.each do |k,v|
|
@@ -129,26 +151,42 @@ module StanfordCoreNLP
|
|
129
151
|
f = self.model_path + v
|
130
152
|
unless File.readable?(f)
|
131
153
|
raise "Model file #{f} could not be found. " +
|
132
|
-
"You may need to download this file manually "+
|
133
|
-
"
|
154
|
+
"You may need to download this file manually " +
|
155
|
+
"and/or set paths properly."
|
134
156
|
end
|
135
157
|
properties[k] = f
|
136
158
|
end
|
159
|
+
|
160
|
+
properties['annotators'] = annotators.map { |x| x.to_s }.join(', ')
|
137
161
|
|
138
|
-
|
139
|
-
|
140
|
-
|
141
|
-
|
142
|
-
|
143
|
-
|
162
|
+
unless self.language == :english
|
163
|
+
# Bug fix for French/German parsers.
|
164
|
+
# Otherwise throws "IllegalArgumentException:
|
165
|
+
# Unknown option: -retainTmpSubcategories"
|
166
|
+
properties['parse.flags'] = ''
|
167
|
+
# Bug fix for French/German parsers.
|
168
|
+
# Otherwise throws java.lang.NullPointerException: null.
|
169
|
+
properties['parse.buildgraphs'] = 'false'
|
170
|
+
end
|
171
|
+
|
172
|
+
# Hack for Rjb compatibility.
|
173
|
+
const_get(:CoreNLP).new(get_properties(properties))
|
174
|
+
|
175
|
+
end
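The property assembly performed by `load` can be approximated in plain Ruby. This is a simplified sketch: `build_properties` is a hypothetical helper, it skips the file-readability check, and it returns a Ruby hash rather than a `java.util.Properties` object.

```ruby
# Hypothetical, simplified version of the property assembly in `load`.
def build_properties(annotators, model_files, model_path, language)
  properties = {}
  # Prefix every model file with the model path.
  model_files.each { |key, file| properties[key] = model_path + file }
  # The pipeline expects a comma-separated annotator list.
  properties['annotators'] = annotators.map(&:to_s).join(', ')
  unless language == :english
    # Workarounds applied for the French and German parsers.
    properties['parse.flags'] = ''
    properties['parse.buildgraphs'] = 'false'
  end
  properties
end

props = build_properties(
  [:tokenize, :ssplit, :pos],
  { 'pos.model' => 'taggers/french.tagger' },
  '/opt/models/', # assumed path, for illustration only
  :french
)
puts props['annotators'] # tokenize, ssplit, pos
puts props['pos.model']  # /opt/models/taggers/french.tagger
```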
|
176
|
+
|
177
|
+
# Hack in order not to break backwards compatibility.
|
178
|
+
def self.const_missing(const)
|
179
|
+
if const == :Text
|
180
|
+
puts "WARNING: StanfordCoreNLP::Text has been deprecated. " +
|
181
|
+
"Please use StanfordCoreNLP::Annotation instead."
|
182
|
+
Annotation
|
183
|
+
else
|
184
|
+
super(const)
|
144
185
|
end
|
145
|
-
|
146
|
-
properties['annotators'] =
|
147
|
-
annotators.map { |x| x.to_s }.join(', ')
|
148
|
-
|
149
|
-
CoreNLP.new(get_properties(properties))
|
150
186
|
end
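The `const_missing` hook can be tried on a toy module. In this sketch `Pipeline` is a hypothetical stand-in for `StanfordCoreNLP`, and `warn` replaces the original `puts` so the notice goes to standard error.

```ruby
# Hypothetical module standing in for StanfordCoreNLP.
module Pipeline
  class Annotation; end

  # Resolve the retired constant name to its replacement, and defer
  # every other missing constant to the default behaviour (NameError).
  def self.const_missing(const)
    if const == :Text
      warn 'WARNING: Pipeline::Text has been deprecated. ' \
           'Please use Pipeline::Annotation instead.'
      Annotation
    else
      super
    end
  end
end

puts Pipeline::Text.equal?(Pipeline::Annotation) # true
```

Because the hook returns the class without defining a `Text` constant, the warning is repeated on each reference to the old name.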
|
151
187
|
|
188
|
+
private
|
189
|
+
|
152
190
|
# Create a java.util.Properties object from a hash.
|
153
191
|
def self.get_properties(properties)
|
154
192
|
props = Properties.new
|
@@ -157,13 +195,13 @@ module StanfordCoreNLP
|
|
157
195
|
end
|
158
196
|
props
|
159
197
|
end
|
160
|
-
|
198
|
+
|
161
199
|
# Get a Java ArrayList binding to pass lists
|
162
200
|
# of tokens to the Stanford Core NLP process.
|
163
201
|
def self.get_list(tokens)
|
164
202
|
list = StanfordCoreNLP::ArrayList.new
|
165
203
|
tokens.each do |t|
|
166
|
-
list.add(
|
204
|
+
list.add(Word.new(t.to_s))
|
167
205
|
end
|
168
206
|
list
|
169
207
|
end
|
@@ -173,4 +211,10 @@ module StanfordCoreNLP
|
|
173
211
|
RUBY_PLATFORM.split("-")[1] == 'mswin32'
|
174
212
|
end
|
175
213
|
|
176
|
-
|
214
|
+
# camel_case which also supports dot as separator
|
215
|
+
def self.camel_case(s)
|
216
|
+
s = s.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }
|
217
|
+
s.gsub(/(?:^|_|\.)(.)/) { $1.upcase }
|
218
|
+
end
|
219
|
+
|
220
|
+
end
|
@@ -7,15 +7,13 @@ module StanfordCoreNLP
|
|
7
7
|
LanguageCodes = {
|
8
8
|
:english => [:en, :eng, :english],
|
9
9
|
:german => [:de, :ger, :german],
|
10
|
-
:french => [:fr, :fre, :french]
|
11
|
-
:arabic => [:ar, :ara, :arabic],
|
12
|
-
:chinese => [:ch, :chi, :chinese]
|
10
|
+
:french => [:fr, :fre, :french]
|
13
11
|
}
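A lookup over this table might work as follows. The sketch is a guess at the behaviour of `use`, whose body is not shown in this diff: `resolve_language` is a hypothetical helper, and raising `ArgumentError` for an unknown code is an assumption.

```ruby
# Language codes as they stand after this change.
LANGUAGE_CODES = {
  english: [:en, :eng, :english],
  german: [:de, :ger, :german],
  french: [:fr, :fre, :french]
}

# Hypothetical helper: map a full name or ISO-639 code to the
# canonical language key.
def resolve_language(code)
  match = LANGUAGE_CODES.find { |_, codes| codes.include?(code.to_sym) }
  raise ArgumentError, "Unsupported language: #{code}" unless match
  match.first
end

puts resolve_language(:fr)      # french
puts resolve_language('ger')    # german
puts resolve_language(:english) # english
```

With Arabic and Chinese removed from the table, codes such as `:ar` or `:chi` no longer resolve to a language.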
|
14
12
|
|
15
13
|
# Folders inside the JAR path for the models.
|
16
14
|
ModelFolders = {
|
17
15
|
:pos => 'taggers/',
|
18
|
-
:
|
16
|
+
:parse => 'grammar/',
|
19
17
|
:ner => 'classifiers/',
|
20
18
|
:dcoref => 'dcoref/'
|
21
19
|
}
|
@@ -24,7 +22,6 @@ module StanfordCoreNLP
|
|
24
22
|
TagSets = {
|
25
23
|
:english => :penn,
|
26
24
|
:german => :stutgart,
|
27
|
-
:chinese => :chinese,
|
28
25
|
:french => :paris7
|
29
26
|
}
|
30
27
|
|
@@ -34,17 +31,13 @@ module StanfordCoreNLP
|
|
34
31
|
:pos => {
|
35
32
|
:english => 'english-left3words-distsim.tagger',
|
36
33
|
:german => 'german-fast.tagger',
|
37
|
-
:french => 'french.tagger'
|
38
|
-
:arabic => 'arabic-fast.tagger',
|
39
|
-
:chinese => 'chinese.tagger'
|
34
|
+
:french => 'french.tagger'
|
40
35
|
},
|
41
36
|
|
42
|
-
:
|
37
|
+
:parse => {
|
43
38
|
:english => 'englishPCFG.ser.gz',
|
44
39
|
:german => 'germanPCFG.ser.gz',
|
45
|
-
:french => 'frenchFactored.ser.gz'
|
46
|
-
:arabic => 'arabicFactored.ser.gz',
|
47
|
-
:chinese => 'chinesePCFG.ser.gz'
|
40
|
+
:french => 'frenchFactored.ser.gz'
|
48
41
|
},
|
49
42
|
|
50
43
|
:ner => {
|
@@ -54,9 +47,7 @@ module StanfordCoreNLP
|
|
54
47
|
'MISCclass' => 'conll.4class.distsim.crf.ser.gz'
|
55
48
|
},
|
56
49
|
:german => {},
|
57
|
-
:french => {}
|
58
|
-
:arabic => {},
|
59
|
-
:chinese => {}
|
50
|
+
:french => {}
|
60
51
|
},
|
61
52
|
|
62
53
|
:dcoref => {
|
@@ -75,9 +66,7 @@ module StanfordCoreNLP
|
|
75
66
|
'extra.gender' => 'namegender.combine.txt'
|
76
67
|
},
|
77
68
|
:german => {},
|
78
|
-
:french => {}
|
79
|
-
:arabic => {},
|
80
|
-
:chinese => {}
|
69
|
+
:french => {}
|
81
70
|
}
|
82
71
|
|
83
72
|
# Models to add.
|
@@ -92,61 +81,6 @@ module StanfordCoreNLP
|
|
92
81
|
# List of annotations by JAVA class path.
|
93
82
|
Annotations = {
|
94
83
|
|
95
|
-
'nlp.trees.international.pennchinese.ChineseGrammaticalRelations' => [
|
96
|
-
'AdjectivalModifierGRAnnotation',
|
97
|
-
'AdverbialModifierGRAnnotation',
|
98
|
-
'ArgumentGRAnnotation',
|
99
|
-
'AspectMarkerGRAnnotation',
|
100
|
-
'AssociativeMarkerGRAnnotation',
|
101
|
-
'AssociativeModifierGRAnnotation',
|
102
|
-
'AttributiveGRAnnotation',
|
103
|
-
'AuxModifierGRAnnotation',
|
104
|
-
'AuxPassiveGRAnnotation',
|
105
|
-
'BaGRAnnotation',
|
106
|
-
'ClausalComplementGRAnnotation',
|
107
|
-
'ClausalSubjectGRAnnotation',
|
108
|
-
'ClauseModifierGRAnnotation',
|
109
|
-
'ComplementGRAnnotation',
|
110
|
-
'ComplementizerGRAnnotation',
|
111
|
-
'ControllingSubjectGRAnnotation',
|
112
|
-
'CoordinationGRAnnotation',
|
113
|
-
'DeterminerGRAnnotation',
|
114
|
-
'DirectObjectGRAnnotation',
|
115
|
-
'DvpMarkerGRAnnotation',
|
116
|
-
'DvpModifierGRAnnotation',
|
117
|
-
'EtcGRAnnotation',
|
118
|
-
'LocalizerComplementGRAnnotation',
|
119
|
-
'ModalGRAnnotation',
|
120
|
-
'ModifierGRAnnotation',
|
121
|
-
'NegationModifierGRAnnotation',
|
122
|
-
'NominalPassiveSubjectGRAnnotation',
|
123
|
-
'NominalSubjectGRAnnotation',
|
124
|
-
'NounCompoundModifierGRAnnotation',
|
125
|
-
'NumberModifierGRAnnotation',
|
126
|
-
'NumericModifierGRAnnotation',
|
127
|
-
'ObjectGRAnnotation',
|
128
|
-
'OrdNumberGRAnnotation',
|
129
|
-
'ParentheticalGRAnnotation',
|
130
|
-
'ParticipialModifierGRAnnotation',
|
131
|
-
'PreconjunctGRAnnotation',
|
132
|
-
'PrepositionalLocalizerModifierGRAnnotation',
|
133
|
-
'PrepositionalModifierGRAnnotation',
|
134
|
-
'PrepositionalObjectGRAnnotation',
|
135
|
-
'PunctuationGRAnnotation',
|
136
|
-
'RangeGRAnnotation',
|
137
|
-
'RelativeClauseModifierGRAnnotation',
|
138
|
-
'ResultativeComplementGRAnnotation',
|
139
|
-
'SemanticDependentGRAnnotation',
|
140
|
-
'SubjectGRAnnotation',
|
141
|
-
'TemporalClauseGRAnnotation',
|
142
|
-
'TemporalGRAnnotation',
|
143
|
-
'TimePostpositionGRAnnotation',
|
144
|
-
'TopicGRAnnotation',
|
145
|
-
'VerbCompoundGRAnnotation',
|
146
|
-
'VerbModifierGRAnnotation',
|
147
|
-
'XClausalComplementGRAnnotation'
|
148
|
-
],
|
149
|
-
|
150
84
|
'nlp.dcoref.CoNLL2011DocumentReader' => [
|
151
85
|
'CorefMentionAnnotation',
|
152
86
|
'NamedEntityAnnotation'
|
@@ -0,0 +1,41 @@
|
|
1
|
+
module StanfordCoreNLP::JrubyBridge
|
2
|
+
|
3
|
+
def inject_get_method(klass)
|
4
|
+
return unless klass.method_defined?(:get)
|
5
|
+
klass.class_eval do
|
6
|
+
|
7
|
+
# Dynamically defined on all proxied annotation classes.
|
8
|
+
# Get an annotation using the annotation bridge.
|
9
|
+
def get_with_casting(annotation, anno_base = nil)
|
10
|
+
anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
|
11
|
+
if anno_base
|
12
|
+
unless StanfordCoreNLP::Config::Annotations[anno_base]
|
13
|
+
raise "The path #{anno_base} doesn't exist."
|
14
|
+
end
|
15
|
+
anno_bases = [anno_base]
|
16
|
+
else
|
17
|
+
anno_bases = StanfordCoreNLP::Config::AnnotationsByName[anno_class]
|
18
|
+
raise "The annotation #{anno_class} doesn't exist." unless anno_bases
|
19
|
+
end
|
20
|
+
if anno_bases.size > 1
|
21
|
+
msg = "There are many different annotations bearing the name #{anno_class}. \nPlease specify one of the following base classes as second parameter to disambiguate: "
|
22
|
+
msg << anno_bases.join(',')
|
23
|
+
raise msg
|
24
|
+
else
|
25
|
+
base_class = anno_bases[0]
|
26
|
+
end
|
27
|
+
|
28
|
+
fqcn = "edu.stanford.#{base_class}"
|
29
|
+
class_path = fqcn.split(".")
|
30
|
+
class_name = class_path.pop
|
31
|
+
jruby_class = "Java::#{StanfordCoreNLP.camel_case(class_path.join("."))}::#{class_name}::#{anno_class}"
|
32
|
+
|
33
|
+
get_without_casting(Object.module_eval(jruby_class))
|
34
|
+
end
|
35
|
+
|
36
|
+
alias_method :get_without_casting, :get
|
37
|
+
alias_method :get, :get_with_casting
|
38
|
+
end
|
39
|
+
end
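The `alias_method` chaining used by `inject_get_method` can be seen on a toy class. `Token` and its `get` method below are hypothetical; only the wrapping pattern matches the bridge.

```ruby
# Hypothetical class with a plain `get` method to be wrapped.
class Token
  def get(key)
    "raw:#{key}"
  end
end

Token.class_eval do
  # New behaviour: transform the argument, then call the original.
  def get_with_casting(key)
    get_without_casting(key.to_s.upcase)
  end

  # Keep the original reachable under a new name, then point `get`
  # at the wrapper.
  alias_method :get_without_casting, :get
  alias_method :get, :get_with_casting
end

puts Token.new.get(:lemma) # raw:LEMMA
```

The order of the two `alias_method` calls matters: the original must be saved before `get` is redirected.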
|
40
|
+
|
41
|
+
end
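The JRuby constant path built in `get_with_casting` can be reproduced as a plain string. In this sketch `jruby_class_name` is a hypothetical helper, `camel_case` copies the gem's method, and the base path `nlp.ling.CoreAnnotations` is assumed to be one of the keys in `Config::Annotations`.

```ruby
# Copy of the gem's camel_case helper.
def camel_case(s)
  s = s.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }
  s.gsub(/(?:^|_|\.)(.)/) { $1.upcase }
end

# Hypothetical helper: builds the JRuby constant path for an
# annotation class nested inside a Java class.
def jruby_class_name(base_class, anno_class)
  class_path = "edu.stanford.#{base_class}".split('.')
  class_name = class_path.pop
  package = camel_case(class_path.join('.'))
  "Java::#{package}::#{class_name}::#{anno_class}"
end

puts jruby_class_name('nlp.ling.CoreAnnotations', 'NamedEntityTagAnnotation')
# Java::EduStanfordNlpLing::CoreAnnotations::NamedEntityTagAnnotation
```

The bridge then evaluates this string to obtain the Java class object that is passed to the original `get`.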
|
@@ -0,0 +1,42 @@
|
|
1
|
+
module StanfordCoreNLP::RjbBridge
|
2
|
+
|
3
|
+
StanfordCoreNLP.default_classes << ['AnnotationBridge', '']
|
4
|
+
|
5
|
+
def inject_get_method(klass)
|
6
|
+
klass.class_eval do
|
7
|
+
|
8
|
+
# Dynamically defined on all proxied annotation classes.
|
9
|
+
# Get an annotation using the annotation bridge.
|
10
|
+
def get(annotation, anno_base = nil)
|
11
|
+
if !java_methods.include?('get(Ljava.lang.Class;)')
|
12
|
+
raise 'No annotation can be retrieved on this object.'
|
13
|
+
else
|
14
|
+
anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
|
15
|
+
if anno_base
|
16
|
+
unless StanfordCoreNLP::Config::Annotations[anno_base]
|
17
|
+
raise "The path #{anno_base} doesn't exist."
|
18
|
+
end
|
19
|
+
anno_bases = [anno_base]
|
20
|
+
else
|
21
|
+
anno_bases = StanfordCoreNLP::Config::AnnotationsByName[anno_class]
|
22
|
+
raise "The annotation #{anno_class} doesn't exist." unless anno_bases
|
23
|
+
end
|
24
|
+
if anno_bases.size > 1
|
25
|
+
msg = "There are many different annotations " +
|
26
|
+
"bearing the name #{anno_class}. \nPlease specify " +
|
27
|
+
"one of the following base classes as second " +
|
28
|
+
"parameter to disambiguate: "
|
29
|
+
msg << anno_bases.join(',')
|
30
|
+
raise msg
|
31
|
+
else
|
32
|
+
base_class = anno_bases[0]
|
33
|
+
end
|
34
|
+
url = "edu.stanford.#{base_class}$#{anno_class}"
|
35
|
+
StanfordCoreNLP::AnnotationBridge.getAnnotation(self, url)
|
36
|
+
end
|
37
|
+
end
|
38
|
+
|
39
|
+
end
|
40
|
+
|
41
|
+
end
|
42
|
+
end
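The name resolution shared by both bridges can be sketched without a JVM. `resolve_annotation` is a hypothetical helper with a shortened error message. The lookup table is sample data, and the second base path for `ExampleAnnotation` is invented purely to show the ambiguity case.

```ruby
# Sample data only; the real table lives in Config::AnnotationsByName.
SAMPLE_BY_NAME = {
  'NamedEntityAnnotation' => ['nlp.dcoref.CoNLL2011DocumentReader'],
  'ExampleAnnotation' => ['nlp.example.First', 'nlp.example.Second']
}

# Hypothetical helper: returns the binary name of the nested Java
# annotation class, using '$' as the nested-class separator.
def resolve_annotation(anno_class, by_name, anno_base = nil)
  bases = anno_base ? [anno_base] : by_name[anno_class]
  raise "The annotation #{anno_class} doesn't exist." unless bases
  if bases.size > 1
    raise "Ambiguous annotation #{anno_class}; specify one of: " +
          bases.join(',')
  end
  "edu.stanford.#{bases[0]}$#{anno_class}"
end

puts resolve_annotation('NamedEntityAnnotation', SAMPLE_BY_NAME)
# edu.stanford.nlp.dcoref.CoNLL2011DocumentReader$NamedEntityAnnotation
puts resolve_annotation('ExampleAnnotation', SAMPLE_BY_NAME, 'nlp.example.Second')
# edu.stanford.nlp.example.Second$ExampleAnnotation
```

Passing the base path as the second argument is how a caller disambiguates when several Java classes define an annotation with the same name.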
|
metadata
CHANGED
@@ -1,45 +1,46 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: stanford-core-nlp
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
|
5
|
-
|
4
|
+
prerelease:
|
5
|
+
version: 0.4.0
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
8
8
|
- Louis Mullie
|
9
|
-
autorequire:
|
9
|
+
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2012-12-
|
12
|
+
date: 2012-12-18 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: bind-it
|
16
|
-
|
17
|
-
none: false
|
16
|
+
version_requirements: !ruby/object:Gem::Requirement
|
18
17
|
requirements:
|
19
18
|
- - ! '>='
|
20
19
|
- !ruby/object:Gem::Version
|
21
20
|
version: '0'
|
22
|
-
type: :runtime
|
23
|
-
prerelease: false
|
24
|
-
version_requirements: !ruby/object:Gem::Requirement
|
25
21
|
none: false
|
22
|
+
requirement: !ruby/object:Gem::Requirement
|
26
23
|
requirements:
|
27
24
|
- - ! '>='
|
28
25
|
- !ruby/object:Gem::Version
|
29
26
|
version: '0'
|
30
|
-
|
31
|
-
|
32
|
-
|
33
|
-
|
27
|
+
none: false
|
28
|
+
prerelease: false
|
29
|
+
type: :runtime
|
30
|
+
description: ! " High-level Ruby bindings to the Stanford CoreNLP package, a set natural\
|
31
|
+
\ language processing \ntools that provides tokenization, part-of-speech tagging\
|
32
|
+
\ and parsing for several languages, as well as named entity \nrecognition and coreference\
|
33
|
+
\ resolution for English. "
|
34
34
|
email:
|
35
35
|
- louis.mullie@gmail.com
|
36
36
|
executables: []
|
37
37
|
extensions: []
|
38
38
|
extra_rdoc_files: []
|
39
39
|
files:
|
40
|
-
- lib/stanford-core-nlp/bridge.rb
|
41
|
-
- lib/stanford-core-nlp/config.rb
|
42
40
|
- lib/stanford-core-nlp.rb
|
41
|
+
- lib/stanford-core-nlp/config.rb
|
42
|
+
- lib/stanford-core-nlp/jruby_bridge.rb
|
43
|
+
- lib/stanford-core-nlp/rjb_bridge.rb
|
43
44
|
- bin/AnnotationBridge.java
|
44
45
|
- bin/bridge.jar
|
45
46
|
- bin/Stanford.java
|
@@ -47,26 +48,27 @@ files:
|
|
47
48
|
- LICENSE
|
48
49
|
homepage: https://github.com/louismullie/stanford-core-nlp
|
49
50
|
licenses: []
|
50
|
-
post_install_message:
|
51
|
+
post_install_message:
|
51
52
|
rdoc_options: []
|
52
53
|
require_paths:
|
53
54
|
- lib
|
54
55
|
required_ruby_version: !ruby/object:Gem::Requirement
|
55
|
-
none: false
|
56
56
|
requirements:
|
57
57
|
- - ! '>='
|
58
58
|
- !ruby/object:Gem::Version
|
59
59
|
version: '0'
|
60
|
-
required_rubygems_version: !ruby/object:Gem::Requirement
|
61
60
|
none: false
|
61
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
62
62
|
requirements:
|
63
63
|
- - ! '>='
|
64
64
|
- !ruby/object:Gem::Version
|
65
65
|
version: '0'
|
66
|
+
none: false
|
66
67
|
requirements: []
|
67
|
-
rubyforge_project:
|
68
|
+
rubyforge_project:
|
68
69
|
rubygems_version: 1.8.24
|
69
|
-
signing_key:
|
70
|
+
signing_key:
|
70
71
|
specification_version: 3
|
71
72
|
summary: Ruby bindings to the Stanford Core NLP tools.
|
72
73
|
test_files: []
|
74
|
+
...
|
@@ -1,40 +0,0 @@
|
|
1
|
-
module StanfordCoreNLP
|
2
|
-
|
3
|
-
# Modify the Rjb JavaProxy class to add our
|
4
|
-
# own methods to every Java object.
|
5
|
-
Rjb::Rjb_JavaProxy.class_eval do
|
6
|
-
|
7
|
-
# Dynamically defined on all proxied annotation classes.
|
8
|
-
# Get an annotation using the annotation bridge.
|
9
|
-
def get(annotation, anno_base = nil)
|
10
|
-
if !java_methods.include?('get(Ljava.lang.Class;)')
|
11
|
-
raise 'No annotation can be retrieved on this object.'
|
12
|
-
else
|
13
|
-
anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
|
14
|
-
if anno_base
|
15
|
-
unless StanfordNLP::Config::Annotations[anno_base]
|
16
|
-
raise "The path #{anno_base} doesn't exist."
|
17
|
-
end
|
18
|
-
anno_bases = [anno_base]
|
19
|
-
else
|
20
|
-
anno_bases = StanfordCoreNLP::Config::AnnotationsByName[anno_class]
|
21
|
-
raise "The annotation #{anno_class} doesn't exist." unless anno_bases
|
22
|
-
end
|
23
|
-
if anno_bases.size > 1
|
24
|
-
msg = "There are many different annotations " +
|
25
|
-
"bearing the name #{anno_class}. \nPlease specify " +
|
26
|
-
"one of the following base classes as second " +
|
27
|
-
"parameter to disambiguate: "
|
28
|
-
msg << anno_bases.join(',')
|
29
|
-
raise msg
|
30
|
-
else
|
31
|
-
base_class = anno_bases[0]
|
32
|
-
end
|
33
|
-
url = "edu.stanford.#{base_class}$#{anno_class}"
|
34
|
-
StanfordCoreNLP::AnnotationBridge.getAnnotation(self, url)
|
35
|
-
end
|
36
|
-
end
|
37
|
-
|
38
|
-
end
|
39
|
-
|
40
|
-
end
|