stanford-core-nlp 0.1.4 → 0.1.5
- data/README.markdown +22 -23
- data/bin/INFO +1 -1
- data/lib/stanford-core-nlp.rb +126 -52
- data/lib/stanford-core-nlp/config.rb +453 -0
- data/lib/stanford-core-nlp/java_wrapper.rb +27 -0
- metadata +5 -5
- data/lib/stanford-core-nlp/stanford_annotations.rb +0 -401
data/README.markdown
CHANGED
@@ -1,12 +1,12 @@
|
|
1
1
|
**About**
|
2
2
|
|
3
|
-
This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set natural language processing tools
|
3
|
+
This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set of natural language processing tools that features tokenization, part-of-speech tagging, lemmatization, and parsing for five languages (English, French, German, Arabic and Chinese), as well as named entity recognition and coreference resolution for English.
|
4
4
|
|
5
5
|
**Installing**
|
6
6
|
|
7
7
|
1. Install the gem: `gem install stanford-core-nlp`.
|
8
8
|
|
9
|
-
2. Download the Stanford Core NLP JAR and model files [
|
9
|
+
2. Download the Stanford Core NLP JAR and model files. Two packages are available with the necessary files: a package for [English only](http://louismullie.com/stanford-core-nlp-english.zip), or a package with models for [all languages](http://louismullie.com/stanford-core-nlp-all.zip). Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (typically this is /usr/local/lib/ruby/gems/1.9.1/gems/stanford-core-nlp-0.x/bin/).
|
10
10
|
|
11
11
|
**Configuration**
|
12
12
|
|
@@ -23,18 +23,12 @@ After installing and requiring the gem (`require 'stanford-core-nlp'`), you may
|
|
23
23
|
# Redirect VM output to log.txt
|
24
24
|
StanfordCoreNLP.log_file = 'log.txt'
|
25
25
|
|
26
|
-
|
27
|
-
|
28
|
-
# Default base class is edu.stanford.nlp.pipeline.
|
29
|
-
StanfordCoreNLP.load('PTBTokenizerAnnotator')
|
30
|
-
puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
|
31
|
-
# => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>
|
32
|
-
|
33
|
-
# Here, we specify another base class.
|
34
|
-
StanfordCoreNLP.load('MaxentTagger', 'edu.stanford.nlp.tagger')
|
35
|
-
puts StanfordCoreNLP::MaxentTagger.inspect
|
36
|
-
# => <Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
|
26
|
+
# Use the model files for a different language than English.
|
27
|
+
StanfordCoreNLP.use(:french)
|
37
28
|
|
29
|
+
# Change a specific model file.
|
30
|
+
StanfordCoreNLP.set_model('pos.model', 'english-left3words-distsim.tagger')
|
31
|
+
|
38
32
|
**Using the gem**
|
39
33
|
|
40
34
|
text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
|
@@ -64,22 +58,27 @@ You may also want to load your own classes from the Stanford NLP to do more spec
|
|
64
58
|
end
|
65
59
|
end
|
66
60
|
|
67
|
-
|
61
|
+
> Note: You need to load the StanfordCoreNLP pipeline before using the StanfordCoreNLP::Text class.
|
68
62
|
|
69
|
-
|
63
|
+
Good references for annotation names are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CorefCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the 'config.rb' file inside the gem. The Ruby symbol (e.g. :named_entity_tag) corresponding to a Java annotation class is the un-camel-cased class name with the trailing 'Annotation' removed. For example, NamedEntityTagAnnotation translates to :named_entity_tag, PartOfSpeechAnnotation to :part_of_speech, etc.
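The symbol-to-class naming convention can be sketched in a few lines of plain Ruby. The `camel_case` helper below copies the one-liner from `lib/stanford-core-nlp.rb`; `annotation_class_name` is a hypothetical name introduced only for this illustration and is not part of the gem's API.

```ruby
# Mirrors the camel_case helper defined in lib/stanford-core-nlp.rb.
def camel_case(text)
  text.to_s.gsub(/^[a-z]|_[a-z]/) { |a| a.upcase }.gsub('_', '')
end

# Hypothetical helper (not in the gem): appends the 'Annotation' suffix
# that the Ruby symbol leaves out.
def annotation_class_name(symbol)
  "#{camel_case(symbol)}Annotation"
end

puts annotation_class_name(:named_entity_tag)  # prints NamedEntityTagAnnotation
puts annotation_class_name(:part_of_speech)    # prints PartOfSpeechAnnotation
```

The conversion only goes from symbol to class name, so acronyms have to be spelled letter by letter in the symbol (for example `:adjectival_modifier_g_r` for `AdjectivalModifierGRAnnotation`).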
|
70
64
|
|
71
|
-
|
72
|
-
- For the Stanford Tagger, download the [tagger files](http://nlp.stanford.edu/software/tagger.shtml), and copy from the models/ directory the models you need into the gem's bin/models directory. Models are available for Arabic, Chinese, French and German.
|
65
|
+
**Loading specific classes**
|
73
66
|
|
74
|
-
|
75
|
-
|
76
|
-
|
77
|
-
StanfordCoreNLP.
|
78
|
-
|
67
|
+
You may also want to load other classes from the Stanford NLP packages for more specific tasks. The gem provides an API to do this:
|
68
|
+
|
69
|
+
# Default base class is edu.stanford.nlp.pipeline.
|
70
|
+
StanfordCoreNLP.load_class('PTBTokenizerAnnotator')
|
71
|
+
puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
|
72
|
+
# => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>
|
73
|
+
|
74
|
+
# Here, we specify another base class.
|
75
|
+
StanfordCoreNLP.load_class('MaxentTagger', 'edu.stanford.nlp.tagger.maxent')
|
76
|
+
puts StanfordCoreNLP::MaxentTagger.inspect
|
77
|
+
# => <Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
|
79
78
|
|
80
79
|
**Current known issues**
|
81
80
|
|
82
|
-
The models included with the gem for the NER system are missing two files: "edu/stanford/nlp/models/dcoref/countries" and "edu/stanford/nlp/models/dcoref/statesandprovinces", which I couldn't find anywhere. I will be
|
81
|
+
The models included with the gem for the coreference resolution system are missing two files: "edu/stanford/nlp/models/dcoref/countries" and "edu/stanford/nlp/models/dcoref/statesandprovinces", which I couldn't find anywhere. I would be grateful if somebody could add or e-mail me these files.
|
83
82
|
|
84
83
|
**Contributing**
|
85
84
|
|
data/bin/INFO
CHANGED
@@ -1 +1 @@
|
|
1
|
-
This is where you should put the JAR files.
|
1
|
+
This is where you should put the JAR files and the folders with the model files.
|
data/lib/stanford-core-nlp.rb
CHANGED
@@ -1,81 +1,135 @@
|
|
1
1
|
module StanfordCoreNLP
|
2
2
|
|
3
|
-
VERSION = '0.1.
|
4
|
-
require 'stanford-core-nlp/jar_loader
|
3
|
+
VERSION = '0.1.5'
|
4
|
+
require 'stanford-core-nlp/jar_loader'
|
5
5
|
require 'stanford-core-nlp/java_wrapper'
|
6
|
-
require 'stanford-core-nlp/
|
7
|
-
|
6
|
+
require 'stanford-core-nlp/config'
|
7
|
+
|
8
8
|
class << self
|
9
|
-
# The path in which to look for the Stanford JAR files
|
10
|
-
#
|
9
|
+
# The path in which to look for the Stanford JAR files,
|
10
|
+
# with a trailing slash.
|
11
|
+
#
|
12
|
+
# The structure of the JAR folder must be as follows:
|
13
|
+
#
|
14
|
+
# Files:
|
15
|
+
#
|
16
|
+
# /stanford-core-nlp.jar
|
17
|
+
# /joda-time.jar
|
18
|
+
# /xom.jar
|
19
|
+
# /bridge.jar*
|
20
|
+
#
|
21
|
+
# Folders:
|
22
|
+
#
|
23
|
+
# /classifiers # Models for the NER system.
|
24
|
+
# /dcoref # Models for the coreference resolver.
|
25
|
+
# /taggers # Models for the POS tagger.
|
26
|
+
# /grammar # Models for the parser.
|
27
|
+
#
|
28
|
+
# *The file bridge.jar is a thin Java wrapper over the
|
29
|
+
# Stanford Core NLP get() function, which makes it possible to
|
30
|
+
# retrieve annotations using static classes as names.
|
31
|
+
# This works around one of the lacunae of Rjb.
|
11
32
|
attr_accessor :jar_path
|
12
|
-
# The flags for starting the JVM machine.
|
13
|
-
#
|
33
|
+
# The flags for starting the JVM. The parser
|
34
|
+
# and named entity recognizer are very memory consuming.
|
14
35
|
attr_accessor :jvm_args
|
15
36
|
# A file to redirect JVM output to.
|
16
37
|
attr_accessor :log_file
|
17
|
-
# The model files
|
38
|
+
# The model files for a given language.
|
18
39
|
attr_accessor :model_files
|
19
40
|
end
|
20
41
|
|
21
42
|
# The default JAR path is the gem's bin folder.
|
22
43
|
self.jar_path = File.dirname(__FILE__) + '/../bin/'
|
23
|
-
# Load the JVM with a minimum heap size of 512MB and a
|
44
|
+
# Load the JVM with a minimum heap size of 512MB and a
|
24
45
|
# maximum heap size of 1024MB.
|
25
46
|
self.jvm_args = ['-Xms512M', '-Xmx1024M']
|
26
47
|
# Turn logging off by default.
|
27
48
|
self.log_file = nil
|
28
49
|
|
29
|
-
# Default model files.
|
30
|
-
self.model_files = {
|
31
|
-
'pos.model' => 'taggers/english-left3words-distsim.tagger',
|
32
|
-
'ner.model.3class' => 'classifiers/all.3class.distsim.crf.ser.gz',
|
33
|
-
'ner.model.7class' => 'classifiers/muc.7class.distsim.crf.ser.gz',
|
34
|
-
'ner.model.MISCclass' => 'classifiers/conll.4class.distsim.crf.ser.gz',
|
35
|
-
'parser.model' => 'grammar/englishPCFG.ser.gz',
|
36
|
-
'dcoref.demonym' => 'dcoref/demonyms.txt',
|
37
|
-
'dcoref.animate' => 'dcoref/animate.unigrams.txt',
|
38
|
-
'dcoref.female' => 'dcoref/female.unigrams.txt',
|
39
|
-
'dcoref.inanimate' => 'dcoref/inanimate.unigrams.txt',
|
40
|
-
'dcoref.male' => 'dcoref/male.unigrams.txt',
|
41
|
-
'dcoref.neutral' => 'dcoref/neutral.unigrams.txt',
|
42
|
-
'dcoref.plural' => 'dcoref/plural.unigrams.txt',
|
43
|
-
'dcoref.singular' => 'dcoref/singular.unigrams.txt',
|
44
|
-
'dcoref.states' => 'dcoref/state-abbreviations.txt',
|
45
|
-
'dcoref.countries' => 'dcoref/unknown.txt', # Fix - can somebody provide this file?
|
46
|
-
'dcoref.states.provinces' => 'dcoref/unknown.txt', # Fix - can somebody provide this file?
|
47
|
-
'dcoref.extra.gender' => 'dcoref/namegender.combine.txt'
|
48
|
-
}
|
49
50
|
|
50
|
-
#
|
51
|
-
|
52
|
-
#
|
53
|
-
|
51
|
+
# Use models for a given language. Language can be
|
52
|
+
# supplied as full-length, or ISO-639 2 or 3 letter
|
53
|
+
# code (e.g. :english, :eng or :en will work).
|
54
|
+
def self.use(language)
|
55
|
+
lang = nil
|
56
|
+
self.model_files = {}
|
57
|
+
Config::LanguageCodes.each do |l,codes|
|
58
|
+
lang = codes[2] if codes.include?(language)
|
59
|
+
end
|
60
|
+
Config::Models.each do |n, languages|
|
61
|
+
models = languages[lang]
|
62
|
+
folder = Config::ModelFolders[n]
|
63
|
+
if models.is_a?(Hash)
|
64
|
+
n = n.to_s
|
65
|
+
n += '.model' if n == 'ner'
|
66
|
+
models.each do |m, file|
|
67
|
+
self.model_files["#{n}.#{m}"] =
|
68
|
+
folder + file
|
69
|
+
end
|
70
|
+
elsif models.is_a?(String)
|
71
|
+
self.model_files["#{n}.model"] =
|
72
|
+
folder + models
|
73
|
+
end
|
74
|
+
end
|
75
|
+
end
|
76
|
+
|
77
|
+
# Use English by default.
|
78
|
+
self.use(:english)
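As a rough illustration of what `use` does, the sketch below resolves a language code and builds the `model_files` keys outside the gem. The three tables are trimmed, hypothetical copies of `Config::LanguageCodes`, `Config::ModelFolders` and `Config::Models`, and `resolve_language` and `model_files_for` are names assumed for this example only.

```ruby
# Trimmed, hypothetical copies of the Config tables (two languages only).
LANGUAGE_CODES = {
  :english => [:en, :eng, :english],
  :french  => [:fr, :fre, :french]
}
MODEL_FOLDERS = { :pos => 'taggers/', :parser => 'grammar/', :ner => 'classifiers/' }
MODELS = {
  :pos    => { :english => 'english-left3words-distsim.tagger', :french => 'french.tagger' },
  :parser => { :english => 'englishPCFG.ser.gz', :french => 'frenchFactored.ser.gz' },
  :ner    => { :english => { '3class' => 'all.3class.distsim.crf.ser.gz' }, :french => {} }
}

# Map :en, :eng or :english to the canonical :english key (nil if unknown).
def resolve_language(code)
  match = LANGUAGE_CODES.find { |_, codes| codes.include?(code) }
  match && match[0]
end

# Build the property-name => relative-path hash for one language.
def model_files_for(code)
  lang = resolve_language(code)
  files = {}
  MODELS.each do |name, languages|
    models = languages[lang]
    folder = MODEL_FOLDERS[name]
    if models.is_a?(Hash)
      # NER properties are named 'ner.model.<key>'; other hashes use '<name>.<key>'.
      prefix = name == :ner ? 'ner.model' : name.to_s
      models.each { |key, file| files["#{prefix}.#{key}"] = folder + file }
    elsif models.is_a?(String)
      files["#{name}.model"] = folder + models
    end
  end
  files
end

p model_files_for(:fr)
```

An unrecognized code resolves to `nil` and yields an empty hash, which matches the original method leaving `model_files` empty rather than raising.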
|
54
79
|
|
55
|
-
# Set a model file.
|
80
|
+
# Set a model file. Here are the default models for English:
|
81
|
+
#
|
82
|
+
# 'pos.model' => 'english-left3words-distsim.tagger',
|
83
|
+
# 'ner.model.3class' => 'all.3class.distsim.crf.ser.gz',
|
84
|
+
# 'ner.model.7class' => 'muc.7class.distsim.crf.ser.gz',
|
85
|
+
# 'ner.model.MISCclass' => 'conll.4class.distsim.crf.ser.gz',
|
86
|
+
# 'parser.model' => 'englishPCFG.ser.gz',
|
87
|
+
# 'dcoref.demonym' => 'demonyms.txt',
|
88
|
+
# 'dcoref.animate' => 'animate.unigrams.txt',
|
89
|
+
# 'dcoref.female' => 'female.unigrams.txt',
|
90
|
+
# 'dcoref.inanimate' => 'inanimate.unigrams.txt',
|
91
|
+
# 'dcoref.male' => 'male.unigrams.txt',
|
92
|
+
# 'dcoref.neutral' => 'neutral.unigrams.txt',
|
93
|
+
# 'dcoref.plural' => 'plural.unigrams.txt',
|
94
|
+
# 'dcoref.singular' => 'singular.unigrams.txt',
|
95
|
+
# 'dcoref.states' => 'state-abbreviations.txt',
|
96
|
+
# 'dcoref.extra.gender' => 'namegender.combine.txt'
|
97
|
+
#
|
56
98
|
def self.set_model(name, file)
|
57
|
-
|
58
|
-
|
59
|
-
|
60
|
-
end
|
61
|
-
self.model_files[name] = file
|
99
|
+
n = name.split('.')[0].intern
|
100
|
+
self.model_files[name] =
|
101
|
+
Config::ModelFolders[n] + file
|
62
102
|
end
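The folder lookup in `set_model` depends only on the part of the property name before the first dot. A minimal sketch, assuming a standalone copy of the folder table; `MODEL_FOLDER_BY_PREFIX` and `model_path` are hypothetical names. Unlike the original, the sketch raises an explicit `ArgumentError` for an unknown prefix.

```ruby
# Hypothetical standalone copy of Config::ModelFolders.
MODEL_FOLDER_BY_PREFIX = {
  :pos    => 'taggers/',
  :parser => 'grammar/',
  :ner    => 'classifiers/',
  :dcoref => 'dcoref/'
}

# Return the path, relative to the JAR folder, for a model property.
def model_path(name, file)
  prefix = name.split('.')[0].intern
  # Assumption of this sketch: fail loudly instead of calling + on nil.
  folder = MODEL_FOLDER_BY_PREFIX.fetch(prefix) do
    raise ArgumentError, "No model folder is known for '#{name}'."
  end
  folder + file
end

puts model_path('pos.model', 'english-left3words-distsim.tagger')
puts model_path('ner.model.3class', 'all.3class.distsim.crf.ser.gz')
```

This is why the README example passes a bare file name: the folder is filled in from the property name.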
|
63
103
|
|
104
|
+
# Whether the classes are initialized or not.
|
105
|
+
@@initialized = false
|
106
|
+
# Whether the JAR files are loaded or not.
|
107
|
+
@@loaded = false
|
108
|
+
|
64
109
|
# Load the JARs, create the classes.
|
65
110
|
def self.init
|
66
111
|
self.load_jars unless @@loaded
|
67
112
|
self.create_classes
|
68
113
|
@@initialized = true
|
69
114
|
end
|
70
|
-
|
71
|
-
# Load a StanfordCoreNLP pipeline with the
|
72
|
-
#
|
115
|
+
|
116
|
+
# Load a StanfordCoreNLP pipeline with the
|
117
|
+
# specified JVM flags and StanfordCoreNLP
|
118
|
+
# properties.
|
73
119
|
def self.load(*annotators)
|
74
120
|
self.init unless @@initialized
|
75
121
|
# Prepend the JAR path to the model files.
|
76
122
|
properties = {}
|
77
|
-
self.model_files.each
|
78
|
-
|
123
|
+
self.model_files.each do |k,v|
|
124
|
+
f = self.jar_path + v
|
125
|
+
unless File.readable?(f)
|
126
|
+
raise "Model file #{f} could not be found. " +
|
127
|
+
"You may need to download this file manually and/or set paths properly."
|
128
|
+
else
|
129
|
+
properties[k] = f
|
130
|
+
end
|
131
|
+
end
|
132
|
+
properties['annotators'] =
|
79
133
|
annotators.map { |x| x.to_s }.join(', ')
|
80
134
|
CoreNLP.new(get_properties(properties))
|
81
135
|
end
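The check that `load` performs before creating the pipeline can be tried without a JVM. The sketch below is an approximation under stated assumptions: `build_properties` is a hypothetical name, it uses `File.join` rather than relying on a trailing slash in the JAR path, and it returns a plain Ruby hash instead of a `java.util.Properties` object.

```ruby
require 'tmpdir'

# Hypothetical helper: prepend the JAR path to every model file, fail
# early if a file is not readable, then add the annotator list.
def build_properties(jar_path, model_files, annotators)
  properties = {}
  model_files.each do |key, relative_path|
    full_path = File.join(jar_path, relative_path)
    unless File.readable?(full_path)
      raise "Model file #{full_path} could not be found."
    end
    properties[key] = full_path
  end
  properties['annotators'] = annotators.map { |a| a.to_s }.join(', ')
  properties
end

# Usage with a throwaway folder standing in for the gem's bin/ directory.
Dir.mktmpdir do |dir|
  Dir.mkdir(File.join(dir, 'taggers'))
  File.write(File.join(dir, 'taggers', 'english-left3words-distsim.tagger'), '')
  props = build_properties(
    dir,
    { 'pos.model' => 'taggers/english-left3words-distsim.tagger' },
    [:tokenize, :ssplit, :pos]
  )
  puts props['annotators']  # prints tokenize, ssplit, pos
end
```

Failing before the JVM is asked to load anything gives an error that names the missing file directly.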
|
@@ -101,17 +155,37 @@ module StanfordCoreNLP
|
|
101
155
|
const_set(:Properties, Rjb::import('java.util.Properties'))
|
102
156
|
const_set(:AnnotationBridge, Rjb::import('AnnotationBridge'))
|
103
157
|
end
|
104
|
-
|
158
|
+
|
105
159
|
# Load a class (e.g. PTBTokenizerAnnotator) in a specific
|
106
160
|
# class path (default is 'edu.stanford.nlp.pipeline').
|
107
161
|
# The class is then accessible under the StanfordCoreNLP
|
108
162
|
# namespace, e.g. StanfordCoreNLP::PTBTokenizerAnnotator.
|
163
|
+
#
|
164
|
+
# List of annotators:
|
165
|
+
#
|
166
|
+
# - PTBTokenizerAnnotator - tokenizes the text following Penn Treebank conventions.
|
167
|
+
# - WordToSentenceAnnotator - splits a sequence of words into a sequence of sentences.
|
168
|
+
# - POSTaggerAnnotator - annotates the text with part-of-speech tags.
|
169
|
+
# - MorphaAnnotator - morphological normalizer (generates lemmas).
|
170
|
+
# - NERAnnotator - annotates the text with named-entity labels.
|
171
|
+
# - NERCombinerAnnotator - combines several NER models (use this instead of NERAnnotator!).
|
172
|
+
# - TrueCaseAnnotator - detects the true case of words in free text (useful for all upper or lower case text).
|
173
|
+
# - ParserAnnotator - generates constituent and dependency trees.
|
174
|
+
# - NumberAnnotator - recognizes numerical entities such as numbers, money, times, and dates.
|
175
|
+
# - TimeWordAnnotator - recognizes common temporal expressions, such as "teatime".
|
176
|
+
# - QuantifiableEntityNormalizingAnnotator - normalizes the content of all numerical entities.
|
177
|
+
# - SRLAnnotator - annotates predicates and their semantic roles.
|
178
|
+
# - CorefAnnotator - implements pronominal anaphora resolution using a statistical model (deprecated!).
|
179
|
+
# - DeterministicCorefAnnotator - implements anaphora resolution using a deterministic model (newer model, use this!).
|
180
|
+
# - NFLAnnotator - implements entity and relation mention extraction for the NFL domain.
|
109
181
|
def self.load_class(klass, base = 'edu.stanford.nlp.pipeline')
|
110
182
|
self.load_jars unless @@loaded
|
111
183
|
const_set(klass.intern, Rjb::import("#{base}.#{klass}"))
|
112
184
|
end
|
113
|
-
|
114
|
-
|
185
|
+
|
186
|
+
# Private helper functions.
|
187
|
+
private
|
188
|
+
# Create a java.util.Properties object from a hash.
|
115
189
|
def self.get_properties(properties)
|
116
190
|
props = Properties.new
|
117
191
|
properties.each do |property, value|
|
@@ -119,10 +193,10 @@ module StanfordCoreNLP
|
|
119
193
|
end
|
120
194
|
props
|
121
195
|
end
|
122
|
-
|
123
|
-
#
|
196
|
+
|
197
|
+
# snake_case -> CamelCase.
|
124
198
|
def self.camel_case(text)
|
125
199
|
text.to_s.gsub(/^[a-z]|_[a-z]/) { |a| a.upcase }.gsub('_', '')
|
126
200
|
end
|
127
|
-
|
128
|
-
end
|
201
|
+
|
202
|
+
end
|
@@ -0,0 +1,453 @@
|
|
1
|
+
module StanfordCoreNLP
|
2
|
+
|
3
|
+
class Config
|
4
|
+
|
5
|
+
# A hash of language codes in humanized,
|
6
|
+
# 2 and 3-letter ISO639 codes.
|
7
|
+
LanguageCodes = {
|
8
|
+
:english => [:en, :eng, :english],
|
9
|
+
:german => [:de, :ger, :german],
|
10
|
+
:french => [:fr, :fre, :french],
|
11
|
+
:arabic => [:ar, :ara, :arabic],
|
12
|
+
:chinese => [:ch, :chi, :chinese],
|
13
|
+
:xinhua => [:xi, :xin, :xinhua]
|
14
|
+
}
|
15
|
+
|
16
|
+
# Folders inside the JAR path for the models.
|
17
|
+
ModelFolders = {
|
18
|
+
:pos => 'taggers/',
|
19
|
+
:parser => 'grammar/',
|
20
|
+
:ner => 'classifiers/',
|
21
|
+
:dcoref => 'dcoref/'
|
22
|
+
}
|
23
|
+
|
24
|
+
# Default models for all languages.
|
25
|
+
Models = {
|
26
|
+
:pos => {
|
27
|
+
:english => 'english-left3words-distsim.tagger',
|
28
|
+
:german => 'german-fast.tagger',
|
29
|
+
:french => 'french.tagger',
|
30
|
+
:arabic => 'arabic-fast.tagger',
|
31
|
+
:chinese => 'chinese.tagger',
|
32
|
+
:xinhua => nil
|
33
|
+
},
|
34
|
+
:parser => {
|
35
|
+
:english => 'englishPCFG.ser.gz',
|
36
|
+
:german => 'germanPCFG.ser.gz',
|
37
|
+
:french => 'frenchFactored.ser.gz',
|
38
|
+
:arabic => 'arabicFactored.ser.gz',
|
39
|
+
:chinese => 'chinesePCFG.ser.gz',
|
40
|
+
:xinhua => 'xinhuaPCFG.ser.gz'
|
41
|
+
},
|
42
|
+
:ner => {
|
43
|
+
:english => {
|
44
|
+
'3class' => 'all.3class.distsim.crf.ser.gz',
|
45
|
+
'7class' => 'muc.7class.distsim.crf.ser.gz',
|
46
|
+
'MISCclass' => 'conll.4class.distsim.crf.ser.gz'
|
47
|
+
},
|
48
|
+
:german => {},
|
49
|
+
:french => {},
|
50
|
+
:arabic => {},
|
51
|
+
:chinese => {},
|
52
|
+
:xinhua => {}
|
53
|
+
},
|
54
|
+
:dcoref => {
|
55
|
+
:english => {
|
56
|
+
'demonym' => 'demonyms.txt',
|
57
|
+
'animate' => 'animate.unigrams.txt',
|
58
|
+
'female' => 'female.unigrams.txt',
|
59
|
+
'inanimate' => 'inanimate.unigrams.txt',
|
60
|
+
'male' => 'male.unigrams.txt',
|
61
|
+
'neutral' => 'neutral.unigrams.txt',
|
62
|
+
'plural' => 'plural.unigrams.txt',
|
63
|
+
'singular' => 'singular.unigrams.txt',
|
64
|
+
'states' => 'state-abbreviations.txt',
|
65
|
+
'countries' => 'unknown.txt', # Fix - can somebody provide this file?
|
66
|
+
'states.provinces' => 'unknown.txt', # Fix - can somebody provide this file?
|
67
|
+
'extra.gender' => 'namegender.combine.txt'
|
68
|
+
},
|
69
|
+
:german => {},
|
70
|
+
:french => {},
|
71
|
+
:arabic => {},
|
72
|
+
:chinese => {},
|
73
|
+
:xinhua => {}
|
74
|
+
}
|
75
|
+
# Models to add.
|
76
|
+
|
77
|
+
#"truecase.model" - path towards the true-casing model; default: StanfordCoreNLPModels/truecase/noUN.ser.gz
|
78
|
+
#"truecase.bias" - class bias of the true case model; default: INIT_UPPER:-0.7,UPPER:-0.7,O:0
|
79
|
+
#"truecase.mixedcasefile" - path towards the mixed case file; default: StanfordCoreNLPModels/truecase/MixDisambiguation.list
|
80
|
+
#"nfl.gazetteer" - path towards the gazetteer for the NFL domain
|
81
|
+
#"nfl.relation.model" - path towards the NFL relation extraction model
|
82
|
+
}
|
83
|
+
|
84
|
+
# List of annotations by JAVA class path.
|
85
|
+
Annotations = {
|
86
|
+
|
87
|
+
'nlp.trees.international.pennchinese.ChineseGrammaticalRelations' => [
|
88
|
+
'AdjectivalModifierGRAnnotation',
|
89
|
+
'AdverbialModifierGRAnnotation',
|
90
|
+
'ArgumentGRAnnotation',
|
91
|
+
'AspectMarkerGRAnnotation',
|
92
|
+
'AssociativeMarkerGRAnnotation',
|
93
|
+
'AssociativeModifierGRAnnotation',
|
94
|
+
'AttributiveGRAnnotation',
|
95
|
+
'AuxModifierGRAnnotation',
|
96
|
+
'AuxPassiveGRAnnotation',
|
97
|
+
'BaGRAnnotation',
|
98
|
+
'ClausalComplementGRAnnotation',
|
99
|
+
'ClausalSubjectGRAnnotation',
|
100
|
+
'ClauseModifierGRAnnotation',
|
101
|
+
'ComplementGRAnnotation',
|
102
|
+
'ComplementizerGRAnnotation',
|
103
|
+
'ControllingSubjectGRAnnotation',
|
104
|
+
'CoordinationGRAnnotation',
|
105
|
+
'DeterminerGRAnnotation',
|
106
|
+
'DirectObjectGRAnnotation',
|
107
|
+
'DvpMarkerGRAnnotation',
|
108
|
+
'DvpModifierGRAnnotation',
|
109
|
+
'EtcGRAnnotation',
|
110
|
+
'LocalizerComplementGRAnnotation',
|
111
|
+
'ModalGRAnnotation',
|
112
|
+
'ModifierGRAnnotation',
|
113
|
+
'NegationModifierGRAnnotation',
|
114
|
+
'NominalPassiveSubjectGRAnnotation',
|
115
|
+
'NominalSubjectGRAnnotation',
|
116
|
+
'NounCompoundModifierGRAnnotation',
|
117
|
+
'NumberModifierGRAnnotation',
|
118
|
+
'NumericModifierGRAnnotation',
|
119
|
+
'ObjectGRAnnotation',
|
120
|
+
'OrdNumberGRAnnotation',
|
121
|
+
'ParentheticalGRAnnotation',
|
122
|
+
'ParticipialModifierGRAnnotation',
|
123
|
+
'PreconjunctGRAnnotation',
|
124
|
+
'PrepositionalLocalizerModifierGRAnnotation',
|
125
|
+
'PrepositionalModifierGRAnnotation',
|
126
|
+
'PrepositionalObjectGRAnnotation',
|
127
|
+
'PunctuationGRAnnotation',
|
128
|
+
'RangeGRAnnotation',
|
129
|
+
'RelativeClauseModifierGRAnnotation',
|
130
|
+
'ResultativeComplementGRAnnotation',
|
131
|
+
'SemanticDependentGRAnnotation',
|
132
|
+
'SubjectGRAnnotation',
|
133
|
+
'TemporalClauseGRAnnotation',
|
134
|
+
'TemporalGRAnnotation',
|
135
|
+
'TimePostpositionGRAnnotation',
|
136
|
+
'TopicGRAnnotation',
|
137
|
+
'VerbCompoundGRAnnotation',
|
138
|
+
'VerbModifierGRAnnotation',
|
139
|
+
'XClausalComplementGRAnnotation'
|
140
|
+
],
|
141
|
+
|
142
|
+
'nlp.dcoref.CoNLL2011DocumentReader' => [
|
143
|
+
'CorefMentionAnnotation',
|
144
|
+
'NamedEntityAnnotation'
|
145
|
+
],
|
146
|
+
|
147
|
+
'nlp.ling.CoreAnnotations' => [
|
148
|
+
|
149
|
+
'AbbrAnnotation',
|
150
|
+
'AbgeneAnnotation',
|
151
|
+
'AbstrAnnotation',
|
152
|
+
'AfterAnnotation',
|
153
|
+
'AnswerAnnotation',
|
154
|
+
'AnswerObjectAnnotation',
|
155
|
+
'AntecedentAnnotation',
|
156
|
+
'ArgDescendentAnnotation',
|
157
|
+
'ArgumentAnnotation',
|
158
|
+
'BagOfWordsAnnotation',
|
159
|
+
'BeAnnotation',
|
160
|
+
'BeforeAnnotation',
|
161
|
+
'BeginIndexAnnotation',
|
162
|
+
'BestCliquesAnnotation',
|
163
|
+
'BestFullAnnotation',
|
164
|
+
'CalendarAnnotation',
|
165
|
+
'CategoryAnnotation',
|
166
|
+
'CategoryFunctionalTagAnnotation',
|
167
|
+
'CharacterOffsetBeginAnnotation',
|
168
|
+
'CharacterOffsetEndAnnotation',
|
169
|
+
'CharAnnotation',
|
170
|
+
'ChineseCharAnnotation',
|
171
|
+
'ChineseIsSegmentedAnnotation',
|
172
|
+
'ChineseOrigSegAnnotation',
|
173
|
+
'ChineseSegAnnotation',
|
174
|
+
'ChunkAnnotation',
|
175
|
+
'CoarseTagAnnotation',
|
176
|
+
'CommonWordsAnnotation',
|
177
|
+
'CoNLLDepAnnotation',
|
178
|
+
'CoNLLDepParentIndexAnnotation',
|
179
|
+
'CoNLLDepTypeAnnotation',
|
180
|
+
'CoNLLPredicateAnnotation',
|
181
|
+
'CoNLLSRLAnnotation',
|
182
|
+
'ContextsAnnotation',
|
183
|
+
'CopyAnnotation',
|
184
|
+
'CostMagnificationAnnotation',
|
185
|
+
'CovertIDAnnotation',
|
186
|
+
'D2_LBeginAnnotation',
|
187
|
+
'D2_LEndAnnotation',
|
188
|
+
'D2_LMiddleAnnotation',
|
189
|
+
'DayAnnotation',
|
190
|
+
'DependentsAnnotation',
|
191
|
+
'DictAnnotation',
|
192
|
+
'DistSimAnnotation',
|
193
|
+
'DoAnnotation',
|
194
|
+
'DocDateAnnotation',
|
195
|
+
'DocIDAnnotation',
|
196
|
+
'DomainAnnotation',
|
197
|
+
'EndIndexAnnotation',
|
198
|
+
'EntityClassAnnotation',
|
199
|
+
'EntityRuleAnnotation',
|
200
|
+
'EntityTypeAnnotation',
|
201
|
+
'FeaturesAnnotation',
|
202
|
+
'FemaleGazAnnotation',
|
203
|
+
'FirstChildAnnotation',
|
204
|
+
'ForcedSentenceEndAnnotation',
|
205
|
+
'FreqAnnotation',
|
206
|
+
'GazAnnotation',
|
207
|
+
'GazetteerAnnotation',
|
208
|
+
'GenericTokensAnnotation',
|
209
|
+
'GeniaAnnotation',
|
210
|
+
'GoldAnswerAnnotation',
|
211
|
+
'GovernorAnnotation',
|
212
|
+
'GrandparentAnnotation',
|
213
|
+
'HaveAnnotation',
|
214
|
+
'HeadWordStringAnnotation',
|
215
|
+
'HeightAnnotation',
|
216
|
+
'IDAnnotation',
|
217
|
+
'IDFAnnotation',
|
218
|
+
'INAnnotation',
|
219
|
+
'IndexAnnotation',
|
220
|
+
'InterpretationAnnotation',
|
221
|
+
'IsDateRangeAnnotation',
|
222
|
+
'IsURLAnnotation',
|
223
|
+
'LabelAnnotation',
|
224
|
+
'LastGazAnnotation',
|
225
|
+
'LastTaggedAnnotation',
|
226
|
+
'LBeginAnnotation',
|
227
|
+
'LeftChildrenNodeAnnotation',
|
228
|
+
'LeftTermAnnotation',
|
229
|
+
'LemmaAnnotation',
|
230
|
+
'LEndAnnotation',
|
231
|
+
'LengthAnnotation',
|
232
|
+
'LMiddleAnnotation',
|
233
|
+
'MaleGazAnnotation',
|
234
|
+
'MarkingAnnotation',
|
235
|
+
'MonthAnnotation',
|
236
|
+
'MorphoCaseAnnotation',
|
237
|
+
'MorphoGenAnnotation',
|
238
|
+
'MorphoNumAnnotation',
|
239
|
+
'MorphoPersAnnotation',
|
240
|
+
'NamedEntityTagAnnotation',
|
241
|
+
'NeighborsAnnotation',
|
242
|
+
'NERIDAnnotation',
|
243
|
+
'NormalizedNamedEntityTagAnnotation',
|
244
|
+
'NotAnnotation',
|
245
|
+
'NumericCompositeObjectAnnotation',
|
246
|
+
'NumericCompositeTypeAnnotation',
|
247
|
+
'NumericCompositeValueAnnotation',
|
248
|
+
'NumericObjectAnnotation',
|
249
|
+
'NumericTypeAnnotation',
|
250
|
+
'NumericValueAnnotation',
|
251
|
+
'NumerizedTokensAnnotation',
|
252
|
+
'NumTxtSentencesAnnotation',
|
253
|
+
'OriginalAnswerAnnotation',
|
254
|
+
'OriginalCharAnnotation',
|
255
|
+
'OriginalTextAnnotation',
|
256
|
+
'ParagraphAnnotation',
|
257
|
+
'ParagraphsAnnotation',
|
258
|
+
'ParaPositionAnnotation',
|
259
|
+
'ParentAnnotation',
|
260
|
+
'PartOfSpeechAnnotation',
|
261
|
+
'PercentAnnotation',
|
262
|
+
'PhraseWordsAnnotation',
|
263
|
+
'PhraseWordsTagAnnotation',
|
264
|
+
'PolarityAnnotation',
|
265
|
+
'PositionAnnotation',
|
266
|
+
'PossibleAnswersAnnotation',
|
267
|
+
'PredictedAnswerAnnotation',
|
268
|
+
'PrevChildAnnotation',
|
269
|
+
'PriorAnnotation',
|
270
|
+
'ProjectedCategoryAnnotation',
|
271
|
+
'ProtoAnnotation',
|
272
|
+
'RoleAnnotation',
|
273
|
+
'SectionAnnotation',
|
274
|
+
'SemanticHeadTagAnnotation',
|
275
|
+
'SemanticHeadWordAnnotation',
|
276
|
+
'SemanticTagAnnotation',
|
277
|
+
'SemanticWordAnnotation',
|
278
|
+
'SentenceIDAnnotation',
|
279
|
+
'SentenceIndexAnnotation',
|
280
|
+
'SentencePositionAnnotation',
|
281
|
+
'SentencesAnnotation',
|
282
|
+
'ShapeAnnotation',
|
283
|
+
'SpaceBeforeAnnotation',
|
284
|
+
'SpanAnnotation',
|
285
|
+
'SpeakerAnnotation',
|
286
|
+
'SRL_ID',
|
287
|
+
'SRLIDAnnotation',
|
288
|
+
'SRLInstancesAnnotation',
|
289
|
+
'StackedNamedEntityTagAnnotation',
|
290
|
+
'StateAnnotation',
|
291
|
+
'StemAnnotation',
|
292
|
+
'SubcategorizationAnnotation',
|
293
|
+
'TagLabelAnnotation',
|
294
|
+
'TextAnnotation',
|
295
|
+
'TokenBeginAnnotation',
|
296
|
+
'TokenEndAnnotation',
|
297
|
+
'TokensAnnotation',
|
298
|
+
'TopicAnnotation',
|
299
|
+
'TrueCaseAnnotation',
|
300
|
+
'TrueCaseTextAnnotation',
|
301
|
+
'TrueTagAnnotation',
|
302
|
+
'UBlockAnnotation',
|
303
|
+
'UnaryAnnotation',
|
304
|
+
'UnknownAnnotation',
|
305
|
+
'UtteranceAnnotation',
|
306
|
+
'UTypeAnnotation',
|
307
|
+
'ValueAnnotation',
|
308
|
+
'VerbSenseAnnotation',
|
309
|
+
'WebAnnotation',
|
310
|
+
'WordFormAnnotation',
|
311
|
+
'WordnetSynAnnotation',
|
312
|
+
'WordPositionAnnotation',
|
313
|
+
'WordSenseAnnotation',
|
314
|
+
'XmlContextAnnotation',
|
315
|
+
'XmlElementAnnotation',
|
316
|
+
'YearAnnotation'
|
317
|
+
],
|
318
|
+
|
319
|
+
'nlp.dcoref.CorefCoreAnnotations' => [
|
320
|
+
|
321
|
+
'CorefAnnotation',
|
322
|
+
'CorefChainAnnotation',
|
323
|
+
'CorefClusterAnnotation',
|
324
|
+
'CorefClusterIdAnnotation',
|
325
|
+
'CorefDestAnnotation',
|
326
|
+
'CorefGraphAnnotation'
|
327
|
+
],
|
328
|
+
|
329
|
+
'nlp.ling.CoreLabel' => [
|
330
|
+
'GenericAnnotation'
|
331
|
+
],
|
332
|
+
|
333
|
+
'nlp.trees.EnglishGrammaticalRelations' => [
|
334
|
+
'AbbreviationModifierGRAnnotation',
|
335
|
+
'AdjectivalComplementGRAnnotation',
|
336
|
+
'AdjectivalModifierGRAnnotation',
|
337
|
+
'AdvClauseModifierGRAnnotation',
|
338
|
+
'AdverbialModifierGRAnnotation',
|
339
|
+
'AgentGRAnnotation',
|
340
|
+
'AppositionalModifierGRAnnotation',
|
341
|
+
'ArgumentGRAnnotation',
|
342
|
+
'AttributiveGRAnnotation',
|
343
|
+
'AuxModifierGRAnnotation',
|
344
|
+
'AuxPassiveGRAnnotation',
|
345
|
+
'ClausalComplementGRAnnotation',
|
346
|
+
'ClausalPassiveSubjectGRAnnotation',
|
347
|
+
'ClausalSubjectGRAnnotation',
|
348
|
+
'ComplementGRAnnotation',
|
349
|
+
'ComplementizerGRAnnotation',
|
350
|
+
'ConjunctGRAnnotation',
|
351
|
+
'ControllingSubjectGRAnnotation',
|
352
|
+
'CoordinationGRAnnotation',
|
353
|
+
'CopulaGRAnnotation',
|
354
|
+
'DeterminerGRAnnotation',
|
355
|
+
'DirectObjectGRAnnotation',
|
356
|
+
'ExpletiveGRAnnotation',
|
357
|
+
'IndirectObjectGRAnnotation',
|
358
|
+
'InfinitivalModifierGRAnnotation',
|
359
|
+
'MarkerGRAnnotation',
|
360
|
+
'ModifierGRAnnotation',
|
361
|
+
'MultiWordExpressionGRAnnotation',
|
362
|
+
'NegationModifierGRAnnotation',
|
363
|
+
'NominalPassiveSubjectGRAnnotation',
|
364
|
+
'NominalSubjectGRAnnotation',
|
365
|
+
'NounCompoundModifierGRAnnotation',
|
366
|
+
'NpAdverbialModifierGRAnnotation',
|
367
|
+
'NumberModifierGRAnnotation',
|
368
|
+
'NumericModifierGRAnnotation',
|
369
|
+
'ObjectGRAnnotation',
|
370
|
+
'ParataxisGRAnnotation',
|
371
|
+
'ParticipialModifierGRAnnotation',
|
372
|
+
'PhrasalVerbParticleGRAnnotation',
|
373
|
+
'PossessionModifierGRAnnotation',
|
374
|
+
'PossessiveModifierGRAnnotation',
|
375
|
+
'PreconjunctGRAnnotation',
|
376
|
+
'PredeterminerGRAnnotation',
|
377
|
+
'PredicateGRAnnotation',
|
378
|
+
'PrepositionalComplementGRAnnotation',
|
379
|
+
'PrepositionalModifierGRAnnotation',
|
380
|
+
'PrepositionalObjectGRAnnotation',
|
381
|
+
'PunctuationGRAnnotation',
|
382
|
+
'PurposeClauseModifierGRAnnotation',
|
383
|
+
'QuantifierModifierGRAnnotation',
|
384
|
+
'ReferentGRAnnotation',
|
385
|
+
'RelativeClauseModifierGRAnnotation',
|
386
|
+
'RelativeGRAnnotation',
|
387
|
+
'SemanticDependentGRAnnotation',
|
388
|
+
'SubjectGRAnnotation',
|
389
|
+
'TemporalModifierGRAnnotation',
|
390
|
+
'XClausalComplementGRAnnotation'
|
391
|
+
],
|
392
|
+
|
393
|
+
'nlp.trees.GrammaticalRelation' => [
|
394
|
+
'DependentGRAnnotation',
|
395
|
+
'GovernorGRAnnotation',
|
396
|
+
'GrammaticalRelationAnnotation',
|
397
|
+
'KillGRAnnotation',
|
398
|
+
'Language',
|
399
|
+
'RootGRAnnotation'
|
400
|
+
],
|
401
|
+
|
402
|
+
'nlp.ie.machinereading.structure.MachineReadingAnnotations' => [
|
403
|
+
'DependencyAnnotation',
|
404
|
+
'DocumentDirectoryAnnotation',
|
405
|
+
'DocumentIdAnnotation',
|
406
|
+
'EntityMentionsAnnotation',
|
407
|
+
'EventMentionsAnnotation',
|
408
|
+
'GenderAnnotation',
|
409
|
+
'RelationMentionsAnnotation',
|
410
|
+
'TriggerAnnotation'
|
411
|
+
],
|
412
|
+
|
413
|
+
'nlp.parser.lexparser.ParserAnnotations' => [
|
414
|
+
'ConstraintAnnotation'
|
415
|
+
],
|
416
|
+
|
417
|
+
'nlp.trees.semgraph.SemanticGraphCoreAnnotations' => [
|
418
|
+
'BasicDependenciesAnnotation',
|
419
|
+
'CollapsedCCProcessedDependenciesAnnotation',
|
420
|
+
'CollapsedDependenciesAnnotation'
|
421
|
+
],
|
422
|
+
|
423
|
+
'nlp.time.TimeAnnotations' => [
|
424
|
+
'TimexAnnotation',
|
425
|
+
'TimexAnnotations'
|
426
|
+
],
|
427
|
+
|
428
|
+
'nlp.time.TimeExpression' => [
|
429
|
+
'Annotation',
|
430
|
+
'ChildrenAnnotation'
|
431
|
+
],
|
432
|
+
|
433
|
+
'nlp.trees.TreeCoreAnnotations' => [
|
434
|
+
'TreeHeadTagAnnotation',
|
435
|
+
'TreeHeadWordAnnotation',
|
436
|
+
'TreeAnnotation'
|
437
|
+
]
|
438
|
+
}
|
439
|
+
|
440
|
+
# Create a list of annotation names => paths.
|
441
|
+
annotations_by_name = {}
|
442
|
+
Annotations.each do |base_class, annotation_classes|
|
443
|
+
annotation_classes.each do |annotation_class|
|
444
|
+
annotations_by_name[annotation_class] ||= []
|
445
|
+
annotations_by_name[annotation_class] << base_class
|
446
|
+
end
|
447
|
+
end
|
448
|
+
|
449
|
+
# Hash of name => path.
|
450
|
+
AnnotationsByName = annotations_by_name
|
451
|
+
|
452
|
+
end
|
453
|
+
end
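The inversion that produces `AnnotationsByName` can be illustrated on a three-entry excerpt of the table. `ANNOTATIONS` below is a shortened, hypothetical copy of `Config::Annotations`, and `invert_annotations` is a name chosen for this sketch.

```ruby
# Shortened, hypothetical excerpt of Config::Annotations.
ANNOTATIONS = {
  'nlp.ling.CoreAnnotations' => ['LemmaAnnotation', 'PartOfSpeechAnnotation'],
  'nlp.trees.EnglishGrammaticalRelations' => ['AdjectivalModifierGRAnnotation'],
  'nlp.trees.international.pennchinese.ChineseGrammaticalRelations' =>
    ['AdjectivalModifierGRAnnotation']
}

# Turn base class => [annotation names] into annotation name => [base classes].
def invert_annotations(annotations)
  by_name = {}
  annotations.each do |base_class, names|
    names.each { |name| (by_name[name] ||= []) << base_class }
  end
  by_name
end

by_name = invert_annotations(ANNOTATIONS)
p by_name['LemmaAnnotation']
p by_name['AdjectivalModifierGRAnnotation'].size
```

Names that appear under more than one base class, such as the shared grammatical-relation annotations, end up with several entries, which is what makes a lookup by bare name ambiguous.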
|
@@ -18,5 +18,32 @@ module StanfordCoreNLP
|
|
18
18
|
end
|
19
19
|
end
|
20
20
|
|
21
|
+
# Dynamically defined on all proxied annotation classes.
|
22
|
+
# Get an annotation using the annotation bridge.
|
23
|
+
def get(annotation, anno_base = nil)
|
24
|
+
if !java_methods.include?('get(Ljava.lang.Class;)')
|
25
|
+
raise 'No annotation can be retrieved on this object.'
|
26
|
+
else
|
27
|
+
anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
|
28
|
+
if anno_base
|
29
|
+
raise "The path #{anno_base} doesn't exist." unless Config::Annotations[anno_base]
|
30
|
+
anno_bases = [anno_base]
|
31
|
+
else
|
32
|
+
anno_bases = Config::AnnotationsByName[anno_class]
|
33
|
+
raise "The annotation #{anno_class} doesn't exist." unless anno_bases
|
34
|
+
end
|
35
|
+
if anno_bases.size > 1
|
36
|
+
msg = "There are many different annotations bearing the name #{anno_class}. "
|
37
|
+
msg << "Please specify one of the following base classes as second parameter to disambiguate: "
|
38
|
+
msg << anno_bases.join(',')
|
39
|
+
raise msg
|
40
|
+
else
|
41
|
+
base_class = anno_bases[0]
|
42
|
+
end
|
43
|
+
url = "edu.stanford.#{base_class}$#{anno_class}"
|
44
|
+
AnnotationBridge.getAnnotation(self, url)
|
45
|
+
end
|
46
|
+
end
|
47
|
+
|
21
48
|
end
|
22
49
|
end
|
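The `get` method added in this hunk resolves a short annotation name to a fully qualified Java inner-class name before calling the bridge. The sketch below isolates that resolution step so it can be run without Rjb or the CoreNLP JARs. It rests on several assumptions: `INDEX` is a hypothetical excerpt of `Config::AnnotationsByName`, `camel_case` is a stand-in for `StanfordCoreNLP.camel_case` (whose implementation is not shown in this diff), and `resolve_annotation` is an illustrative name that does not exist in the gem.

```ruby
# Hypothetical excerpt of the name => base classes index.
INDEX = {
  'TokensAnnotation' => ['nlp.ling.CoreAnnotations'],
  'SubjectGRAnnotation' => [
    'nlp.trees.EnglishGrammaticalRelations',
    'nlp.trees.international.pennchinese.ChineseGrammaticalRelations'
  ]
}

# Stand-in for StanfordCoreNLP.camel_case (assumed behaviour):
# upcase the first character and every character that follows an underscore.
def camel_case(name)
  name.to_s.gsub(/(?:^|_)(.)/) { $1.upcase }
end

# Resolve a short name to the Java class string that would be handed to the bridge.
def resolve_annotation(annotation, anno_base = nil, index = INDEX)
  anno_class = "#{camel_case(annotation)}Annotation"
  bases = anno_base ? [anno_base] : index[anno_class]
  raise ArgumentError, "The annotation #{anno_class} doesn't exist." unless bases
  if bases.size > 1
    raise ArgumentError, "#{anno_class} is ambiguous; pass one of: #{bases.join(', ')}"
  end
  # Java inner classes are addressed as Outer$Inner.
  "edu.stanford.#{bases[0]}$#{anno_class}"
end
```

Calling `resolve_annotation(:tokens)` resolves directly because only one base class defines `TokensAnnotation`, whereas `resolve_annotation('subject_GR')` raises until a base class is passed as the second argument.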
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: stanford-core-nlp
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.5
 prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-
+date: 2012-02-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rjb
-  requirement: &
+  requirement: &70191057037760 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,7 +21,7 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *70191057037760
 description: ! " High-level Ruby bindings to the Stanford CoreNLP package, a set natural
   language processing \ntools for English, including tokenization, part-of-speech
   tagging, lemmatization, named entity recognition,\nparsing, and coreference resolution. "
@@ -31,9 +31,9 @@ executables: []
 extensions: []
 extra_rdoc_files: []
 files:
+- lib/stanford-core-nlp/config.rb
 - lib/stanford-core-nlp/jar_loader.rb
 - lib/stanford-core-nlp/java_wrapper.rb
-- lib/stanford-core-nlp/stanford_annotations.rb
 - lib/stanford-core-nlp.rb
 - bin/bridge.jar
 - bin/INFO
@@ -1,401 +0,0 @@
-module StanfordCoreNLP
-
-  # @private
-  Annotations = {
-
-    'nlp.trees.international.pennchinese.ChineseGrammaticalRelations' => [
-      'AdjectivalModifierGRAnnotation',
-      'AdverbialModifierGRAnnotation',
-      'ArgumentGRAnnotation',
-      'AspectMarkerGRAnnotation',
-      'AssociativeMarkerGRAnnotation',
-      'AssociativeModifierGRAnnotation',
-      'AttributiveGRAnnotation',
-      'AuxModifierGRAnnotation',
-      'AuxPassiveGRAnnotation',
-      'BaGRAnnotation',
-      'ClausalComplementGRAnnotation',
-      'ClausalSubjectGRAnnotation',
-      'ClauseModifierGRAnnotation',
-      'ComplementGRAnnotation',
-      'ComplementizerGRAnnotation',
-      'ControllingSubjectGRAnnotation',
-      'CoordinationGRAnnotation',
-      'DeterminerGRAnnotation',
-      'DirectObjectGRAnnotation',
-      'DvpMarkerGRAnnotation',
-      'DvpModifierGRAnnotation',
-      'EtcGRAnnotation',
-      'LocalizerComplementGRAnnotation',
-      'ModalGRAnnotation',
-      'ModifierGRAnnotation',
-      'NegationModifierGRAnnotation',
-      'NominalPassiveSubjectGRAnnotation',
-      'NominalSubjectGRAnnotation',
-      'NounCompoundModifierGRAnnotation',
-      'NumberModifierGRAnnotation',
-      'NumericModifierGRAnnotation',
-      'ObjectGRAnnotation',
-      'OrdNumberGRAnnotation',
-      'ParentheticalGRAnnotation',
-      'ParticipialModifierGRAnnotation',
-      'PreconjunctGRAnnotation',
-      'PrepositionalLocalizerModifierGRAnnotation',
-      'PrepositionalModifierGRAnnotation',
-      'PrepositionalObjectGRAnnotation',
-      'PunctuationGRAnnotation',
-      'RangeGRAnnotation',
-      'RelativeClauseModifierGRAnnotation',
-      'ResultativeComplementGRAnnotation',
-      'SemanticDependentGRAnnotation',
-      'SubjectGRAnnotation',
-      'TemporalClauseGRAnnotation',
-      'TemporalGRAnnotation',
-      'TimePostpositionGRAnnotation',
-      'TopicGRAnnotation',
-      'VerbCompoundGRAnnotation',
-      'VerbModifierGRAnnotation',
-      'XClausalComplementGRAnnotation'
-    ],
-
-    'nlp.dcoref.CoNLL2011DocumentReader' => [
-      'CorefMentionAnnotation',
-      'NamedEntityAnnotation'
-    ],
-
-    'nlp.ling.CoreAnnotations' => [
-
-      'AbbrAnnotation',
-      'AbgeneAnnotation',
-      'AbstrAnnotation',
-      'AfterAnnotation',
-      'AnswerAnnotation',
-      'AnswerObjectAnnotation',
-      'AntecedentAnnotation',
-      'ArgDescendentAnnotation',
-      'ArgumentAnnotation',
-      'BagOfWordsAnnotation',
-      'BeAnnotation',
-      'BeforeAnnotation',
-      'BeginIndexAnnotation',
-      'BestCliquesAnnotation',
-      'BestFullAnnotation',
-      'CalendarAnnotation',
-      'CategoryAnnotation',
-      'CategoryFunctionalTagAnnotation',
-      'CharacterOffsetBeginAnnotation',
-      'CharacterOffsetEndAnnotation',
-      'CharAnnotation',
-      'ChineseCharAnnotation',
-      'ChineseIsSegmentedAnnotation',
-      'ChineseOrigSegAnnotation',
-      'ChineseSegAnnotation',
-      'ChunkAnnotation',
-      'CoarseTagAnnotation',
-      'CommonWordsAnnotation',
-      'CoNLLDepAnnotation',
-      'CoNLLDepParentIndexAnnotation',
-      'CoNLLDepTypeAnnotation',
-      'CoNLLPredicateAnnotation',
-      'CoNLLSRLAnnotation',
-      'ContextsAnnotation',
-      'CopyAnnotation',
-      'CostMagnificationAnnotation',
-      'CovertIDAnnotation',
-      'D2_LBeginAnnotation',
-      'D2_LEndAnnotation',
-      'D2_LMiddleAnnotation',
-      'DayAnnotation',
-      'DependentsAnnotation',
-      'DictAnnotation',
-      'DistSimAnnotation',
-      'DoAnnotation',
-      'DocDateAnnotation',
-      'DocIDAnnotation',
-      'DomainAnnotation',
-      'EndIndexAnnotation',
-      'EntityClassAnnotation',
-      'EntityRuleAnnotation',
-      'EntityTypeAnnotation',
-      'FeaturesAnnotation',
-      'FemaleGazAnnotation',
-      'FirstChildAnnotation',
-      'ForcedSentenceEndAnnotation',
-      'FreqAnnotation',
-      'GazAnnotation',
-      'GazetteerAnnotation',
-      'GenericTokensAnnotation',
-      'GeniaAnnotation',
-      'GoldAnswerAnnotation',
-      'GovernorAnnotation',
-      'GrandparentAnnotation',
-      'HaveAnnotation',
-      'HeadWordStringAnnotation',
-      'HeightAnnotation',
-      'IDAnnotation',
-      'IDFAnnotation',
-      'INAnnotation',
-      'IndexAnnotation',
-      'InterpretationAnnotation',
-      'IsDateRangeAnnotation',
-      'IsURLAnnotation',
-      'LabelAnnotation',
-      'LastGazAnnotation',
-      'LastTaggedAnnotation',
-      'LBeginAnnotation',
-      'LeftChildrenNodeAnnotation',
-      'LeftTermAnnotation',
-      'LemmaAnnotation',
-      'LEndAnnotation',
-      'LengthAnnotation',
-      'LMiddleAnnotation',
-      'MaleGazAnnotation',
-      'MarkingAnnotation',
-      'MonthAnnotation',
-      'MorphoCaseAnnotation',
-      'MorphoGenAnnotation',
-      'MorphoNumAnnotation',
-      'MorphoPersAnnotation',
-      'NamedEntityTagAnnotation',
-      'NeighborsAnnotation',
-      'NERIDAnnotation',
-      'NormalizedNamedEntityTagAnnotation',
-      'NotAnnotation',
-      'NumericCompositeObjectAnnotation',
-      'NumericCompositeTypeAnnotation',
-      'NumericCompositeValueAnnotation',
-      'NumericObjectAnnotation',
-      'NumericTypeAnnotation',
-      'NumericValueAnnotation',
-      'NumerizedTokensAnnotation',
-      'NumTxtSentencesAnnotation',
-      'OriginalAnswerAnnotation',
-      'OriginalCharAnnotation',
-      'OriginalTextAnnotation',
-      'ParagraphAnnotation',
-      'ParagraphsAnnotation',
-      'ParaPositionAnnotation',
-      'ParentAnnotation',
-      'PartOfSpeechAnnotation',
-      'PercentAnnotation',
-      'PhraseWordsAnnotation',
-      'PhraseWordsTagAnnotation',
-      'PolarityAnnotation',
-      'PositionAnnotation',
-      'PossibleAnswersAnnotation',
-      'PredictedAnswerAnnotation',
-      'PrevChildAnnotation',
-      'PriorAnnotation',
-      'ProjectedCategoryAnnotation',
-      'ProtoAnnotation',
-      'RoleAnnotation',
-      'SectionAnnotation',
-      'SemanticHeadTagAnnotation',
-      'SemanticHeadWordAnnotation',
-      'SemanticTagAnnotation',
-      'SemanticWordAnnotation',
-      'SentenceIDAnnotation',
-      'SentenceIndexAnnotation',
-      'SentencePositionAnnotation',
-      'SentencesAnnotation',
-      'ShapeAnnotation',
-      'SpaceBeforeAnnotation',
-      'SpanAnnotation',
-      'SpeakerAnnotation',
-      'SRL_ID',
-      'SRLIDAnnotation',
-      'SRLInstancesAnnotation',
-      'StackedNamedEntityTagAnnotation',
-      'StateAnnotation',
-      'StemAnnotation',
-      'SubcategorizationAnnotation',
-      'TagLabelAnnotation',
-      'TextAnnotation',
-      'TokenBeginAnnotation',
-      'TokenEndAnnotation',
-      'TokensAnnotation',
-      'TopicAnnotation',
-      'TrueCaseAnnotation',
-      'TrueCaseTextAnnotation',
-      'TrueTagAnnotation',
-      'UBlockAnnotation',
-      'UnaryAnnotation',
-      'UnknownAnnotation',
-      'UtteranceAnnotation',
-      'UTypeAnnotation',
-      'ValueAnnotation',
-      'VerbSenseAnnotation',
-      'WebAnnotation',
-      'WordFormAnnotation',
-      'WordnetSynAnnotation',
-      'WordPositionAnnotation',
-      'WordSenseAnnotation',
-      'XmlContextAnnotation',
-      'XmlElementAnnotation',
-      'YearAnnotation'
-    ],
-
-    'nlp.dcoref.CorefCoreAnnotations' => [
-
-      'CorefAnnotation',
-      'CorefChainAnnotation',
-      'CorefClusterAnnotation',
-      'CorefClusterIdAnnotation',
-      'CorefDestAnnotation',
-      'CorefGraphAnnotation'
-    ],
-
-    'nlp.ling.CoreLabel' => [
-      'GenericAnnotation'
-    ],
-
-    'nlp.trees.EnglishGrammaticalRelations' => [
-      'AbbreviationModifierGRAnnotation',
-      'AdjectivalComplementGRAnnotation',
-      'AdjectivalModifierGRAnnotation',
-      'AdvClauseModifierGRAnnotation',
-      'AdverbialModifierGRAnnotation',
-      'AgentGRAnnotation',
-      'AppositionalModifierGRAnnotation',
-      'ArgumentGRAnnotation',
-      'AttributiveGRAnnotation',
-      'AuxModifierGRAnnotation',
-      'AuxPassiveGRAnnotation',
-      'ClausalComplementGRAnnotation',
-      'ClausalPassiveSubjectGRAnnotation',
-      'ClausalSubjectGRAnnotation',
-      'ComplementGRAnnotation',
-      'ComplementizerGRAnnotation',
-      'ConjunctGRAnnotation',
-      'ControllingSubjectGRAnnotation',
-      'CoordinationGRAnnotation',
-      'CopulaGRAnnotation',
-      'DeterminerGRAnnotation',
-      'DirectObjectGRAnnotation',
-      'ExpletiveGRAnnotation',
-      'IndirectObjectGRAnnotation',
-      'InfinitivalModifierGRAnnotation',
-      'MarkerGRAnnotation',
-      'ModifierGRAnnotation',
-      'MultiWordExpressionGRAnnotation',
-      'NegationModifierGRAnnotation',
-      'NominalPassiveSubjectGRAnnotation',
-      'NominalSubjectGRAnnotation',
-      'NounCompoundModifierGRAnnotation',
-      'NpAdverbialModifierGRAnnotation',
-      'NumberModifierGRAnnotation',
-      'NumericModifierGRAnnotation',
-      'ObjectGRAnnotation',
-      'ParataxisGRAnnotation',
-      'ParticipialModifierGRAnnotation',
-      'PhrasalVerbParticleGRAnnotation',
-      'PossessionModifierGRAnnotation',
-      'PossessiveModifierGRAnnotation',
-      'PreconjunctGRAnnotation',
-      'PredeterminerGRAnnotation',
-      'PredicateGRAnnotation',
-      'PrepositionalComplementGRAnnotation',
-      'PrepositionalModifierGRAnnotation',
-      'PrepositionalObjectGRAnnotation',
-      'PunctuationGRAnnotation',
-      'PurposeClauseModifierGRAnnotation',
-      'QuantifierModifierGRAnnotation',
-      'ReferentGRAnnotation',
-      'RelativeClauseModifierGRAnnotation',
-      'RelativeGRAnnotation',
-      'SemanticDependentGRAnnotation',
-      'SubjectGRAnnotation',
-      'TemporalModifierGRAnnotation',
-      'XClausalComplementGRAnnotation'
-    ],
-
-    'nlp.trees.GrammaticalRelation' => [
-      'DependentGRAnnotation',
-      'GovernorGRAnnotation',
-      'GrammaticalRelationAnnotation',
-      'KillGRAnnotation',
-      'Language',
-      'RootGRAnnotation'
-    ],
-
-    'nlp.ie.machinereading.structure.MachineReadingAnnotations' => [
-      'DependencyAnnotation',
-      'DocumentDirectoryAnnotation',
-      'DocumentIdAnnotation',
-      'EntityMentionsAnnotation',
-      'EventMentionsAnnotation',
-      'GenderAnnotation',
-      'RelationMentionsAnnotation',
-      'TriggerAnnotation'
-    ],
-
-    'nlp.parser.lexparser.ParserAnnotations' => [
-      'ConstraintAnnotation'
-    ],
-
-    'nlp.trees.semgraph.SemanticGraphCoreAnnotations' => [
-      'BasicDependenciesAnnotation',
-      'CollapsedCCProcessedDependenciesAnnotation',
-      'CollapsedDependenciesAnnotation'
-    ],
-
-    'nlp.time.TimeAnnotations' => [
-      'TimexAnnotation',
-      'TimexAnnotations'
-    ],
-
-    'nlp.time.TimeExpression' => [
-      'Annotation',
-      'ChildrenAnnotation'
-    ],
-
-    'nlp.trees.TreeCoreAnnotations' => [
-      'TreeHeadTagAnnotation',
-      'TreeHeadWordAnnotation',
-      'TreeAnnotation'
-    ]
-  }
-
-  annotations_by_name = {}
-  Annotations.each do |base_class, annotation_classes|
-    annotation_classes.each do |annotation_class|
-      annotations_by_name[annotation_class] ||= []
-      annotations_by_name[annotation_class] << base_class
-    end
-  end
-
-  AnnotationsByName = annotations_by_name
-
-  # Modify the Rjb JavaProxy class to add our own method to get annotations.
-  Rjb::Rjb_JavaProxy.class_eval do
-
-    # Dynamically defined on all proxied annotation classes.
-    # Get an annotation using the annotation bridge.
-    def get(annotation, anno_base = nil)
-      if !java_methods.include?('get(Ljava.lang.Class;)')
-        raise'No annotation can be retrieved on this object.'
-      else
-        anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
-        if anno_base
-          raise "The path #{anno_base} doesn't exist." unless Annotations[anno_base]
-          anno_bases = [anno_base]
-        else
-          anno_bases = AnnotationsByName[anno_class]
-          raise "The annotation #{anno_class} doesn't exist." unless anno_bases
-        end
-        if anno_bases.size > 1
-          msg = "There are many different annotations bearing the name #{anno_class}. "
-          msg << "Please specify one of the following base classes as second parameter to disambiguate: "
-          msg << anno_bases.join(',')
-          raise msg
-        else
-          base_class = anno_bases[0]
-        end
-        url = "edu.stanford.#{base_class}$#{anno_class}"
-        AnnotationBridge.getAnnotation(self, url)
-      end
-    end
-
-  end
-
-end