stanford-core-nlp 0.1.4 → 0.1.5
- data/README.markdown +22 -23
- data/bin/INFO +1 -1
- data/lib/stanford-core-nlp.rb +126 -52
- data/lib/stanford-core-nlp/config.rb +453 -0
- data/lib/stanford-core-nlp/java_wrapper.rb +27 -0
- metadata +5 -5
- data/lib/stanford-core-nlp/stanford_annotations.rb +0 -401
data/README.markdown
CHANGED
@@ -1,12 +1,12 @@
|
|
1
1
|
**About**
|
2
2
|
|
3
|
-
This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set natural language processing tools
|
3
|
+
This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set of natural language processing tools that features tokenization, part-of-speech tagging, lemmatization, and parsing for five languages (English, French, German, Arabic and Chinese), as well as named entity recognition and coreference resolution for English.
|
4
4
|
|
5
5
|
**Installing**
|
6
6
|
|
7
7
|
1. Install the gem: `gem install stanford-core-nlp`.
|
8
8
|
|
9
|
-
2. Download the Stanford Core NLP JAR and model files [
|
9
|
+
2. Download the Stanford Core NLP JAR and model files. Two packages are available with the necessary files: a package for [English only](http://louismullie.com/stanford-core-nlp-english.zip), or a package with models for [all languages](http://louismullie.com/stanford-core-nlp-all.zip). Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (typically this is /usr/local/lib/ruby/gems/1.9.1/gems/stanford-core-nlp-0.x/bin/).
|
10
10
|
|
11
11
|
**Configuration**
|
12
12
|
|
@@ -23,18 +23,12 @@ After installing and requiring the gem (`require 'stanford-core-nlp'`), you may
|
|
23
23
|
# Redirect VM output to log.txt
|
24
24
|
StanfordCoreNLP.log_file = 'log.txt'
|
25
25
|
|
26
|
-
|
27
|
-
|
28
|
-
# Default base class is edu.stanford.nlp.pipeline.
|
29
|
-
StanfordCoreNLP.load('PTBTokenizerAnnotator')
|
30
|
-
puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
|
31
|
-
# => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>
|
32
|
-
|
33
|
-
# Here, we specify another base class.
|
34
|
-
StanfordCoreNLP.load('MaxentTagger', 'edu.stanford.nlp.tagger')
|
35
|
-
puts StanfordCoreNLP::MaxentTagger.inspect
|
36
|
-
# => <Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
|
26
|
+
# Use the model files for a different language than English.
|
27
|
+
StanfordCoreNLP.use(:french)
|
37
28
|
|
29
|
+
# Change a specific model file.
|
30
|
+
StanfordCoreNLP.set_model('pos.model', 'english-left3words-distsim.tagger')
|
31
|
+
|
38
32
|
**Using the gem**
|
39
33
|
|
40
34
|
text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
|
@@ -64,22 +58,27 @@ You may also want to load your own classes from the Stanford NLP to do more spec
|
|
64
58
|
end
|
65
59
|
end
|
66
60
|
|
67
|
-
|
61
|
+
> Note: You need to load the StanfordCoreNLP pipeline before using the StanfordCoreNLP::Text class.
|
68
62
|
|
69
|
-
|
63
|
+
Good references for annotation names are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CorefCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the 'config.rb' file inside the gem. The Ruby symbol (e.g. :named_entity_tag) corresponding to a Java annotation class follows a simple un-camel-casing convention, with the trailing 'Annotation' removed. For example, the annotation NamedEntityTagAnnotation translates to :named_entity_tag, PartOfSpeechAnnotation to :part_of_speech, etc.
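The naming convention described here can be sketched in plain Ruby. This is a minimal illustration, not the gem's API: the helper names `annotation_class_name` and `annotation_symbol` are hypothetical, and only the symbol-to-class direction mirrors the `camel_case` method that ships in `lib/stanford-core-nlp.rb`.

```ruby
# Hypothetical helpers illustrating the symbol <-> Java class naming
# convention. They are not part of the stanford-core-nlp gem.

# :named_entity_tag -> "NamedEntityTagAnnotation"
def annotation_class_name(symbol)
  # Upcase the first letter and every letter that follows an underscore,
  # then drop the underscores (same regex as the gem's camel_case).
  camel = symbol.to_s.gsub(/^[a-z]|_[a-z]/) { |m| m.upcase }.gsub('_', '')
  "#{camel}Annotation"
end

# "PartOfSpeechAnnotation" -> :part_of_speech
def annotation_symbol(class_name)
  class_name.sub(/Annotation\z/, '')            # strip the suffix
            .gsub(/([a-z\d])([A-Z])/, '\1_\2')  # insert _ at case boundaries
            .downcase
            .to_sym
end

puts annotation_class_name(:named_entity_tag)
puts annotation_symbol('PartOfSpeechAnnotation')
```

Names containing runs of capitals (for example `NERIDAnnotation`) do not round-trip cleanly through this reverse helper, so for those the list in 'config.rb' remains the safer reference.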
|
70
64
|
|
71
|
-
|
72
|
-
- For the Stanford Tagger, download the [tagger files](http://nlp.stanford.edu/software/tagger.shtml), and copy from the models/ directory the models you need into the gem's bin/models directory. Models are available for Arabic, Chinese, French and German.
|
65
|
+
**Loading specific classes**
|
73
66
|
|
74
|
-
|
75
|
-
|
76
|
-
|
77
|
-
StanfordCoreNLP.
|
78
|
-
|
67
|
+
You may also want to load your own classes from the Stanford NLP to do more specific tasks. The gem provides an API to do this:
|
68
|
+
|
69
|
+
# Default base class is edu.stanford.nlp.pipeline.
|
70
|
+
StanfordCoreNLP.load_class('PTBTokenizerAnnotator')
|
71
|
+
puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
|
72
|
+
# => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>
|
73
|
+
|
74
|
+
# Here, we specify another base class.
|
75
|
+
StanfordCoreNLP.load_class('MaxentTagger', 'edu.stanford.nlp.tagger')
|
76
|
+
puts StanfordCoreNLP::MaxentTagger.inspect
|
77
|
+
# => <Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
|
79
78
|
|
80
79
|
**Current known issues**
|
81
80
|
|
82
|
-
The models included with the gem for the NER system are missing two files: "edu/stanford/nlp/models/dcoref/countries" and "edu/stanford/nlp/models/dcoref/statesandprovinces", which I couldn't find anywhere. I will be
|
81
|
+
The models included with the gem for the NER system are missing two files: "edu/stanford/nlp/models/dcoref/countries" and "edu/stanford/nlp/models/dcoref/statesandprovinces", which I couldn't find anywhere. I would be grateful if somebody could add or e-mail me these files.
|
83
82
|
|
84
83
|
**Contributing**
|
85
84
|
|
data/bin/INFO
CHANGED
@@ -1 +1 @@
|
|
1
|
-
This is where you should put the JAR files.
|
1
|
+
This is where you should put the JAR files and the folders with the model files.
|
data/lib/stanford-core-nlp.rb
CHANGED
@@ -1,81 +1,135 @@
|
|
1
1
|
module StanfordCoreNLP
|
2
2
|
|
3
|
-
VERSION = '0.1.
|
4
|
-
require 'stanford-core-nlp/jar_loader
|
3
|
+
VERSION = '0.1.5'
|
4
|
+
require 'stanford-core-nlp/jar_loader'
|
5
5
|
require 'stanford-core-nlp/java_wrapper'
|
6
|
-
require 'stanford-core-nlp/
|
7
|
-
|
6
|
+
require 'stanford-core-nlp/config'
|
7
|
+
|
8
8
|
class << self
|
9
|
-
# The path in which to look for the Stanford JAR files
|
10
|
-
#
|
9
|
+
# The path in which to look for the Stanford JAR files,
|
10
|
+
# with a trailing slash.
|
11
|
+
#
|
12
|
+
# The structure of the JAR folder must be as follows:
|
13
|
+
#
|
14
|
+
# Files:
|
15
|
+
#
|
16
|
+
# /stanford-core-nlp.jar
|
17
|
+
# /joda-time.jar
|
18
|
+
# /xom.jar
|
19
|
+
# /bridge.jar*
|
20
|
+
#
|
21
|
+
# Folders:
|
22
|
+
#
|
23
|
+
# /classifiers # Models for the NER system.
|
24
|
+
# /dcoref # Models for the coreference resolver.
|
25
|
+
# /taggers # Models for the POS tagger.
|
26
|
+
# /grammar # Models for the parser.
|
27
|
+
#
|
28
|
+
# *The file bridge.jar is a thin Java wrapper over the
|
29
|
+
# Stanford Core NLP get() function, which lets you
|
30
|
+
# retrieve annotations using static classes as names.
|
31
|
+
# This works around one of the lacunae of Rjb.
|
11
32
|
attr_accessor :jar_path
|
12
|
-
# The flags for starting the JVM machine.
|
13
|
-
#
|
33
|
+
# The flags for starting the JVM machine. The parser
|
34
|
+
# and named entity recognizer are very memory consuming.
|
14
35
|
attr_accessor :jvm_args
|
15
36
|
# A file to redirect JVM output to.
|
16
37
|
attr_accessor :log_file
|
17
|
-
# The model files
|
38
|
+
# The model files for a given language.
|
18
39
|
attr_accessor :model_files
|
19
40
|
end
|
20
41
|
|
21
42
|
# The default JAR path is the gem's bin folder.
|
22
43
|
self.jar_path = File.dirname(__FILE__) + '/../bin/'
|
23
|
-
# Load the JVM with a minimum heap size of 512MB and a
|
44
|
+
# Load the JVM with a minimum heap size of 512MB and a
|
24
45
|
# maximum heap size of 1024MB.
|
25
46
|
self.jvm_args = ['-Xms512M', '-Xmx1024M']
|
26
47
|
# Turn logging off by default.
|
27
48
|
self.log_file = nil
|
28
49
|
|
29
|
-
# Default model files.
|
30
|
-
self.model_files = {
|
31
|
-
'pos.model' => 'taggers/english-left3words-distsim.tagger',
|
32
|
-
'ner.model.3class' => 'classifiers/all.3class.distsim.crf.ser.gz',
|
33
|
-
'ner.model.7class' => 'classifiers/muc.7class.distsim.crf.ser.gz',
|
34
|
-
'ner.model.MISCclass' => 'classifiers/conll.4class.distsim.crf.ser.gz',
|
35
|
-
'parser.model' => 'grammar/englishPCFG.ser.gz',
|
36
|
-
'dcoref.demonym' => 'dcoref/demonyms.txt',
|
37
|
-
'dcoref.animate' => 'dcoref/animate.unigrams.txt',
|
38
|
-
'dcoref.female' => 'dcoref/female.unigrams.txt',
|
39
|
-
'dcoref.inanimate' => 'dcoref/inanimate.unigrams.txt',
|
40
|
-
'dcoref.male' => 'dcoref/male.unigrams.txt',
|
41
|
-
'dcoref.neutral' => 'dcoref/neutral.unigrams.txt',
|
42
|
-
'dcoref.plural' => 'dcoref/plural.unigrams.txt',
|
43
|
-
'dcoref.singular' => 'dcoref/singular.unigrams.txt',
|
44
|
-
'dcoref.states' => 'dcoref/state-abbreviations.txt',
|
45
|
-
'dcoref.countries' => 'dcoref/unknown.txt', # Fix - can somebody provide this file?
|
46
|
-
'dcoref.states.provinces' => 'dcoref/unknown.txt', # Fix - can somebody provide this file?
|
47
|
-
'dcoref.extra.gender' => 'dcoref/namegender.combine.txt'
|
48
|
-
}
|
49
50
|
|
50
|
-
#
|
51
|
-
|
52
|
-
#
|
53
|
-
|
51
|
+
# Use models for a given language. Language can be
|
52
|
+
# supplied as full-length, or ISO-639 2 or 3 letter
|
53
|
+
# code (e.g. :english, :eng or :en will work).
|
54
|
+
def self.use(language)
|
55
|
+
lang = nil
|
56
|
+
self.model_files = {}
|
57
|
+
Config::LanguageCodes.each do |l,codes|
|
58
|
+
lang = codes[2] if codes.include?(language)
|
59
|
+
end
|
60
|
+
Config::Models.each do |n, languages|
|
61
|
+
models = languages[lang]
|
62
|
+
folder = Config::ModelFolders[n]
|
63
|
+
if models.is_a?(Hash)
|
64
|
+
n = n.to_s
|
65
|
+
n += '.model' if n == 'ner'
|
66
|
+
models.each do |m, file|
|
67
|
+
self.model_files["#{n}.#{m}"] =
|
68
|
+
folder + file
|
69
|
+
end
|
70
|
+
elsif models.is_a?(String)
|
71
|
+
self.model_files["#{n}.model"] =
|
72
|
+
folder + models
|
73
|
+
end
|
74
|
+
end
|
75
|
+
end
|
76
|
+
|
77
|
+
# Use English by default.
|
78
|
+
self.use(:english)
|
54
79
|
|
55
|
-
# Set a model file.
|
80
|
+
# Set a model file. Here are the default models for English:
|
81
|
+
#
|
82
|
+
# 'pos.model' => 'english-left3words-distsim.tagger',
|
83
|
+
# 'ner.model.3class' => 'all.3class.distsim.crf.ser.gz',
|
84
|
+
# 'ner.model.7class' => 'muc.7class.distsim.crf.ser.gz',
|
85
|
+
# 'ner.model.MISCclass' => 'conll.4class.distsim.crf.ser.gz',
|
86
|
+
# 'parser.model' => 'englishPCFG.ser.gz',
|
87
|
+
# 'dcoref.demonym' => 'demonyms.txt',
|
88
|
+
# 'dcoref.animate' => 'animate.unigrams.txt',
|
89
|
+
# 'dcoref.female' => 'female.unigrams.txt',
|
90
|
+
# 'dcoref.inanimate' => 'inanimate.unigrams.txt',
|
91
|
+
# 'dcoref.male' => 'male.unigrams.txt',
|
92
|
+
# 'dcoref.neutral' => 'neutral.unigrams.txt',
|
93
|
+
# 'dcoref.plural' => 'plural.unigrams.txt',
|
94
|
+
# 'dcoref.singular' => 'singular.unigrams.txt',
|
95
|
+
# 'dcoref.states' => 'state-abbreviations.txt',
|
96
|
+
# 'dcoref.extra.gender' => 'namegender.combine.txt'
|
97
|
+
#
|
56
98
|
def self.set_model(name, file)
|
57
|
-
|
58
|
-
|
59
|
-
|
60
|
-
end
|
61
|
-
self.model_files[name] = file
|
99
|
+
n = name.split('.')[0].intern
|
100
|
+
self.model_files[name] =
|
101
|
+
Config::ModelFolders[n] + file
|
62
102
|
end
|
63
103
|
|
104
|
+
# Whether the classes are initialized or not.
|
105
|
+
@@initialized = false
|
106
|
+
# Whether the JAR files are loaded or not.
|
107
|
+
@@loaded = false
|
108
|
+
|
64
109
|
# Load the JARs, create the classes.
|
65
110
|
def self.init
|
66
111
|
self.load_jars unless @@loaded
|
67
112
|
self.create_classes
|
68
113
|
@@initialized = true
|
69
114
|
end
|
70
|
-
|
71
|
-
# Load a StanfordCoreNLP pipeline with the
|
72
|
-
#
|
115
|
+
|
116
|
+
# Load a StanfordCoreNLP pipeline with the
|
117
|
+
# specified JVM flags and StanfordCoreNLP
|
118
|
+
# properties.
|
73
119
|
def self.load(*annotators)
|
74
120
|
self.init unless @@initialized
|
75
121
|
# Prepend the JAR path to the model files.
|
76
122
|
properties = {}
|
77
|
-
self.model_files.each
|
78
|
-
|
123
|
+
self.model_files.each do |k,v|
|
124
|
+
f = self.jar_path + v
|
125
|
+
unless File.readable?(f)
|
126
|
+
raise "Model file #{f} could not be found. " +
|
127
|
+
"You may need to download this file manually and/or set paths properly."
|
128
|
+
else
|
129
|
+
properties[k] = f
|
130
|
+
end
|
131
|
+
end
|
132
|
+
properties['annotators'] =
|
79
133
|
annotators.map { |x| x.to_s }.join(', ')
|
80
134
|
CoreNLP.new(get_properties(properties))
|
81
135
|
end
|
@@ -101,17 +155,37 @@ module StanfordCoreNLP
|
|
101
155
|
const_set(:Properties, Rjb::import('java.util.Properties'))
|
102
156
|
const_set(:AnnotationBridge, Rjb::import('AnnotationBridge'))
|
103
157
|
end
|
104
|
-
|
158
|
+
|
105
159
|
# Load a class (e.g. PTBTokenizerAnnotator) in a specific
|
106
160
|
# class path (default is 'edu.stanford.nlp.pipeline').
|
107
161
|
# The class is then accessible under the StanfordCoreNLP
|
108
162
|
# namespace, e.g. StanfordCoreNLP::PTBTokenizerAnnotator.
|
163
|
+
#
|
164
|
+
# List of annotators:
|
165
|
+
#
|
166
|
+
# - PTBTokenizerAnnotator - tokenizes the text following Penn Treebank conventions.
|
167
|
+
# - WordToSentenceAnnotator - splits a sequence of words into a sequence of sentences.
|
168
|
+
# - POSTaggerAnnotator - annotates the text with part-of-speech tags.
|
169
|
+
# - MorphaAnnotator - morphological normalizer (generates lemmas).
|
170
|
+
# - NERAnnotator - annotates the text with named-entity labels.
|
171
|
+
# - NERCombinerAnnotator - combines several NER models (use this instead of NERAnnotator!).
|
172
|
+
# - TrueCaseAnnotator - detects the true case of words in free text (useful for all upper or lower case text).
|
173
|
+
# - ParserAnnotator - generates constituent and dependency trees.
|
174
|
+
# - NumberAnnotator - recognizes numerical entities such as numbers, money, times, and dates.
|
175
|
+
# - TimeWordAnnotator - recognizes common temporal expressions, such as "teatime".
|
176
|
+
# - QuantifiableEntityNormalizingAnnotator - normalizes the content of all numerical entities.
|
177
|
+
# - SRLAnnotator - annotates predicates and their semantic roles.
|
178
|
+
# - CorefAnnotator - implements pronominal anaphora resolution using a statistical model (deprecated!).
|
179
|
+
# - DeterministicCorefAnnotator - implements anaphora resolution using a deterministic model (newer model, use this!).
|
180
|
+
# - NFLAnnotator - implements entity and relation mention extraction for the NFL domain.
|
109
181
|
def self.load_class(klass, base = 'edu.stanford.nlp.pipeline')
|
110
182
|
self.load_jars unless @@loaded
|
111
183
|
const_set(klass.intern, Rjb::import("#{base}.#{klass}"))
|
112
184
|
end
|
113
|
-
|
114
|
-
|
185
|
+
|
186
|
+
# Private helper functions.
|
187
|
+
private
|
188
|
+
# Create a java.util.Properties object from a hash.
|
115
189
|
def self.get_properties(properties)
|
116
190
|
props = Properties.new
|
117
191
|
properties.each do |property, value|
|
@@ -119,10 +193,10 @@ module StanfordCoreNLP
|
|
119
193
|
end
|
120
194
|
props
|
121
195
|
end
|
122
|
-
|
123
|
-
#
|
196
|
+
|
197
|
+
# Under_case -> CamelCase.
|
124
198
|
def self.camel_case(text)
|
125
199
|
text.to_s.gsub(/^[a-z]|_[a-z]/) { |a| a.upcase }.gsub('_', '')
|
126
200
|
end
|
127
|
-
|
128
|
-
end
|
201
|
+
|
202
|
+
end
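The language lookup added in `StanfordCoreNLP.use` can be followed more easily in isolation. The sketch below assumes trimmed-down copies of the `Config` tables (the constant names `LANGUAGE_CODES`, `MODEL_FOLDERS`, `MODELS` and the method `model_files_for` are hypothetical) and reproduces the same key-building rules. One deliberate difference: it raises on an unknown language, whereas the diffed code leaves `lang` as `nil`.

```ruby
# Trimmed-down, hypothetical copies of the gem's Config tables.
LANGUAGE_CODES = {
  :english => [:en, :eng, :english],
  :french  => [:fr, :fre, :french]
}

MODEL_FOLDERS = { :pos => 'taggers/', :ner => 'classifiers/' }

MODELS = {
  :pos => { :english => 'english-left3words-distsim.tagger',
            :french  => 'french.tagger' },
  :ner => { :english => { '3class' => 'all.3class.distsim.crf.ser.gz' },
            :french  => {} }
}

# Resolve a full name or ISO code to the canonical language symbol,
# then build a hash of property name => path relative to the JAR folder.
def model_files_for(language)
  entry = LANGUAGE_CODES.find { |_, codes| codes.include?(language) }
  # Assumption of this sketch: fail loudly instead of continuing with nil.
  raise ArgumentError, "Unknown language: #{language}" unless entry
  lang = entry.first

  files = {}
  MODELS.each do |name, by_language|
    models = by_language[lang]
    folder = MODEL_FOLDERS[name]
    case models
    when Hash
      # NER properties are named 'ner.model.<key>'; others '<name>.<key>'.
      prefix = name == :ner ? 'ner.model' : name.to_s
      models.each { |key, file| files["#{prefix}.#{key}"] = folder + file }
    when String
      files["#{name}.model"] = folder + models
    end
  end
  files
end

p model_files_for(:fr)
```

Because French has no NER models in the table, `model_files_for(:fr)` yields only a `'pos.model'` entry, which is why language-specific pipelines should request only the annotators whose models exist.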
|
data/lib/stanford-core-nlp/config.rb
ADDED
@@ -0,0 +1,453 @@
|
|
1
|
+
module StanfordCoreNLP
|
2
|
+
|
3
|
+
class Config
|
4
|
+
|
5
|
+
# A hash of language codes in humanized,
|
6
|
+
# 2 and 3-letter ISO639 codes.
|
7
|
+
LanguageCodes = {
|
8
|
+
:english => [:en, :eng, :english],
|
9
|
+
:german => [:de, :ger, :german],
|
10
|
+
:french => [:fr, :fre, :french],
|
11
|
+
:arabic => [:ar, :ara, :arabic],
|
12
|
+
:chinese => [:ch, :chi, :chinese],
|
13
|
+
:xinhua => [:xi, :xin, :xinhua]
|
14
|
+
}
|
15
|
+
|
16
|
+
# Folders inside the JAR path for the models.
|
17
|
+
ModelFolders = {
|
18
|
+
:pos => 'taggers/',
|
19
|
+
:parser => 'grammar/',
|
20
|
+
:ner => 'classifiers/',
|
21
|
+
:dcoref => 'dcoref/'
|
22
|
+
}
|
23
|
+
|
24
|
+
# Default models for all languages.
|
25
|
+
Models = {
|
26
|
+
:pos => {
|
27
|
+
:english => 'english-left3words-distsim.tagger',
|
28
|
+
:german => 'german-fast.tagger',
|
29
|
+
:french => 'french.tagger',
|
30
|
+
:arabic => 'arabic-fast.tagger',
|
31
|
+
:chinese => 'chinese.tagger',
|
32
|
+
:xinhua => nil
|
33
|
+
},
|
34
|
+
:parser => {
|
35
|
+
:english => 'englishPCFG.ser.gz',
|
36
|
+
:german => 'germanPCFG.ser.gz',
|
37
|
+
:french => 'frenchFactored.ser.gz',
|
38
|
+
:arabic => 'arabicFactored.ser.gz',
|
39
|
+
:chinese => 'chinesePCFG.ser.gz',
|
40
|
+
:xinhua => 'xinhuaPCFG.ser.gz'
|
41
|
+
},
|
42
|
+
:ner => {
|
43
|
+
:english => {
|
44
|
+
'3class' => 'all.3class.distsim.crf.ser.gz',
|
45
|
+
'7class' => 'muc.7class.distsim.crf.ser.gz',
|
46
|
+
'MISCclass' => 'conll.4class.distsim.crf.ser.gz'
|
47
|
+
},
|
48
|
+
:german => {},
|
49
|
+
:french => {},
|
50
|
+
:arabic => {},
|
51
|
+
:chinese => {},
|
52
|
+
:xinhua => {}
|
53
|
+
},
|
54
|
+
:dcoref => {
|
55
|
+
:english => {
|
56
|
+
'demonym' => 'demonyms.txt',
|
57
|
+
'animate' => 'animate.unigrams.txt',
|
58
|
+
'female' => 'female.unigrams.txt',
|
59
|
+
'inanimate' => 'inanimate.unigrams.txt',
|
60
|
+
'male' => 'male.unigrams.txt',
|
61
|
+
'neutral' => 'neutral.unigrams.txt',
|
62
|
+
'plural' => 'plural.unigrams.txt',
|
63
|
+
'singular' => 'singular.unigrams.txt',
|
64
|
+
'states' => 'state-abbreviations.txt',
|
65
|
+
'countries' => 'unknown.txt', # Fix - can somebody provide this file?
|
66
|
+
'states.provinces' => 'unknown.txt', # Fix - can somebody provide this file?
|
67
|
+
'extra.gender' => 'namegender.combine.txt'
|
68
|
+
},
|
69
|
+
:german => {},
|
70
|
+
:french => {},
|
71
|
+
:arabic => {},
|
72
|
+
:chinese => {},
|
73
|
+
:xinhua => {}
|
74
|
+
}
|
75
|
+
# Models to add.
|
76
|
+
|
77
|
+
#"truecase.model" - path towards the true-casing model; default: StanfordCoreNLPModels/truecase/noUN.ser.gz
|
78
|
+
#"truecase.bias" - class bias of the true case model; default: INIT_UPPER:-0.7,UPPER:-0.7,O:0
|
79
|
+
#"truecase.mixedcasefile" - path towards the mixed case file; default: StanfordCoreNLPModels/truecase/MixDisambiguation.list
|
80
|
+
#"nfl.gazetteer" - path towards the gazetteer for the NFL domain
|
81
|
+
#"nfl.relation.model" - path towards the NFL relation extraction model
|
82
|
+
}
|
83
|
+
|
84
|
+
# List of annotations by JAVA class path.
|
85
|
+
Annotations = {
|
86
|
+
|
87
|
+
'nlp.trees.international.pennchinese.ChineseGrammaticalRelations' => [
|
88
|
+
'AdjectivalModifierGRAnnotation',
|
89
|
+
'AdverbialModifierGRAnnotation',
|
90
|
+
'ArgumentGRAnnotation',
|
91
|
+
'AspectMarkerGRAnnotation',
|
92
|
+
'AssociativeMarkerGRAnnotation',
|
93
|
+
'AssociativeModifierGRAnnotation',
|
94
|
+
'AttributiveGRAnnotation',
|
95
|
+
'AuxModifierGRAnnotation',
|
96
|
+
'AuxPassiveGRAnnotation',
|
97
|
+
'BaGRAnnotation',
|
98
|
+
'ClausalComplementGRAnnotation',
|
99
|
+
'ClausalSubjectGRAnnotation',
|
100
|
+
'ClauseModifierGRAnnotation',
|
101
|
+
'ComplementGRAnnotation',
|
102
|
+
'ComplementizerGRAnnotation',
|
103
|
+
'ControllingSubjectGRAnnotation',
|
104
|
+
'CoordinationGRAnnotation',
|
105
|
+
'DeterminerGRAnnotation',
|
106
|
+
'DirectObjectGRAnnotation',
|
107
|
+
'DvpMarkerGRAnnotation',
|
108
|
+
'DvpModifierGRAnnotation',
|
109
|
+
'EtcGRAnnotation',
|
110
|
+
'LocalizerComplementGRAnnotation',
|
111
|
+
'ModalGRAnnotation',
|
112
|
+
'ModifierGRAnnotation',
|
113
|
+
'NegationModifierGRAnnotation',
|
114
|
+
'NominalPassiveSubjectGRAnnotation',
|
115
|
+
'NominalSubjectGRAnnotation',
|
116
|
+
'NounCompoundModifierGRAnnotation',
|
117
|
+
'NumberModifierGRAnnotation',
|
118
|
+
'NumericModifierGRAnnotation',
|
119
|
+
'ObjectGRAnnotation',
|
120
|
+
'OrdNumberGRAnnotation',
|
121
|
+
'ParentheticalGRAnnotation',
|
122
|
+
'ParticipialModifierGRAnnotation',
|
123
|
+
'PreconjunctGRAnnotation',
|
124
|
+
'PrepositionalLocalizerModifierGRAnnotation',
|
125
|
+
'PrepositionalModifierGRAnnotation',
|
126
|
+
'PrepositionalObjectGRAnnotation',
|
127
|
+
'PunctuationGRAnnotation',
|
128
|
+
'RangeGRAnnotation',
|
129
|
+
'RelativeClauseModifierGRAnnotation',
|
130
|
+
'ResultativeComplementGRAnnotation',
|
131
|
+
'SemanticDependentGRAnnotation',
|
132
|
+
'SubjectGRAnnotation',
|
133
|
+
'TemporalClauseGRAnnotation',
|
134
|
+
'TemporalGRAnnotation',
|
135
|
+
'TimePostpositionGRAnnotation',
|
136
|
+
'TopicGRAnnotation',
|
137
|
+
'VerbCompoundGRAnnotation',
|
138
|
+
'VerbModifierGRAnnotation',
|
139
|
+
'XClausalComplementGRAnnotation'
|
140
|
+
],
|
141
|
+
|
142
|
+
'nlp.dcoref.CoNLL2011DocumentReader' => [
|
143
|
+
'CorefMentionAnnotation',
|
144
|
+
'NamedEntityAnnotation'
|
145
|
+
],
|
146
|
+
|
147
|
+
'nlp.ling.CoreAnnotations' => [
|
148
|
+
|
149
|
+
'AbbrAnnotation',
|
150
|
+
'AbgeneAnnotation',
|
151
|
+
'AbstrAnnotation',
|
152
|
+
'AfterAnnotation',
|
153
|
+
'AnswerAnnotation',
|
154
|
+
'AnswerObjectAnnotation',
|
155
|
+
'AntecedentAnnotation',
|
156
|
+
'ArgDescendentAnnotation',
|
157
|
+
'ArgumentAnnotation',
|
158
|
+
'BagOfWordsAnnotation',
|
159
|
+
'BeAnnotation',
|
160
|
+
'BeforeAnnotation',
|
161
|
+
'BeginIndexAnnotation',
|
162
|
+
'BestCliquesAnnotation',
|
163
|
+
'BestFullAnnotation',
|
164
|
+
'CalendarAnnotation',
|
165
|
+
'CategoryAnnotation',
|
166
|
+
'CategoryFunctionalTagAnnotation',
|
167
|
+
'CharacterOffsetBeginAnnotation',
|
168
|
+
'CharacterOffsetEndAnnotation',
|
169
|
+
'CharAnnotation',
|
170
|
+
'ChineseCharAnnotation',
|
171
|
+
'ChineseIsSegmentedAnnotation',
|
172
|
+
'ChineseOrigSegAnnotation',
|
173
|
+
'ChineseSegAnnotation',
|
174
|
+
'ChunkAnnotation',
|
175
|
+
'CoarseTagAnnotation',
|
176
|
+
'CommonWordsAnnotation',
|
177
|
+
'CoNLLDepAnnotation',
|
178
|
+
'CoNLLDepParentIndexAnnotation',
|
179
|
+
'CoNLLDepTypeAnnotation',
|
180
|
+
'CoNLLPredicateAnnotation',
|
181
|
+
'CoNLLSRLAnnotation',
|
182
|
+
'ContextsAnnotation',
|
183
|
+
'CopyAnnotation',
|
184
|
+
'CostMagnificationAnnotation',
|
185
|
+
'CovertIDAnnotation',
|
186
|
+
'D2_LBeginAnnotation',
|
187
|
+
'D2_LEndAnnotation',
|
188
|
+
'D2_LMiddleAnnotation',
|
189
|
+
'DayAnnotation',
|
190
|
+
'DependentsAnnotation',
|
191
|
+
'DictAnnotation',
|
192
|
+
'DistSimAnnotation',
|
193
|
+
'DoAnnotation',
|
194
|
+
'DocDateAnnotation',
|
195
|
+
'DocIDAnnotation',
|
196
|
+
'DomainAnnotation',
|
197
|
+
'EndIndexAnnotation',
|
198
|
+
'EntityClassAnnotation',
|
199
|
+
'EntityRuleAnnotation',
|
200
|
+
'EntityTypeAnnotation',
|
201
|
+
'FeaturesAnnotation',
|
202
|
+
'FemaleGazAnnotation',
|
203
|
+
'FirstChildAnnotation',
|
204
|
+
'ForcedSentenceEndAnnotation',
|
205
|
+
'FreqAnnotation',
|
206
|
+
'GazAnnotation',
|
207
|
+
'GazetteerAnnotation',
|
208
|
+
'GenericTokensAnnotation',
|
209
|
+
'GeniaAnnotation',
|
210
|
+
'GoldAnswerAnnotation',
|
211
|
+
'GovernorAnnotation',
|
212
|
+
'GrandparentAnnotation',
|
213
|
+
'HaveAnnotation',
|
214
|
+
'HeadWordStringAnnotation',
|
215
|
+
'HeightAnnotation',
|
216
|
+
'IDAnnotation',
|
217
|
+
'IDFAnnotation',
|
218
|
+
'INAnnotation',
|
219
|
+
'IndexAnnotation',
|
220
|
+
'InterpretationAnnotation',
|
221
|
+
'IsDateRangeAnnotation',
|
222
|
+
'IsURLAnnotation',
|
223
|
+
'LabelAnnotation',
|
224
|
+
'LastGazAnnotation',
|
225
|
+
'LastTaggedAnnotation',
|
226
|
+
'LBeginAnnotation',
|
227
|
+
'LeftChildrenNodeAnnotation',
|
228
|
+
'LeftTermAnnotation',
|
229
|
+
'LemmaAnnotation',
|
230
|
+
'LEndAnnotation',
|
231
|
+
'LengthAnnotation',
|
232
|
+
'LMiddleAnnotation',
|
233
|
+
'MaleGazAnnotation',
|
234
|
+
'MarkingAnnotation',
|
235
|
+
'MonthAnnotation',
|
236
|
+
'MorphoCaseAnnotation',
|
237
|
+
'MorphoGenAnnotation',
|
238
|
+
'MorphoNumAnnotation',
|
239
|
+
'MorphoPersAnnotation',
|
240
|
+
'NamedEntityTagAnnotation',
|
241
|
+
'NeighborsAnnotation',
|
242
|
+
'NERIDAnnotation',
|
243
|
+
'NormalizedNamedEntityTagAnnotation',
|
244
|
+
'NotAnnotation',
|
245
|
+
'NumericCompositeObjectAnnotation',
|
246
|
+
'NumericCompositeTypeAnnotation',
|
247
|
+
'NumericCompositeValueAnnotation',
|
248
|
+
'NumericObjectAnnotation',
|
249
|
+
'NumericTypeAnnotation',
|
250
|
+
'NumericValueAnnotation',
|
251
|
+
'NumerizedTokensAnnotation',
|
252
|
+
'NumTxtSentencesAnnotation',
|
253
|
+
'OriginalAnswerAnnotation',
|
254
|
+
'OriginalCharAnnotation',
|
255
|
+
'OriginalTextAnnotation',
|
256
|
+
'ParagraphAnnotation',
|
257
|
+
'ParagraphsAnnotation',
|
258
|
+
'ParaPositionAnnotation',
|
259
|
+
'ParentAnnotation',
|
260
|
+
'PartOfSpeechAnnotation',
|
261
|
+
'PercentAnnotation',
|
262
|
+
'PhraseWordsAnnotation',
|
263
|
+
'PhraseWordsTagAnnotation',
|
264
|
+
'PolarityAnnotation',
|
265
|
+
'PositionAnnotation',
|
266
|
+
'PossibleAnswersAnnotation',
|
267
|
+
'PredictedAnswerAnnotation',
|
268
|
+
'PrevChildAnnotation',
|
269
|
+
'PriorAnnotation',
|
270
|
+
'ProjectedCategoryAnnotation',
|
271
|
+
'ProtoAnnotation',
|
272
|
+
'RoleAnnotation',
|
273
|
+
'SectionAnnotation',
|
274
|
+
'SemanticHeadTagAnnotation',
|
275
|
+
'SemanticHeadWordAnnotation',
|
276
|
+
'SemanticTagAnnotation',
|
277
|
+
'SemanticWordAnnotation',
|
278
|
+
'SentenceIDAnnotation',
|
279
|
+
'SentenceIndexAnnotation',
|
280
|
+
'SentencePositionAnnotation',
|
281
|
+
'SentencesAnnotation',
|
282
|
+
'ShapeAnnotation',
|
283
|
+
'SpaceBeforeAnnotation',
|
284
|
+
'SpanAnnotation',
|
285
|
+
'SpeakerAnnotation',
|
286
|
+
'SRL_ID',
|
287
|
+
'SRLIDAnnotation',
|
288
|
+
'SRLInstancesAnnotation',
|
289
|
+
'StackedNamedEntityTagAnnotation',
|
290
|
+
'StateAnnotation',
|
291
|
+
'StemAnnotation',
|
292
|
+
'SubcategorizationAnnotation',
|
293
|
+
'TagLabelAnnotation',
|
294
|
+
'TextAnnotation',
|
295
|
+
'TokenBeginAnnotation',
|
296
|
+
'TokenEndAnnotation',
|
297
|
+
'TokensAnnotation',
|
298
|
+
'TopicAnnotation',
|
299
|
+
'TrueCaseAnnotation',
|
300
|
+
'TrueCaseTextAnnotation',
|
301
|
+
'TrueTagAnnotation',
|
302
|
+
'UBlockAnnotation',
|
303
|
+
'UnaryAnnotation',
|
304
|
+
'UnknownAnnotation',
|
305
|
+
'UtteranceAnnotation',
|
306
|
+
'UTypeAnnotation',
|
307
|
+
'ValueAnnotation',
|
308
|
+
'VerbSenseAnnotation',
|
309
|
+
'WebAnnotation',
|
310
|
+
'WordFormAnnotation',
|
311
|
+
'WordnetSynAnnotation',
|
312
|
+
'WordPositionAnnotation',
|
313
|
+
'WordSenseAnnotation',
|
314
|
+
'XmlContextAnnotation',
|
315
|
+
'XmlElementAnnotation',
|
316
|
+
'YearAnnotation'
|
317
|
+
],
|
318
|
+
|
319
|
+
'nlp.dcoref.CorefCoreAnnotations' => [
|
320
|
+
|
321
|
+
'CorefAnnotation',
|
322
|
+
'CorefChainAnnotation',
|
323
|
+
'CorefClusterAnnotation',
|
324
|
+
'CorefClusterIdAnnotation',
|
325
|
+
'CorefDestAnnotation',
|
326
|
+
'CorefGraphAnnotation'
|
327
|
+
],
|
328
|
+
|
329
|
+
'nlp.ling.CoreLabel' => [
|
330
|
+
'GenericAnnotation'
|
331
|
+
],
|
332
|
+
|
333
|
+
'nlp.trees.EnglishGrammaticalRelations' => [
|
334
|
+
'AbbreviationModifierGRAnnotation',
|
335
|
+
'AdjectivalComplementGRAnnotation',
|
336
|
+
'AdjectivalModifierGRAnnotation',
|
337
|
+
'AdvClauseModifierGRAnnotation',
|
338
|
+
'AdverbialModifierGRAnnotation',
|
339
|
+
'AgentGRAnnotation',
|
340
|
+
'AppositionalModifierGRAnnotation',
|
341
|
+
'ArgumentGRAnnotation',
|
342
|
+
'AttributiveGRAnnotation',
|
343
|
+
'AuxModifierGRAnnotation',
|
344
|
+
'AuxPassiveGRAnnotation',
|
345
|
+
'ClausalComplementGRAnnotation',
|
346
|
+
'ClausalPassiveSubjectGRAnnotation',
|
347
|
+
'ClausalSubjectGRAnnotation',
|
348
|
+
'ComplementGRAnnotation',
|
349
|
+
'ComplementizerGRAnnotation',
|
350
|
+
'ConjunctGRAnnotation',
|
351
|
+
'ControllingSubjectGRAnnotation',
|
352
|
+
'CoordinationGRAnnotation',
|
353
|
+
'CopulaGRAnnotation',
|
354
|
+
'DeterminerGRAnnotation',
|
355
|
+
'DirectObjectGRAnnotation',
|
356
|
+
'ExpletiveGRAnnotation',
|
357
|
+
'IndirectObjectGRAnnotation',
|
358
|
+
'InfinitivalModifierGRAnnotation',
|
359
|
+
'MarkerGRAnnotation',
|
360
|
+
'ModifierGRAnnotation',
|
361
|
+
'MultiWordExpressionGRAnnotation',
|
362
|
+
'NegationModifierGRAnnotation',
|
363
|
+
'NominalPassiveSubjectGRAnnotation',
|
364
|
+
'NominalSubjectGRAnnotation',
|
365
|
+
'NounCompoundModifierGRAnnotation',
|
366
|
+
'NpAdverbialModifierGRAnnotation',
|
367
|
+
'NumberModifierGRAnnotation',
|
368
|
+
'NumericModifierGRAnnotation',
|
369
|
+
'ObjectGRAnnotation',
|
370
|
+
'ParataxisGRAnnotation',
|
371
|
+
'ParticipialModifierGRAnnotation',
|
372
|
+
'PhrasalVerbParticleGRAnnotation',
|
373
|
+
'PossessionModifierGRAnnotation',
|
374
|
+
'PossessiveModifierGRAnnotation',
|
375
|
+
'PreconjunctGRAnnotation',
|
376
|
+
'PredeterminerGRAnnotation',
|
377
|
+
'PredicateGRAnnotation',
|
378
|
+
'PrepositionalComplementGRAnnotation',
|
379
|
+
'PrepositionalModifierGRAnnotation',
|
380
|
+
'PrepositionalObjectGRAnnotation',
|
381
|
+
'PunctuationGRAnnotation',
|
382
|
+
'PurposeClauseModifierGRAnnotation',
|
383
|
+
'QuantifierModifierGRAnnotation',
|
384
|
+
'ReferentGRAnnotation',
|
385
|
+
'RelativeClauseModifierGRAnnotation',
|
386
|
+
'RelativeGRAnnotation',
|
387
|
+
'SemanticDependentGRAnnotation',
|
388
|
+
'SubjectGRAnnotation',
|
389
|
+
'TemporalModifierGRAnnotation',
|
390
|
+
'XClausalComplementGRAnnotation'
|
391
|
+
],
|
392
|
+
|
393
|
+
'nlp.trees.GrammaticalRelation' => [
|
394
|
+
'DependentGRAnnotation',
|
395
|
+
'GovernorGRAnnotation',
|
396
|
+
'GrammaticalRelationAnnotation',
|
397
|
+
'KillGRAnnotation',
|
398
|
+
'Language',
|
399
|
+
'RootGRAnnotation'
|
400
|
+
],
|
401
|
+
|
402
|
+
'nlp.ie.machinereading.structure.MachineReadingAnnotations' => [
|
403
|
+
'DependencyAnnotation',
|
404
|
+
'DocumentDirectoryAnnotation',
|
405
|
+
'DocumentIdAnnotation',
|
406
|
+
'EntityMentionsAnnotation',
|
407
|
+
'EventMentionsAnnotation',
|
408
|
+
'GenderAnnotation',
|
409
|
+
'RelationMentionsAnnotation',
|
410
|
+
'TriggerAnnotation'
|
411
|
+
],
|
412
|
+
|
413
|
+
'nlp.parser.lexparser.ParserAnnotations' => [
|
414
|
+
'ConstraintAnnotation'
|
415
|
+
],
|
416
|
+
|
417
|
+
'nlp.trees.semgraph.SemanticGraphCoreAnnotations' => [
|
418
|
+
'BasicDependenciesAnnotation',
|
419
|
+
'CollapsedCCProcessedDependenciesAnnotation',
|
420
|
+
'CollapsedDependenciesAnnotation'
|
421
|
+
],
|
422
|
+
|
423
|
+
'nlp.time.TimeAnnotations' => [
|
424
|
+
'TimexAnnotation',
|
425
|
+
'TimexAnnotations'
|
426
|
+
],
|
427
|
+
|
428
|
+
'nlp.time.TimeExpression' => [
|
429
|
+
'Annotation',
|
430
|
+
'ChildrenAnnotation'
|
431
|
+
],
|
432
|
+
|
433
|
+
'nlp.trees.TreeCoreAnnotations' => [
|
434
|
+
'TreeHeadTagAnnotation',
|
435
|
+
'TreeHeadWordAnnotation',
|
436
|
+
'TreeAnnotation'
|
437
|
+
]
|
438
|
+
}
|
439
|
+
|
440
|
+
# Create a list of annotation names => paths.
|
441
|
+
annotations_by_name = {}
|
442
|
+
Annotations.each do |base_class, annotation_classes|
|
443
|
+
annotation_classes.each do |annotation_class|
|
444
|
+
annotations_by_name[annotation_class] ||= []
|
445
|
+
annotations_by_name[annotation_class] << base_class
|
446
|
+
end
|
447
|
+
end
|
448
|
+
|
449
|
+
# Hash of name => path.
|
450
|
+
AnnotationsByName = annotations_by_name
|
451
|
+
|
452
|
+
end
|
453
|
+
end
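The final step of `config.rb` inverts the `Annotations` table so that an annotation name maps to every enclosing Java class that defines it. As a rough sketch using a small, hypothetical subset of that table (the method name `invert_annotations` is also an assumption), the inversion and the resulting ambiguity check look like this:

```ruby
# A small subset of the Annotations table, for illustration only.
ANNOTATIONS = {
  'nlp.ling.CoreAnnotations' =>
    ['LemmaAnnotation', 'PartOfSpeechAnnotation'],
  'nlp.trees.EnglishGrammaticalRelations' =>
    ['SubjectGRAnnotation'],
  'nlp.trees.international.pennchinese.ChineseGrammaticalRelations' =>
    ['SubjectGRAnnotation']
}

# Invert base class => [annotation names] into
# annotation name => [base classes].
def invert_annotations(table)
  by_name = {}
  table.each do |base_class, annotation_classes|
    annotation_classes.each do |annotation_class|
      (by_name[annotation_class] ||= []) << base_class
    end
  end
  by_name
end

by_name = invert_annotations(ANNOTATIONS)

# Names defined under more than one base class need disambiguation.
ambiguous = by_name.select { |_, bases| bases.size > 1 }.keys
puts "Ambiguous: #{ambiguous.join(', ')}"
```

Keeping a list per name, rather than a single path, is what makes it possible to detect names such as `SubjectGRAnnotation` that appear under both the English and the Chinese grammatical relations.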
|
data/lib/stanford-core-nlp/java_wrapper.rb
CHANGED
@@ -18,5 +18,32 @@ module StanfordCoreNLP
|
|
18
18
|
end
|
19
19
|
end
|
20
20
|
|
21
|
+
# Dynamically defined on all proxied annotation classes.
|
22
|
+
# Get an annotation using the annotation bridge.
|
23
|
+
def get(annotation, anno_base = nil)
|
24
|
+
if !java_methods.include?('get(Ljava.lang.Class;)')
|
25
|
+
raise 'No annotation can be retrieved on this object.'
|
26
|
+
else
|
27
|
+
anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
|
28
|
+
if anno_base
|
29
|
+
raise "The path #{anno_base} doesn't exist." unless Config::Annotations[anno_base]
|
30
|
+
anno_bases = [anno_base]
|
31
|
+
else
|
32
|
+
anno_bases = Config::AnnotationsByName[anno_class]
|
33
|
+
raise "The annotation #{anno_class} doesn't exist." unless anno_bases
|
34
|
+
end
|
35
|
+
if anno_bases.size > 1
|
36
|
+
msg = "There are many different annotations bearing the name #{anno_class}. "
|
37
|
+
msg << "Please specify one of the following base classes as second parameter to disambiguate: "
|
38
|
+
msg << anno_bases.join(',')
|
39
|
+
raise msg
|
40
|
+
else
|
41
|
+
base_class = anno_bases[0]
|
42
|
+
end
|
43
|
+
url = "edu.stanford.#{base_class}$#{anno_class}"
|
44
|
+
AnnotationBridge.getAnnotation(self, url)
|
45
|
+
end
|
46
|
+
end
|
47
|
+
|
21
48
|
end
|
22
49
|
end
|
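The lookup that `get` performs before calling the annotation bridge can be followed without a JVM. The sketch below makes several assumptions: `SAMPLE_INDEX` is a hypothetical two-entry index, `camel_case` is a guess at what `StanfordCoreNLP.camel_case` does (its definition is not part of this hunk), and `resolve_annotation_path` is an illustrative name. Unlike the gem's method, it checks an explicit base class against the index entry for that annotation rather than against the full `Annotations` hash. It returns the nested-class path string that would be passed to `AnnotationBridge.getAnnotation`.

```ruby
# Hypothetical index of annotation name => base classes.
SAMPLE_INDEX = {
  'TokensAnnotation' => ['nlp.ling.CoreAnnotations'],
  'SubjectGRAnnotation' => [
    'nlp.trees.EnglishGrammaticalRelations',
    'nlp.trees.international.pennchinese.ChineseGrammaticalRelations'
  ]
}

# Assumed behaviour of the camel-case helper: :part_of_speech => "PartOfSpeech".
def camel_case(name)
  name.to_s.split('_').map { |part| part[0, 1].upcase + part[1..-1].to_s }.join
end

# Build the Java nested-class path for an annotation, or raise if the
# annotation is unknown or ambiguous.
def resolve_annotation_path(annotation, anno_base = nil, index = SAMPLE_INDEX)
  anno_class = "#{camel_case(annotation)}Annotation"
  bases = index[anno_class]
  raise ArgumentError, "The annotation #{anno_class} doesn't exist." unless bases

  if anno_base
    raise ArgumentError, "The path #{anno_base} doesn't exist." unless bases.include?(anno_base)
    bases = [anno_base]
  end

  if bases.size > 1
    raise ArgumentError, "#{anno_class} is ambiguous; pass one of: #{bases.join(', ')}"
  end

  "edu.stanford.#{bases.first}$#{anno_class}"
end
```

Calling `resolve_annotation_path(:tokens)` resolves directly because only one base class declares `TokensAnnotation`, whereas `resolve_annotation_path('subject_GR')` raises until a base class is supplied as the second argument.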
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: stanford-core-nlp
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1.
|
4
|
+
version: 0.1.5
|
5
5
|
prerelease:
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
@@ -9,11 +9,11 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2012-
|
12
|
+
date: 2012-02-04 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: rjb
|
16
|
-
requirement: &
|
16
|
+
requirement: &70191057037760 !ruby/object:Gem::Requirement
|
17
17
|
none: false
|
18
18
|
requirements:
|
19
19
|
- - ! '>='
|
@@ -21,7 +21,7 @@ dependencies:
|
|
21
21
|
version: '0'
|
22
22
|
type: :runtime
|
23
23
|
prerelease: false
|
24
|
-
version_requirements: *
|
24
|
+
version_requirements: *70191057037760
|
25
25
|
description: ! " High-level Ruby bindings to the Stanford CoreNLP package, a set of natural
|
26
26
|
language processing \ntools for English, including tokenization, part-of-speech
|
27
27
|
tagging, lemmatization, named entity recognition,\nparsing, and coreference resolution. "
|
@@ -31,9 +31,9 @@ executables: []
|
|
31
31
|
extensions: []
|
32
32
|
extra_rdoc_files: []
|
33
33
|
files:
|
34
|
+
- lib/stanford-core-nlp/config.rb
|
34
35
|
- lib/stanford-core-nlp/jar_loader.rb
|
35
36
|
- lib/stanford-core-nlp/java_wrapper.rb
|
36
|
-
- lib/stanford-core-nlp/stanford_annotations.rb
|
37
37
|
- lib/stanford-core-nlp.rb
|
38
38
|
- bin/bridge.jar
|
39
39
|
- bin/INFO
|
@@ -1,401 +0,0 @@
|
|
1
|
-
module StanfordCoreNLP
|
2
|
-
|
3
|
-
# @private
|
4
|
-
Annotations = {
|
5
|
-
|
6
|
-
'nlp.trees.international.pennchinese.ChineseGrammaticalRelations' => [
|
7
|
-
'AdjectivalModifierGRAnnotation',
|
8
|
-
'AdverbialModifierGRAnnotation',
|
9
|
-
'ArgumentGRAnnotation',
|
10
|
-
'AspectMarkerGRAnnotation',
|
11
|
-
'AssociativeMarkerGRAnnotation',
|
12
|
-
'AssociativeModifierGRAnnotation',
|
13
|
-
'AttributiveGRAnnotation',
|
14
|
-
'AuxModifierGRAnnotation',
|
15
|
-
'AuxPassiveGRAnnotation',
|
16
|
-
'BaGRAnnotation',
|
17
|
-
'ClausalComplementGRAnnotation',
|
18
|
-
'ClausalSubjectGRAnnotation',
|
19
|
-
'ClauseModifierGRAnnotation',
|
20
|
-
'ComplementGRAnnotation',
|
21
|
-
'ComplementizerGRAnnotation',
|
22
|
-
'ControllingSubjectGRAnnotation',
|
23
|
-
'CoordinationGRAnnotation',
|
24
|
-
'DeterminerGRAnnotation',
|
25
|
-
'DirectObjectGRAnnotation',
|
26
|
-
'DvpMarkerGRAnnotation',
|
27
|
-
'DvpModifierGRAnnotation',
|
28
|
-
'EtcGRAnnotation',
|
29
|
-
'LocalizerComplementGRAnnotation',
|
30
|
-
'ModalGRAnnotation',
|
31
|
-
'ModifierGRAnnotation',
|
32
|
-
'NegationModifierGRAnnotation',
|
33
|
-
'NominalPassiveSubjectGRAnnotation',
|
34
|
-
'NominalSubjectGRAnnotation',
|
35
|
-
'NounCompoundModifierGRAnnotation',
|
36
|
-
'NumberModifierGRAnnotation',
|
37
|
-
'NumericModifierGRAnnotation',
|
38
|
-
'ObjectGRAnnotation',
|
39
|
-
'OrdNumberGRAnnotation',
|
40
|
-
'ParentheticalGRAnnotation',
|
41
|
-
'ParticipialModifierGRAnnotation',
|
42
|
-
'PreconjunctGRAnnotation',
|
43
|
-
'PrepositionalLocalizerModifierGRAnnotation',
|
44
|
-
'PrepositionalModifierGRAnnotation',
|
45
|
-
'PrepositionalObjectGRAnnotation',
|
46
|
-
'PunctuationGRAnnotation',
|
47
|
-
'RangeGRAnnotation',
|
48
|
-
'RelativeClauseModifierGRAnnotation',
|
49
|
-
'ResultativeComplementGRAnnotation',
|
50
|
-
'SemanticDependentGRAnnotation',
|
51
|
-
'SubjectGRAnnotation',
|
52
|
-
'TemporalClauseGRAnnotation',
|
53
|
-
'TemporalGRAnnotation',
|
54
|
-
'TimePostpositionGRAnnotation',
|
55
|
-
'TopicGRAnnotation',
|
56
|
-
'VerbCompoundGRAnnotation',
|
57
|
-
'VerbModifierGRAnnotation',
|
58
|
-
'XClausalComplementGRAnnotation'
|
59
|
-
],
|
60
|
-
|
61
|
-
'nlp.dcoref.CoNLL2011DocumentReader' => [
|
62
|
-
'CorefMentionAnnotation',
|
63
|
-
'NamedEntityAnnotation'
|
64
|
-
],
|
65
|
-
|
66
|
-
'nlp.ling.CoreAnnotations' => [
|
67
|
-
|
68
|
-
'AbbrAnnotation',
|
69
|
-
'AbgeneAnnotation',
|
70
|
-
'AbstrAnnotation',
|
71
|
-
'AfterAnnotation',
|
72
|
-
'AnswerAnnotation',
|
73
|
-
'AnswerObjectAnnotation',
|
74
|
-
'AntecedentAnnotation',
|
75
|
-
'ArgDescendentAnnotation',
|
76
|
-
'ArgumentAnnotation',
|
77
|
-
'BagOfWordsAnnotation',
|
78
|
-
'BeAnnotation',
|
79
|
-
'BeforeAnnotation',
|
80
|
-
'BeginIndexAnnotation',
|
81
|
-
'BestCliquesAnnotation',
|
82
|
-
'BestFullAnnotation',
|
83
|
-
'CalendarAnnotation',
|
84
|
-
'CategoryAnnotation',
|
85
|
-
'CategoryFunctionalTagAnnotation',
|
86
|
-
'CharacterOffsetBeginAnnotation',
|
87
|
-
'CharacterOffsetEndAnnotation',
|
88
|
-
'CharAnnotation',
|
89
|
-
'ChineseCharAnnotation',
|
90
|
-
'ChineseIsSegmentedAnnotation',
|
91
|
-
'ChineseOrigSegAnnotation',
|
92
|
-
'ChineseSegAnnotation',
|
93
|
-
'ChunkAnnotation',
|
94
|
-
'CoarseTagAnnotation',
|
95
|
-
'CommonWordsAnnotation',
|
96
|
-
'CoNLLDepAnnotation',
|
97
|
-
'CoNLLDepParentIndexAnnotation',
|
98
|
-
'CoNLLDepTypeAnnotation',
|
99
|
-
'CoNLLPredicateAnnotation',
|
100
|
-
'CoNLLSRLAnnotation',
|
101
|
-
'ContextsAnnotation',
|
102
|
-
'CopyAnnotation',
|
103
|
-
'CostMagnificationAnnotation',
|
104
|
-
'CovertIDAnnotation',
|
105
|
-
'D2_LBeginAnnotation',
|
106
|
-
'D2_LEndAnnotation',
|
107
|
-
'D2_LMiddleAnnotation',
|
108
|
-
'DayAnnotation',
|
109
|
-
'DependentsAnnotation',
|
110
|
-
'DictAnnotation',
|
111
|
-
'DistSimAnnotation',
|
112
|
-
'DoAnnotation',
|
113
|
-
'DocDateAnnotation',
|
114
|
-
'DocIDAnnotation',
|
115
|
-
'DomainAnnotation',
|
116
|
-
'EndIndexAnnotation',
|
117
|
-
'EntityClassAnnotation',
|
118
|
-
'EntityRuleAnnotation',
|
119
|
-
'EntityTypeAnnotation',
|
120
|
-
'FeaturesAnnotation',
|
121
|
-
'FemaleGazAnnotation',
|
122
|
-
'FirstChildAnnotation',
|
123
|
-
'ForcedSentenceEndAnnotation',
|
124
|
-
'FreqAnnotation',
|
125
|
-
'GazAnnotation',
|
126
|
-
'GazetteerAnnotation',
|
127
|
-
'GenericTokensAnnotation',
|
128
|
-
'GeniaAnnotation',
|
129
|
-
'GoldAnswerAnnotation',
|
130
|
-
'GovernorAnnotation',
|
131
|
-
'GrandparentAnnotation',
|
132
|
-
'HaveAnnotation',
|
133
|
-
'HeadWordStringAnnotation',
|
134
|
-
'HeightAnnotation',
|
135
|
-
'IDAnnotation',
|
136
|
-
'IDFAnnotation',
|
137
|
-
'INAnnotation',
|
138
|
-
'IndexAnnotation',
|
139
|
-
'InterpretationAnnotation',
|
140
|
-
'IsDateRangeAnnotation',
|
141
|
-
'IsURLAnnotation',
|
142
|
-
'LabelAnnotation',
|
143
|
-
'LastGazAnnotation',
|
144
|
-
'LastTaggedAnnotation',
|
145
|
-
'LBeginAnnotation',
|
146
|
-
'LeftChildrenNodeAnnotation',
|
147
|
-
'LeftTermAnnotation',
|
148
|
-
'LemmaAnnotation',
|
149
|
-
'LEndAnnotation',
|
150
|
-
'LengthAnnotation',
|
151
|
-
'LMiddleAnnotation',
|
152
|
-
'MaleGazAnnotation',
|
153
|
-
'MarkingAnnotation',
|
154
|
-
'MonthAnnotation',
|
155
|
-
'MorphoCaseAnnotation',
|
156
|
-
'MorphoGenAnnotation',
|
157
|
-
'MorphoNumAnnotation',
|
158
|
-
'MorphoPersAnnotation',
|
159
|
-
'NamedEntityTagAnnotation',
|
160
|
-
'NeighborsAnnotation',
|
161
|
-
'NERIDAnnotation',
|
162
|
-
'NormalizedNamedEntityTagAnnotation',
|
163
|
-
'NotAnnotation',
|
164
|
-
'NumericCompositeObjectAnnotation',
|
165
|
-
'NumericCompositeTypeAnnotation',
|
166
|
-
'NumericCompositeValueAnnotation',
|
167
|
-
'NumericObjectAnnotation',
|
168
|
-
'NumericTypeAnnotation',
|
169
|
-
'NumericValueAnnotation',
|
170
|
-
'NumerizedTokensAnnotation',
|
171
|
-
'NumTxtSentencesAnnotation',
|
172
|
-
'OriginalAnswerAnnotation',
|
173
|
-
'OriginalCharAnnotation',
|
174
|
-
'OriginalTextAnnotation',
|
175
|
-
'ParagraphAnnotation',
|
176
|
-
'ParagraphsAnnotation',
|
177
|
-
'ParaPositionAnnotation',
|
178
|
-
'ParentAnnotation',
|
179
|
-
'PartOfSpeechAnnotation',
|
180
|
-
'PercentAnnotation',
|
181
|
-
'PhraseWordsAnnotation',
|
182
|
-
'PhraseWordsTagAnnotation',
|
183
|
-
'PolarityAnnotation',
|
184
|
-
'PositionAnnotation',
|
185
|
-
'PossibleAnswersAnnotation',
|
186
|
-
'PredictedAnswerAnnotation',
|
187
|
-
'PrevChildAnnotation',
|
188
|
-
'PriorAnnotation',
|
189
|
-
'ProjectedCategoryAnnotation',
|
190
|
-
'ProtoAnnotation',
|
191
|
-
'RoleAnnotation',
|
192
|
-
'SectionAnnotation',
|
193
|
-
'SemanticHeadTagAnnotation',
|
194
|
-
'SemanticHeadWordAnnotation',
|
195
|
-
'SemanticTagAnnotation',
|
196
|
-
'SemanticWordAnnotation',
|
197
|
-
'SentenceIDAnnotation',
|
198
|
-
'SentenceIndexAnnotation',
|
199
|
-
'SentencePositionAnnotation',
|
200
|
-
'SentencesAnnotation',
|
201
|
-
'ShapeAnnotation',
|
202
|
-
'SpaceBeforeAnnotation',
|
203
|
-
'SpanAnnotation',
|
204
|
-
'SpeakerAnnotation',
|
205
|
-
'SRL_ID',
|
206
|
-
'SRLIDAnnotation',
|
207
|
-
'SRLInstancesAnnotation',
|
208
|
-
'StackedNamedEntityTagAnnotation',
|
209
|
-
'StateAnnotation',
|
210
|
-
'StemAnnotation',
|
211
|
-
'SubcategorizationAnnotation',
|
212
|
-
'TagLabelAnnotation',
|
213
|
-
'TextAnnotation',
|
214
|
-
'TokenBeginAnnotation',
|
215
|
-
'TokenEndAnnotation',
|
216
|
-
'TokensAnnotation',
|
217
|
-
'TopicAnnotation',
|
218
|
-
'TrueCaseAnnotation',
|
219
|
-
'TrueCaseTextAnnotation',
|
220
|
-
'TrueTagAnnotation',
|
221
|
-
'UBlockAnnotation',
|
222
|
-
'UnaryAnnotation',
|
223
|
-
'UnknownAnnotation',
|
224
|
-
'UtteranceAnnotation',
|
225
|
-
'UTypeAnnotation',
|
226
|
-
'ValueAnnotation',
|
227
|
-
'VerbSenseAnnotation',
|
228
|
-
'WebAnnotation',
|
229
|
-
'WordFormAnnotation',
|
230
|
-
'WordnetSynAnnotation',
|
231
|
-
'WordPositionAnnotation',
|
232
|
-
'WordSenseAnnotation',
|
233
|
-
'XmlContextAnnotation',
|
234
|
-
'XmlElementAnnotation',
|
235
|
-
'YearAnnotation'
|
236
|
-
],
|
237
|
-
|
238
|
-
'nlp.dcoref.CorefCoreAnnotations' => [
|
239
|
-
|
240
|
-
'CorefAnnotation',
|
241
|
-
'CorefChainAnnotation',
|
242
|
-
'CorefClusterAnnotation',
|
243
|
-
'CorefClusterIdAnnotation',
|
244
|
-
'CorefDestAnnotation',
|
245
|
-
'CorefGraphAnnotation'
|
246
|
-
],
|
247
|
-
|
248
|
-
'nlp.ling.CoreLabel' => [
|
249
|
-
'GenericAnnotation'
|
250
|
-
],
|
251
|
-
|
252
|
-
'nlp.trees.EnglishGrammaticalRelations' => [
|
253
|
-
'AbbreviationModifierGRAnnotation',
|
254
|
-
'AdjectivalComplementGRAnnotation',
|
255
|
-
'AdjectivalModifierGRAnnotation',
|
256
|
-
'AdvClauseModifierGRAnnotation',
|
257
|
-
'AdverbialModifierGRAnnotation',
|
258
|
-
'AgentGRAnnotation',
|
259
|
-
'AppositionalModifierGRAnnotation',
|
260
|
-
'ArgumentGRAnnotation',
|
261
|
-
'AttributiveGRAnnotation',
|
262
|
-
'AuxModifierGRAnnotation',
|
263
|
-
'AuxPassiveGRAnnotation',
|
264
|
-
'ClausalComplementGRAnnotation',
|
265
|
-
'ClausalPassiveSubjectGRAnnotation',
|
266
|
-
'ClausalSubjectGRAnnotation',
|
267
|
-
'ComplementGRAnnotation',
|
268
|
-
'ComplementizerGRAnnotation',
|
269
|
-
'ConjunctGRAnnotation',
|
270
|
-
'ControllingSubjectGRAnnotation',
|
271
|
-
'CoordinationGRAnnotation',
|
272
|
-
'CopulaGRAnnotation',
|
273
|
-
'DeterminerGRAnnotation',
|
274
|
-
'DirectObjectGRAnnotation',
|
275
|
-
'ExpletiveGRAnnotation',
|
276
|
-
'IndirectObjectGRAnnotation',
|
277
|
-
'InfinitivalModifierGRAnnotation',
|
278
|
-
'MarkerGRAnnotation',
|
279
|
-
'ModifierGRAnnotation',
|
280
|
-
'MultiWordExpressionGRAnnotation',
|
281
|
-
'NegationModifierGRAnnotation',
|
282
|
-
'NominalPassiveSubjectGRAnnotation',
|
283
|
-
'NominalSubjectGRAnnotation',
|
284
|
-
'NounCompoundModifierGRAnnotation',
|
285
|
-
'NpAdverbialModifierGRAnnotation',
|
286
|
-
'NumberModifierGRAnnotation',
|
287
|
-
'NumericModifierGRAnnotation',
|
288
|
-
'ObjectGRAnnotation',
|
289
|
-
'ParataxisGRAnnotation',
|
290
|
-
'ParticipialModifierGRAnnotation',
|
291
|
-
'PhrasalVerbParticleGRAnnotation',
|
292
|
-
'PossessionModifierGRAnnotation',
|
293
|
-
'PossessiveModifierGRAnnotation',
|
294
|
-
'PreconjunctGRAnnotation',
|
295
|
-
'PredeterminerGRAnnotation',
|
296
|
-
'PredicateGRAnnotation',
|
297
|
-
'PrepositionalComplementGRAnnotation',
|
298
|
-
'PrepositionalModifierGRAnnotation',
|
299
|
-
'PrepositionalObjectGRAnnotation',
|
300
|
-
'PunctuationGRAnnotation',
|
301
|
-
'PurposeClauseModifierGRAnnotation',
|
302
|
-
'QuantifierModifierGRAnnotation',
|
303
|
-
'ReferentGRAnnotation',
|
304
|
-
'RelativeClauseModifierGRAnnotation',
|
305
|
-
'RelativeGRAnnotation',
|
306
|
-
'SemanticDependentGRAnnotation',
|
307
|
-
'SubjectGRAnnotation',
|
308
|
-
'TemporalModifierGRAnnotation',
|
309
|
-
'XClausalComplementGRAnnotation'
|
310
|
-
],
|
311
|
-
|
312
|
-
'nlp.trees.GrammaticalRelation' => [
|
313
|
-
'DependentGRAnnotation',
|
314
|
-
'GovernorGRAnnotation',
|
315
|
-
'GrammaticalRelationAnnotation',
|
316
|
-
'KillGRAnnotation',
|
317
|
-
'Language',
|
318
|
-
'RootGRAnnotation'
|
319
|
-
],
|
320
|
-
|
321
|
-
'nlp.ie.machinereading.structure.MachineReadingAnnotations' => [
|
322
|
-
'DependencyAnnotation',
|
323
|
-
'DocumentDirectoryAnnotation',
|
324
|
-
'DocumentIdAnnotation',
|
325
|
-
'EntityMentionsAnnotation',
|
326
|
-
'EventMentionsAnnotation',
|
327
|
-
'GenderAnnotation',
|
328
|
-
'RelationMentionsAnnotation',
|
329
|
-
'TriggerAnnotation'
|
330
|
-
],
|
331
|
-
|
332
|
-
'nlp.parser.lexparser.ParserAnnotations' => [
|
333
|
-
'ConstraintAnnotation'
|
334
|
-
],
|
335
|
-
|
336
|
-
'nlp.trees.semgraph.SemanticGraphCoreAnnotations' => [
|
337
|
-
'BasicDependenciesAnnotation',
|
338
|
-
'CollapsedCCProcessedDependenciesAnnotation',
|
339
|
-
'CollapsedDependenciesAnnotation'
|
340
|
-
],
|
341
|
-
|
342
|
-
'nlp.time.TimeAnnotations' => [
|
343
|
-
'TimexAnnotation',
|
344
|
-
'TimexAnnotations'
|
345
|
-
],
|
346
|
-
|
347
|
-
'nlp.time.TimeExpression' => [
|
348
|
-
'Annotation',
|
349
|
-
'ChildrenAnnotation'
|
350
|
-
],
|
351
|
-
|
352
|
-
'nlp.trees.TreeCoreAnnotations' => [
|
353
|
-
'TreeHeadTagAnnotation',
|
354
|
-
'TreeHeadWordAnnotation',
|
355
|
-
'TreeAnnotation'
|
356
|
-
]
|
357
|
-
}
|
358
|
-
|
359
|
-
annotations_by_name = {}
|
360
|
-
Annotations.each do |base_class, annotation_classes|
|
361
|
-
annotation_classes.each do |annotation_class|
|
362
|
-
annotations_by_name[annotation_class] ||= []
|
363
|
-
annotations_by_name[annotation_class] << base_class
|
364
|
-
end
|
365
|
-
end
|
366
|
-
|
367
|
-
AnnotationsByName = annotations_by_name
|
368
|
-
|
369
|
-
# Modify the Rjb JavaProxy class to add our own method to get annotations.
|
370
|
-
Rjb::Rjb_JavaProxy.class_eval do
|
371
|
-
|
372
|
-
# Dynamically defined on all proxied annotation classes.
|
373
|
-
# Get an annotation using the annotation bridge.
|
374
|
-
def get(annotation, anno_base = nil)
|
375
|
-
if !java_methods.include?('get(Ljava.lang.Class;)')
|
376
|
-
raise'No annotation can be retrieved on this object.'
|
377
|
-
else
|
378
|
-
anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
|
379
|
-
if anno_base
|
380
|
-
raise "The path #{anno_base} doesn't exist." unless Annotations[anno_base]
|
381
|
-
anno_bases = [anno_base]
|
382
|
-
else
|
383
|
-
anno_bases = AnnotationsByName[anno_class]
|
384
|
-
raise "The annotation #{anno_class} doesn't exist." unless anno_bases
|
385
|
-
end
|
386
|
-
if anno_bases.size > 1
|
387
|
-
msg = "There are many different annotations bearing the name #{anno_class}. "
|
388
|
-
msg << "Please specify one of the following base classes as second parameter to disambiguate: "
|
389
|
-
msg << anno_bases.join(',')
|
390
|
-
raise msg
|
391
|
-
else
|
392
|
-
base_class = anno_bases[0]
|
393
|
-
end
|
394
|
-
url = "edu.stanford.#{base_class}$#{anno_class}"
|
395
|
-
AnnotationBridge.getAnnotation(self, url)
|
396
|
-
end
|
397
|
-
end
|
398
|
-
|
399
|
-
end
|
400
|
-
|
401
|
-
end
|