stanford-core-nlp 0.1.4 → 0.1.5

data/README.markdown CHANGED
@@ -1,12 +1,12 @@
1
1
  **About**
2
2
 
3
- This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set natural language processing tools for English, including tokenization, part-of-speech tagging, lemmatization, named entity recognition, parsing, and coreference resolution.
3
+ This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set of natural language processing tools that features tokenization, part-of-speech tagging, lemmatization, and parsing for five languages (English, French, German, Arabic and Chinese), as well as named entity recognition and coreference resolution for English.
4
4
 
5
5
  **Installing**
6
6
 
7
7
  1. Install the gem: `gem install stanford-core-nlp`.
8
8
 
9
- 2. Download the Stanford Core NLP JAR and model files [here](http://louismullie.com/stanford-core-nlp-english.zip). Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (typically this is /usr/local/lib/ruby/gems/1.9.1/gems/stanford-core-nlp-0.x/bin/). This package only includes model files for English; see below for information on adding model files for other languages.
9
+ 2. Download the Stanford Core NLP JAR and model files. Two packages are available with the necessary files: a package for [English only](http://louismullie.com/stanford-core-nlp-english.zip), or a package with models for [all languages](http://louismullie.com/stanford-core-nlp-all.zip). Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (typically this is /usr/local/lib/ruby/gems/1.9.1/gems/stanford-core-nlp-0.x/bin/).
10
10
 
11
11
  **Configuration**
12
12
 
@@ -23,18 +23,12 @@ After installing and requiring the gem (`require 'stanford-core-nlp'`), you may
23
23
  # Redirect VM output to log.txt
24
24
  StanfordCoreNLP.log_file = 'log.txt'
25
25
 
26
- You may also want to load your own classes from the Stanford NLP to do more specific tasks. The gem provides an API to do this:
27
-
28
- # Default base class is edu.stanford.nlp.pipeline.
29
- StanfordCoreNLP.load('PTBTokenizerAnnotator')
30
- puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
31
- # => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>
32
-
33
- # Here, we specify another base class.
34
- StanfordCoreNLP.load('MaxentTagger', 'edu.stanford.nlp.tagger')
35
- puts StanfordCoreNLP::MaxentTagger.inspect
36
- # => <Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
26
+ # Use the model files for a different language than English.
27
+ StanfordCoreNLP.use(:french)
37
28
 
29
+ # Change a specific model file.
30
+ StanfordCoreNLP.set_model('pos.model', 'english-left3words-distsim.tagger')
31
+
38
32
  **Using the gem**
39
33
 
40
34
  text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
@@ -64,22 +58,27 @@ You may also want to load your own classes from the Stanford NLP to do more spec
64
58
  end
65
59
  end
66
60
 
67
- A good reference for names of annotations are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CoreCorefAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the 'stanford_annotations.rb' file inside the gem. The Ruby symbol (e.g. :named_entity_tag) corresponding ot a Java annotation class follows the simple un-camel-casing convention, with 'Annotation' at the end removed. For example, the annotation NamedEntityTagAnnotation translates to :named_entity_tag, PartOfSpeechAnnotation to :part_of_speech, etc.
61
+ > Note: You need to load the StanfordCoreNLP pipeline before using the StanfordCoreNLP::Text class.
68
62
 
69
- **Adding models for other languages for the parser and tagger**
63
+ Good references for annotation names are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CorefCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the 'config.rb' file inside the gem. The Ruby symbol (e.g. :named_entity_tag) corresponding to a Java annotation class follows a simple un-camel-casing convention, with the trailing 'Annotation' removed. For example, NamedEntityTagAnnotation translates to :named_entity_tag, PartOfSpeechAnnotation to :part_of_speech, etc.
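The naming convention described in this README paragraph can be illustrated with a small standalone sketch. The helper name `annotation_symbol` is hypothetical and not part of the gem; `camel_case` mirrors the gem's own helper of the same name. Acronym-style class names (for example `NERIDAnnotation`) may not round-trip with this simple rule, so treat it as an approximation.

```ruby
# Hypothetical helper (not part of the gem):
# Java annotation class name -> Ruby symbol.
def annotation_symbol(class_name)
  class_name
    .sub(/Annotation\z/, '')            # drop the trailing 'Annotation'
    .gsub(/([a-z\d])([A-Z])/, '\1_\2')  # '_' at lower-to-upper boundaries
    .downcase
    .to_sym
end

# Same logic as the gem's camel_case helper: under_case -> CamelCase.
def camel_case(text)
  text.to_s.gsub(/^[a-z]|_[a-z]/) { |a| a.upcase }.gsub('_', '')
end

p annotation_symbol('NamedEntityTagAnnotation')   # prints :named_entity_tag
p annotation_symbol('PartOfSpeechAnnotation')     # prints :part_of_speech
puts camel_case(:named_entity_tag) + 'Annotation' # prints NamedEntityTagAnnotation
```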
70
64
 
71
- - For the Stanford Parser, download the [parser files](http://nlp.stanford.edu/software/lex-parser.shtml), and copy from the grammar/ directory the grammars you need into the gem's bin/grammar directory (e.g. /usr/local/lib/ruby/gems/1.9.1/gems/stanford-core-nlp-0.x/bin/grammar). Grammars are available for Arabic, Chinese, French, German and Xinhua.
72
- - For the Stanford Tagger, download the [tagger files](http://nlp.stanford.edu/software/tagger.shtml), and copy from the models/ directory the models you need into the gem's bin/models directory. Models are available for Arabic, Chinese, French and German.
65
+ **Loading specific classes**
73
66
 
74
- Then, configure the gem to use your newly added files, e.g.:
75
-
76
- StanfordCoreNLP.set_model('parser.model', '/path/to/gem/bin/grammar/chinesePCFG.ser.gz')
77
- StanfordCoreNLP.set_model('tagger.model', '/path/to/gem/bin/grammar/chinese.tagger')
78
- pipeline = StanfordCoreNLP.load(:ssplit, :tokenize, :pos, :parse)
67
+ You may also want to load other classes from the Stanford NLP package for more specific tasks. The gem provides an API to do this:
68
+
69
+ # Default base class is edu.stanford.nlp.pipeline.
70
+ StanfordCoreNLP.load_class('PTBTokenizerAnnotator')
71
+ puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
72
+ # => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>
73
+
74
+ # Here, we specify another base class.
75
+ StanfordCoreNLP.load_class('MaxentTagger', 'edu.stanford.nlp.tagger.maxent')
76
+ puts StanfordCoreNLP::MaxentTagger.inspect
77
+ # => <Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
79
78
 
80
79
  **Current known issues**
81
80
 
82
- The models included with the gem for the NER system are missing two files: "edu/stanford/nlp/models/dcoref/countries" and "edu/stanford/nlp/models/dcoref/statesandprovinces", which I couldn't find anywhere. I will be very grateful if somebody could add/e-mail me these files.
81
+ The models included with the gem for the NER system are missing two files: "edu/stanford/nlp/models/dcoref/countries" and "edu/stanford/nlp/models/dcoref/statesandprovinces", which I couldn't find anywhere. I will be grateful if somebody could add/e-mail me these files.
83
82
 
84
83
  **Contributing**
85
84
 
data/bin/INFO CHANGED
@@ -1 +1 @@
1
- This is where you should put the JAR files.
1
+ This is where you should put the JAR files and the folders with the model files.
data/lib/stanford-core-nlp.rb CHANGED
@@ -1,81 +1,135 @@
1
1
  module StanfordCoreNLP
2
2
 
3
- VERSION = '0.1.4'
4
- require 'stanford-core-nlp/jar_loader.rb'
3
+ VERSION = '0.1.5'
4
+ require 'stanford-core-nlp/jar_loader'
5
5
  require 'stanford-core-nlp/java_wrapper'
6
- require 'stanford-core-nlp/stanford_annotations'
7
-
6
+ require 'stanford-core-nlp/config'
7
+
8
8
  class << self
9
- # The path in which to look for the Stanford JAR files.
10
- # This is passed to JarLoader.
9
+ # The path in which to look for the Stanford JAR files,
10
+ # with a trailing slash.
11
+ #
12
+ # The structure of the JAR folder must be as follows:
13
+ #
14
+ # Files:
15
+ #
16
+ # /stanford-core-nlp.jar
17
+ # /joda-time.jar
18
+ # /xom.jar
19
+ # /bridge.jar*
20
+ #
21
+ # Folders:
22
+ #
23
+ # /classifiers # Models for the NER system.
24
+ # /dcoref # Models for the coreference resolver.
25
+ # /taggers # Models for the POS tagger.
26
+ # /grammar # Models for the parser.
27
+ #
28
+ # *The file bridge.jar is a thin Java wrapper over the
29
+ # Stanford Core NLP get() function, which makes it possible to
30
+ # retrieve annotations using static classes as names.
31
+ # This works around one of the lacunae of Rjb.
11
32
  attr_accessor :jar_path
12
- # The flags for starting the JVM machine.
13
- # Parser and named entity recognizer are very memory consuming.
33
+ # The flags for starting the JVM. The parser
34
+ # and named entity recognizer are very memory consuming.
14
35
  attr_accessor :jvm_args
15
36
  # A file to redirect JVM output to.
16
37
  attr_accessor :log_file
17
- # The model files. Use #set_model to modify these.
38
+ # The model files for a given language.
18
39
  attr_accessor :model_files
19
40
  end
20
41
 
21
42
  # The default JAR path is the gem's bin folder.
22
43
  self.jar_path = File.dirname(__FILE__) + '/../bin/'
23
- # Load the JVM with a minimum heap size of 512MB and a
44
+ # Load the JVM with a minimum heap size of 512MB and a
24
45
  # maximum heap size of 1024MB.
25
46
  self.jvm_args = ['-Xms512M', '-Xmx1024M']
26
47
  # Turn logging off by default.
27
48
  self.log_file = nil
28
49
 
29
- # Default model files.
30
- self.model_files = {
31
- 'pos.model' => 'taggers/english-left3words-distsim.tagger',
32
- 'ner.model.3class' => 'classifiers/all.3class.distsim.crf.ser.gz',
33
- 'ner.model.7class' => 'classifiers/muc.7class.distsim.crf.ser.gz',
34
- 'ner.model.MISCclass' => 'classifiers/conll.4class.distsim.crf.ser.gz',
35
- 'parser.model' => 'grammar/englishPCFG.ser.gz',
36
- 'dcoref.demonym' => 'dcoref/demonyms.txt',
37
- 'dcoref.animate' => 'dcoref/animate.unigrams.txt',
38
- 'dcoref.female' => 'dcoref/female.unigrams.txt',
39
- 'dcoref.inanimate' => 'dcoref/inanimate.unigrams.txt',
40
- 'dcoref.male' => 'dcoref/male.unigrams.txt',
41
- 'dcoref.neutral' => 'dcoref/neutral.unigrams.txt',
42
- 'dcoref.plural' => 'dcoref/plural.unigrams.txt',
43
- 'dcoref.singular' => 'dcoref/singular.unigrams.txt',
44
- 'dcoref.states' => 'dcoref/state-abbreviations.txt',
45
- 'dcoref.countries' => 'dcoref/unknown.txt', # Fix - can somebody provide this file?
46
- 'dcoref.states.provinces' => 'dcoref/unknown.txt', # Fix - can somebody provide this file?
47
- 'dcoref.extra.gender' => 'dcoref/namegender.combine.txt'
48
- }
49
50
 
50
- # Whether the classes are initialized or not.
51
- @@initialized = false
52
- # Whether the jars are loaded or not.
53
- @@loaded = false
51
+ # Use models for a given language. The language can be
52
+ # given as a full name or as a 2- or 3-letter code
53
+ # (e.g. :english, :eng or :en will work).
54
+ def self.use(language)
55
+ lang = nil
56
+ self.model_files = {}
57
+ Config::LanguageCodes.each do |l,codes|
58
+ lang = codes[2] if codes.include?(language)
59
+ end
60
+ Config::Models.each do |n, languages|
61
+ models = languages[lang]
62
+ folder = Config::ModelFolders[n]
63
+ if models.is_a?(Hash)
64
+ n = n.to_s
65
+ n += '.model' if n == 'ner'
66
+ models.each do |m, file|
67
+ self.model_files["#{n}.#{m}"] =
68
+ folder + file
69
+ end
70
+ elsif models.is_a?(String)
71
+ self.model_files["#{n}.model"] =
72
+ folder + models
73
+ end
74
+ end
75
+ end
76
+
77
+ # Use English by default.
78
+ self.use(:english)
54
79
 
55
- # Set a model file.
80
+ # Set a model file. Here are the default models for English:
81
+ #
82
+ # 'pos.model' => 'english-left3words-distsim.tagger',
83
+ # 'ner.model.3class' => 'all.3class.distsim.crf.ser.gz',
84
+ # 'ner.model.7class' => 'muc.7class.distsim.crf.ser.gz',
85
+ # 'ner.model.MISCclass' => 'conll.4class.distsim.crf.ser.gz',
86
+ # 'parser.model' => 'englishPCFG.ser.gz',
87
+ # 'dcoref.demonym' => 'demonyms.txt',
88
+ # 'dcoref.animate' => 'animate.unigrams.txt',
89
+ # 'dcoref.female' => 'female.unigrams.txt',
90
+ # 'dcoref.inanimate' => 'inanimate.unigrams.txt',
91
+ # 'dcoref.male' => 'male.unigrams.txt',
92
+ # 'dcoref.neutral' => 'neutral.unigrams.txt',
93
+ # 'dcoref.plural' => 'plural.unigrams.txt',
94
+ # 'dcoref.singular' => 'singular.unigrams.txt',
95
+ # 'dcoref.states' => 'state-abbreviations.txt',
96
+ # 'dcoref.extra.gender' => 'namegender.combine.txt'
97
+ #
56
98
  def self.set_model(name, file)
57
- unless File.readable?(self.jar_path + file)
58
- raise "JAR file #{self.jar_path + file} could not be found." +
59
- "You may need to download this file manually and/or set paths properly."
60
- end
61
- self.model_files[name] = file
99
+ n = name.split('.')[0].intern
100
+ self.model_files[name] =
101
+ Config::ModelFolders[n] + file
62
102
  end
63
103
 
104
+ # Whether the classes are initialized or not.
105
+ @@initialized = false
106
+ # Whether the JAR files are loaded or not.
107
+ @@loaded = false
108
+
64
109
  # Load the JARs, create the classes.
65
110
  def self.init
66
111
  self.load_jars unless @@loaded
67
112
  self.create_classes
68
113
  @@initialized = true
69
114
  end
70
-
71
- # Load a StanfordCoreNLP pipeline with the specified JVM flags and
72
- # StanfordCoreNLP properties (hash of property => values).
115
+
116
+ # Load a StanfordCoreNLP pipeline with the
117
+ # specified JVM flags and StanfordCoreNLP
118
+ # properties.
73
119
  def self.load(*annotators)
74
120
  self.init unless @@initialized
75
121
  # Prepend the JAR path to the model files.
76
122
  properties = {}
77
- self.model_files.each { |k,v| properties[k] = self.jar_path + v }
78
- properties['annotators'] =
123
+ self.model_files.each do |k,v|
124
+ f = self.jar_path + v
125
+ unless File.readable?(f)
126
+ raise "Model file #{f} could not be found. " +
127
+ "You may need to download this file manually and/or set paths properly."
128
+ else
129
+ properties[k] = f
130
+ end
131
+ end
132
+ properties['annotators'] =
79
133
  annotators.map { |x| x.to_s }.join(', ')
80
134
  CoreNLP.new(get_properties(properties))
81
135
  end
@@ -101,17 +155,37 @@ module StanfordCoreNLP
101
155
  const_set(:Properties, Rjb::import('java.util.Properties'))
102
156
  const_set(:AnnotationBridge, Rjb::import('AnnotationBridge'))
103
157
  end
104
-
158
+
105
159
  # Load a class (e.g. PTBTokenizerAnnotator) in a specific
106
160
  # class path (default is 'edu.stanford.nlp.pipeline').
107
161
  # The class is then accessible under the StanfordCoreNLP
108
162
  # namespace, e.g. StanfordCoreNLP::PTBTokenizerAnnotator.
163
+ #
164
+ # List of annotators:
165
+ #
166
+ # - PTBTokenizerAnnotator - tokenizes the text following Penn Treebank conventions.
167
+ # - WordToSentenceAnnotator - splits a sequence of words into a sequence of sentences.
168
+ # - POSTaggerAnnotator - annotates the text with part-of-speech tags.
169
+ # - MorphaAnnotator - morphological normalizer (generates lemmas).
170
+ # - NERAnnotator - annotates the text with named-entity labels.
171
+ # - NERCombinerAnnotator - combines several NER models (use this instead of NERAnnotator!).
172
+ # - TrueCaseAnnotator - detects the true case of words in free text (useful for all upper or lower case text).
173
+ # - ParserAnnotator - generates constituent and dependency trees.
174
+ # - NumberAnnotator - recognizes numerical entities such as numbers, money, times, and dates.
175
+ # - TimeWordAnnotator - recognizes common temporal expressions, such as "teatime".
176
+ # - QuantifiableEntityNormalizingAnnotator - normalizes the content of all numerical entities.
177
+ # - SRLAnnotator - annotates predicates and their semantic roles.
178
+ # - CorefAnnotator - implements pronominal anaphora resolution using a statistical model (deprecated!).
179
+ # - DeterministicCorefAnnotator - implements anaphora resolution using a deterministic model (newer model, use this!).
180
+ # - NFLAnnotator - implements entity and relation mention extraction for the NFL domain.
109
181
  def self.load_class(klass, base = 'edu.stanford.nlp.pipeline')
110
182
  self.load_jars unless @@loaded
111
183
  const_set(klass.intern, Rjb::import("#{base}.#{klass}"))
112
184
  end
113
-
114
- # Create a java.util.Properties object from a hash.
185
+
186
+ # Private helper functions.
187
+ private
188
+ # Create a java.util.Properties object from a hash.
115
189
  def self.get_properties(properties)
116
190
  props = Properties.new
117
191
  properties.each do |property, value|
@@ -119,10 +193,10 @@ module StanfordCoreNLP
119
193
  end
120
194
  props
121
195
  end
122
-
123
- # Helper function: under_case -> CamelCase.
196
+
197
+ # under_case -> CamelCase.
124
198
  def self.camel_case(text)
125
199
  text.to_s.gsub(/^[a-z]|_[a-z]/) { |a| a.upcase }.gsub('_', '')
126
200
  end
127
-
128
- end
201
+
202
+ end
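The language lookup and model-path construction performed by `use` in the diff above can be sketched in isolation. This is a minimal standalone sketch, not the gem's code: the constants are trimmed copies of the `Config` tables, the method names `resolve_language` and `model_files_for` are hypothetical, and raising on an unknown language is an added assumption (the method in the diff leaves `model_files` empty in that case).

```ruby
# Trimmed, illustrative copies of the gem's Config tables.
LANGUAGE_CODES = {
  english: [:en, :eng, :english],
  french:  [:fr, :fre, :french]
}.freeze

MODEL_FOLDERS = {
  pos: 'taggers/', parser: 'grammar/', ner: 'classifiers/'
}.freeze

MODELS = {
  pos:    { english: 'english-left3words-distsim.tagger',
            french:  'french.tagger' },
  parser: { english: 'englishPCFG.ser.gz',
            french:  'frenchFactored.ser.gz' },
  ner:    { english: { '3class' => 'all.3class.distsim.crf.ser.gz' },
            french:  {} }
}.freeze

# Hypothetical helper: map :fr, :fre or :french to :french.
def resolve_language(code)
  found = LANGUAGE_CODES.find { |_, codes| codes.include?(code) }
  # Assumption: fail loudly instead of silently selecting no models.
  raise ArgumentError, "Unknown language: #{code.inspect}" unless found
  found.first
end

# Hypothetical helper: build the property name => relative path hash.
def model_files_for(code)
  lang = resolve_language(code)
  MODELS.each_with_object({}) do |(name, by_language), files|
    models = by_language[lang]
    folder = MODEL_FOLDERS[name]
    case models
    when Hash
      # NER properties are named 'ner.model.<key>', others '<name>.<key>'.
      prefix = name == :ner ? 'ner.model' : name.to_s
      models.each { |key, file| files["#{prefix}.#{key}"] = folder + file }
    when String
      files["#{name}.model"] = folder + models
    end
  end
end

p model_files_for(:fr)
```

With the trimmed tables above, French resolves to a tagger and a parser model only, which matches the README's statement that named entity recognition is available for English only.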
@@ -0,0 +1,453 @@
1
+ module StanfordCoreNLP
2
+
3
+ class Config
4
+
5
+ # Language codes: for each language, the 2-letter code,
6
+ # the 3-letter code and the full name, in that order.
7
+ LanguageCodes = {
8
+ :english => [:en, :eng, :english],
9
+ :german => [:de, :ger, :german],
10
+ :french => [:fr, :fre, :french],
11
+ :arabic => [:ar, :ara, :arabic],
12
+ :chinese => [:ch, :chi, :chinese],
13
+ :xinhua => [:xi, :xin, :xinhua]
14
+ }
15
+
16
+ # Folders inside the JAR path for the models.
17
+ ModelFolders = {
18
+ :pos => 'taggers/',
19
+ :parser => 'grammar/',
20
+ :ner => 'classifiers/',
21
+ :dcoref => 'dcoref/'
22
+ }
23
+
24
+ # Default models for all languages.
25
+ Models = {
26
+ :pos => {
27
+ :english => 'english-left3words-distsim.tagger',
28
+ :german => 'german-fast.tagger',
29
+ :french => 'french.tagger',
30
+ :arabic => 'arabic-fast.tagger',
31
+ :chinese => 'chinese.tagger',
32
+ :xinhua => nil
33
+ },
34
+ :parser => {
35
+ :english => 'englishPCFG.ser.gz',
36
+ :german => 'germanPCFG.ser.gz',
37
+ :french => 'frenchFactored.ser.gz',
38
+ :arabic => 'arabicFactored.ser.gz',
39
+ :chinese => 'chinesePCFG.ser.gz',
40
+ :xinhua => 'xinhuaPCFG.ser.gz'
41
+ },
42
+ :ner => {
43
+ :english => {
44
+ '3class' => 'all.3class.distsim.crf.ser.gz',
45
+ '7class' => 'muc.7class.distsim.crf.ser.gz',
46
+ 'MISCclass' => 'conll.4class.distsim.crf.ser.gz'
47
+ },
48
+ :german => {},
49
+ :french => {},
50
+ :arabic => {},
51
+ :chinese => {},
52
+ :xinhua => {}
53
+ },
54
+ :dcoref => {
55
+ :english => {
56
+ 'demonym' => 'demonyms.txt',
57
+ 'animate' => 'animate.unigrams.txt',
58
+ 'female' => 'female.unigrams.txt',
59
+ 'inanimate' => 'inanimate.unigrams.txt',
60
+ 'male' => 'male.unigrams.txt',
61
+ 'neutral' => 'neutral.unigrams.txt',
62
+ 'plural' => 'plural.unigrams.txt',
63
+ 'singular' => 'singular.unigrams.txt',
64
+ 'states' => 'state-abbreviations.txt',
65
+ 'countries' => 'unknown.txt', # Fix - can somebody provide this file?
66
+ 'states.provinces' => 'unknown.txt', # Fix - can somebody provide this file?
67
+ 'extra.gender' => 'namegender.combine.txt'
68
+ },
69
+ :german => {},
70
+ :french => {},
71
+ :arabic => {},
72
+ :chinese => {},
73
+ :xinhua => {}
74
+ }
75
+ # Models to add.
76
+
77
+ #"truecase.model" - path towards the true-casing model; default: StanfordCoreNLPModels/truecase/noUN.ser.gz
78
+ #"truecase.bias" - class bias of the true case model; default: INIT_UPPER:-0.7,UPPER:-0.7,O:0
79
+ #"truecase.mixedcasefile" - path towards the mixed case file; default: StanfordCoreNLPModels/truecase/MixDisambiguation.list
80
+ #"nfl.gazetteer" - path towards the gazetteer for the NFL domain
81
+ #"nfl.relation.model" - path towards the NFL relation extraction model
82
+ }
83
+
84
+ # List of annotations by Java class path.
85
+ Annotations = {
86
+
87
+ 'nlp.trees.international.pennchinese.ChineseGrammaticalRelations' => [
88
+ 'AdjectivalModifierGRAnnotation',
89
+ 'AdverbialModifierGRAnnotation',
90
+ 'ArgumentGRAnnotation',
91
+ 'AspectMarkerGRAnnotation',
92
+ 'AssociativeMarkerGRAnnotation',
93
+ 'AssociativeModifierGRAnnotation',
94
+ 'AttributiveGRAnnotation',
95
+ 'AuxModifierGRAnnotation',
96
+ 'AuxPassiveGRAnnotation',
97
+ 'BaGRAnnotation',
98
+ 'ClausalComplementGRAnnotation',
99
+ 'ClausalSubjectGRAnnotation',
100
+ 'ClauseModifierGRAnnotation',
101
+ 'ComplementGRAnnotation',
102
+ 'ComplementizerGRAnnotation',
103
+ 'ControllingSubjectGRAnnotation',
104
+ 'CoordinationGRAnnotation',
105
+ 'DeterminerGRAnnotation',
106
+ 'DirectObjectGRAnnotation',
107
+ 'DvpMarkerGRAnnotation',
108
+ 'DvpModifierGRAnnotation',
109
+ 'EtcGRAnnotation',
110
+ 'LocalizerComplementGRAnnotation',
111
+ 'ModalGRAnnotation',
112
+ 'ModifierGRAnnotation',
113
+ 'NegationModifierGRAnnotation',
114
+ 'NominalPassiveSubjectGRAnnotation',
115
+ 'NominalSubjectGRAnnotation',
116
+ 'NounCompoundModifierGRAnnotation',
117
+ 'NumberModifierGRAnnotation',
118
+ 'NumericModifierGRAnnotation',
119
+ 'ObjectGRAnnotation',
120
+ 'OrdNumberGRAnnotation',
121
+ 'ParentheticalGRAnnotation',
122
+ 'ParticipialModifierGRAnnotation',
123
+ 'PreconjunctGRAnnotation',
124
+ 'PrepositionalLocalizerModifierGRAnnotation',
125
+ 'PrepositionalModifierGRAnnotation',
126
+ 'PrepositionalObjectGRAnnotation',
127
+ 'PunctuationGRAnnotation',
128
+ 'RangeGRAnnotation',
129
+ 'RelativeClauseModifierGRAnnotation',
130
+ 'ResultativeComplementGRAnnotation',
131
+ 'SemanticDependentGRAnnotation',
132
+ 'SubjectGRAnnotation',
133
+ 'TemporalClauseGRAnnotation',
134
+ 'TemporalGRAnnotation',
135
+ 'TimePostpositionGRAnnotation',
136
+ 'TopicGRAnnotation',
137
+ 'VerbCompoundGRAnnotation',
138
+ 'VerbModifierGRAnnotation',
139
+ 'XClausalComplementGRAnnotation'
140
+ ],
141
+
142
+ 'nlp.dcoref.CoNLL2011DocumentReader' => [
143
+ 'CorefMentionAnnotation',
144
+ 'NamedEntityAnnotation'
145
+ ],
146
+
147
+ 'nlp.ling.CoreAnnotations' => [
148
+
149
+ 'AbbrAnnotation',
150
+ 'AbgeneAnnotation',
151
+ 'AbstrAnnotation',
152
+ 'AfterAnnotation',
153
+ 'AnswerAnnotation',
154
+ 'AnswerObjectAnnotation',
155
+ 'AntecedentAnnotation',
156
+ 'ArgDescendentAnnotation',
157
+ 'ArgumentAnnotation',
158
+ 'BagOfWordsAnnotation',
159
+ 'BeAnnotation',
160
+ 'BeforeAnnotation',
161
+ 'BeginIndexAnnotation',
162
+ 'BestCliquesAnnotation',
163
+ 'BestFullAnnotation',
164
+ 'CalendarAnnotation',
165
+ 'CategoryAnnotation',
166
+ 'CategoryFunctionalTagAnnotation',
167
+ 'CharacterOffsetBeginAnnotation',
168
+ 'CharacterOffsetEndAnnotation',
169
+ 'CharAnnotation',
170
+ 'ChineseCharAnnotation',
171
+ 'ChineseIsSegmentedAnnotation',
172
+ 'ChineseOrigSegAnnotation',
173
+ 'ChineseSegAnnotation',
174
+ 'ChunkAnnotation',
175
+ 'CoarseTagAnnotation',
176
+ 'CommonWordsAnnotation',
177
+ 'CoNLLDepAnnotation',
178
+ 'CoNLLDepParentIndexAnnotation',
179
+ 'CoNLLDepTypeAnnotation',
180
+ 'CoNLLPredicateAnnotation',
181
+ 'CoNLLSRLAnnotation',
182
+ 'ContextsAnnotation',
183
+ 'CopyAnnotation',
184
+ 'CostMagnificationAnnotation',
185
+ 'CovertIDAnnotation',
186
+ 'D2_LBeginAnnotation',
187
+ 'D2_LEndAnnotation',
188
+ 'D2_LMiddleAnnotation',
189
+ 'DayAnnotation',
190
+ 'DependentsAnnotation',
191
+ 'DictAnnotation',
192
+ 'DistSimAnnotation',
193
+ 'DoAnnotation',
194
+ 'DocDateAnnotation',
195
+ 'DocIDAnnotation',
196
+ 'DomainAnnotation',
197
+ 'EndIndexAnnotation',
198
+ 'EntityClassAnnotation',
199
+ 'EntityRuleAnnotation',
200
+ 'EntityTypeAnnotation',
201
+ 'FeaturesAnnotation',
202
+ 'FemaleGazAnnotation',
203
+ 'FirstChildAnnotation',
204
+ 'ForcedSentenceEndAnnotation',
205
+ 'FreqAnnotation',
206
+ 'GazAnnotation',
207
+ 'GazetteerAnnotation',
208
+ 'GenericTokensAnnotation',
209
+ 'GeniaAnnotation',
210
+ 'GoldAnswerAnnotation',
211
+ 'GovernorAnnotation',
212
+ 'GrandparentAnnotation',
213
+ 'HaveAnnotation',
214
+ 'HeadWordStringAnnotation',
215
+ 'HeightAnnotation',
216
+ 'IDAnnotation',
217
+ 'IDFAnnotation',
218
+ 'INAnnotation',
219
+ 'IndexAnnotation',
220
+ 'InterpretationAnnotation',
221
+ 'IsDateRangeAnnotation',
222
+ 'IsURLAnnotation',
223
+ 'LabelAnnotation',
224
+ 'LastGazAnnotation',
225
+ 'LastTaggedAnnotation',
226
+ 'LBeginAnnotation',
227
+ 'LeftChildrenNodeAnnotation',
228
+ 'LeftTermAnnotation',
229
+ 'LemmaAnnotation',
230
+ 'LEndAnnotation',
231
+ 'LengthAnnotation',
232
+ 'LMiddleAnnotation',
233
+ 'MaleGazAnnotation',
234
+ 'MarkingAnnotation',
235
+ 'MonthAnnotation',
236
+ 'MorphoCaseAnnotation',
237
+ 'MorphoGenAnnotation',
238
+ 'MorphoNumAnnotation',
239
+ 'MorphoPersAnnotation',
240
+ 'NamedEntityTagAnnotation',
241
+ 'NeighborsAnnotation',
242
+ 'NERIDAnnotation',
243
+ 'NormalizedNamedEntityTagAnnotation',
244
+ 'NotAnnotation',
245
+ 'NumericCompositeObjectAnnotation',
246
+ 'NumericCompositeTypeAnnotation',
247
+ 'NumericCompositeValueAnnotation',
248
+ 'NumericObjectAnnotation',
249
+ 'NumericTypeAnnotation',
250
+ 'NumericValueAnnotation',
251
+ 'NumerizedTokensAnnotation',
252
+ 'NumTxtSentencesAnnotation',
253
+ 'OriginalAnswerAnnotation',
254
+ 'OriginalCharAnnotation',
255
+ 'OriginalTextAnnotation',
256
+ 'ParagraphAnnotation',
257
+ 'ParagraphsAnnotation',
258
+ 'ParaPositionAnnotation',
259
+ 'ParentAnnotation',
260
+ 'PartOfSpeechAnnotation',
261
+ 'PercentAnnotation',
262
+ 'PhraseWordsAnnotation',
263
+ 'PhraseWordsTagAnnotation',
264
+ 'PolarityAnnotation',
265
+ 'PositionAnnotation',
266
+ 'PossibleAnswersAnnotation',
267
+ 'PredictedAnswerAnnotation',
268
+ 'PrevChildAnnotation',
269
+ 'PriorAnnotation',
270
+ 'ProjectedCategoryAnnotation',
271
+ 'ProtoAnnotation',
272
+ 'RoleAnnotation',
273
+ 'SectionAnnotation',
274
+ 'SemanticHeadTagAnnotation',
275
+ 'SemanticHeadWordAnnotation',
276
+ 'SemanticTagAnnotation',
277
+ 'SemanticWordAnnotation',
278
+ 'SentenceIDAnnotation',
279
+ 'SentenceIndexAnnotation',
280
+ 'SentencePositionAnnotation',
281
+ 'SentencesAnnotation',
282
+ 'ShapeAnnotation',
283
+ 'SpaceBeforeAnnotation',
284
+ 'SpanAnnotation',
285
+ 'SpeakerAnnotation',
286
+ 'SRL_ID',
287
+ 'SRLIDAnnotation',
288
+ 'SRLInstancesAnnotation',
289
+ 'StackedNamedEntityTagAnnotation',
290
+ 'StateAnnotation',
291
+ 'StemAnnotation',
292
+ 'SubcategorizationAnnotation',
293
+ 'TagLabelAnnotation',
294
+ 'TextAnnotation',
295
+ 'TokenBeginAnnotation',
296
+ 'TokenEndAnnotation',
297
+ 'TokensAnnotation',
298
+ 'TopicAnnotation',
299
+ 'TrueCaseAnnotation',
300
+ 'TrueCaseTextAnnotation',
301
+ 'TrueTagAnnotation',
302
+ 'UBlockAnnotation',
303
+ 'UnaryAnnotation',
304
+ 'UnknownAnnotation',
305
+ 'UtteranceAnnotation',
306
+ 'UTypeAnnotation',
307
+ 'ValueAnnotation',
308
+ 'VerbSenseAnnotation',
309
+ 'WebAnnotation',
310
+ 'WordFormAnnotation',
311
+ 'WordnetSynAnnotation',
312
+ 'WordPositionAnnotation',
313
+ 'WordSenseAnnotation',
314
+ 'XmlContextAnnotation',
315
+ 'XmlElementAnnotation',
316
+ 'YearAnnotation'
317
+ ],
318
+
319
+ 'nlp.dcoref.CorefCoreAnnotations' => [
320
+
321
+ 'CorefAnnotation',
322
+ 'CorefChainAnnotation',
323
+ 'CorefClusterAnnotation',
324
+ 'CorefClusterIdAnnotation',
325
+ 'CorefDestAnnotation',
326
+ 'CorefGraphAnnotation'
327
+ ],
328
+
329
+ 'nlp.ling.CoreLabel' => [
330
+ 'GenericAnnotation'
331
+ ],
332
+
333
+ 'nlp.trees.EnglishGrammaticalRelations' => [
334
+ 'AbbreviationModifierGRAnnotation',
335
+ 'AdjectivalComplementGRAnnotation',
336
+ 'AdjectivalModifierGRAnnotation',
337
+ 'AdvClauseModifierGRAnnotation',
338
+ 'AdverbialModifierGRAnnotation',
339
+ 'AgentGRAnnotation',
340
+ 'AppositionalModifierGRAnnotation',
341
+ 'ArgumentGRAnnotation',
342
+ 'AttributiveGRAnnotation',
343
+ 'AuxModifierGRAnnotation',
344
+ 'AuxPassiveGRAnnotation',
345
+ 'ClausalComplementGRAnnotation',
346
+ 'ClausalPassiveSubjectGRAnnotation',
347
+ 'ClausalSubjectGRAnnotation',
348
+ 'ComplementGRAnnotation',
349
+ 'ComplementizerGRAnnotation',
350
+ 'ConjunctGRAnnotation',
351
+ 'ControllingSubjectGRAnnotation',
352
+ 'CoordinationGRAnnotation',
353
+ 'CopulaGRAnnotation',
354
+ 'DeterminerGRAnnotation',
355
+ 'DirectObjectGRAnnotation',
356
+ 'ExpletiveGRAnnotation',
357
+ 'IndirectObjectGRAnnotation',
358
+ 'InfinitivalModifierGRAnnotation',
359
+ 'MarkerGRAnnotation',
360
+ 'ModifierGRAnnotation',
361
+ 'MultiWordExpressionGRAnnotation',
362
+ 'NegationModifierGRAnnotation',
363
+ 'NominalPassiveSubjectGRAnnotation',
364
+ 'NominalSubjectGRAnnotation',
365
+ 'NounCompoundModifierGRAnnotation',
366
+ 'NpAdverbialModifierGRAnnotation',
367
+ 'NumberModifierGRAnnotation',
368
+ 'NumericModifierGRAnnotation',
369
+ 'ObjectGRAnnotation',
370
+ 'ParataxisGRAnnotation',
371
+ 'ParticipialModifierGRAnnotation',
372
+ 'PhrasalVerbParticleGRAnnotation',
373
+ 'PossessionModifierGRAnnotation',
374
+ 'PossessiveModifierGRAnnotation',
375
+ 'PreconjunctGRAnnotation',
376
+ 'PredeterminerGRAnnotation',
377
+ 'PredicateGRAnnotation',
378
+ 'PrepositionalComplementGRAnnotation',
379
+ 'PrepositionalModifierGRAnnotation',
380
+ 'PrepositionalObjectGRAnnotation',
381
+ 'PunctuationGRAnnotation',
382
+ 'PurposeClauseModifierGRAnnotation',
383
+ 'QuantifierModifierGRAnnotation',
384
+ 'ReferentGRAnnotation',
385
+ 'RelativeClauseModifierGRAnnotation',
386
+ 'RelativeGRAnnotation',
387
+ 'SemanticDependentGRAnnotation',
388
+ 'SubjectGRAnnotation',
389
+ 'TemporalModifierGRAnnotation',
390
+ 'XClausalComplementGRAnnotation'
391
+ ],
392
+
393
+ 'nlp.trees.GrammaticalRelation' => [
394
+ 'DependentGRAnnotation',
395
+ 'GovernorGRAnnotation',
396
+ 'GrammaticalRelationAnnotation',
397
+ 'KillGRAnnotation',
398
+ 'Language',
399
+ 'RootGRAnnotation'
400
+ ],
401
+
402
+ 'nlp.ie.machinereading.structure.MachineReadingAnnotations' => [
403
+ 'DependencyAnnotation',
404
+ 'DocumentDirectoryAnnotation',
405
+ 'DocumentIdAnnotation',
406
+ 'EntityMentionsAnnotation',
407
+ 'EventMentionsAnnotation',
408
+ 'GenderAnnotation',
409
+ 'RelationMentionsAnnotation',
410
+ 'TriggerAnnotation'
411
+ ],
412
+
413
+ 'nlp.parser.lexparser.ParserAnnotations' => [
414
+ 'ConstraintAnnotation'
415
+ ],
416
+
417
+ 'nlp.trees.semgraph.SemanticGraphCoreAnnotations' => [
418
+ 'BasicDependenciesAnnotation',
419
+ 'CollapsedCCProcessedDependenciesAnnotation',
420
+ 'CollapsedDependenciesAnnotation'
421
+ ],
422
+
423
+ 'nlp.time.TimeAnnotations' => [
424
+ 'TimexAnnotation',
425
+ 'TimexAnnotations'
426
+ ],
427
+
428
+ 'nlp.time.TimeExpression' => [
429
+ 'Annotation',
430
+ 'ChildrenAnnotation'
431
+ ],
432
+
433
+ 'nlp.trees.TreeCoreAnnotations' => [
434
+ 'TreeHeadTagAnnotation',
435
+ 'TreeHeadWordAnnotation',
436
+ 'TreeAnnotation'
437
+ ]
438
+ }
439
+
440
+ # Create a list of annotation names => paths.
441
+ annotations_by_name = {}
442
+ Annotations.each do |base_class, annotation_classes|
443
+ annotation_classes.each do |annotation_class|
444
+ annotations_by_name[annotation_class] ||= []
445
+ annotations_by_name[annotation_class] << base_class
446
+ end
447
+ end
448
+
449
+ # Hash of name => path.
450
+ AnnotationsByName = annotations_by_name
451
+
452
+ end
453
+ end
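The inversion that produces `AnnotationsByName` at the end of this file can be sketched on a small subset of the table. The constant names below are local to the sketch and only three base classes are copied from the diff. The sketch also shows why a name can map to more than one base class.

```ruby
# A few entries copied from the Annotations table (illustrative subset).
ANNOTATIONS = {
  'nlp.trees.international.pennchinese.ChineseGrammaticalRelations' =>
    %w[AdjectivalModifierGRAnnotation BaGRAnnotation],
  'nlp.trees.EnglishGrammaticalRelations' =>
    %w[AdjectivalModifierGRAnnotation AgentGRAnnotation],
  'nlp.ling.CoreAnnotations' => %w[LemmaAnnotation]
}.freeze

# Invert "base class => names" into "name => base classes".
ANNOTATIONS_BY_NAME =
  ANNOTATIONS.each_with_object({}) do |(base, names), index|
    names.each { |name| (index[name] ||= []) << base }
  end

# Names declared under more than one base class need disambiguation.
AMBIGUOUS = ANNOTATIONS_BY_NAME.select { |_, bases| bases.size > 1 }.keys

p ANNOTATIONS_BY_NAME['LemmaAnnotation'] # prints ["nlp.ling.CoreAnnotations"]
p AMBIGUOUS                              # prints ["AdjectivalModifierGRAnnotation"]
```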
@@ -18,5 +18,32 @@ module StanfordCoreNLP
18
18
  end
19
19
  end
20
20
 
21
+ # Dynamically defined on all proxied annotation classes.
22
+ # Get an annotation using the annotation bridge.
23
+ def get(annotation, anno_base = nil)
24
+ if !java_methods.include?('get(Ljava.lang.Class;)')
25
+ raise 'No annotation can be retrieved on this object.'
26
+ else
27
+ anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
28
+ if anno_base
29
+ raise "The path #{anno_base} doesn't exist." unless Config::Annotations[anno_base]
30
+ anno_bases = [anno_base]
31
+ else
32
+ anno_bases = Config::AnnotationsByName[anno_class]
33
+ raise "The annotation #{anno_class} doesn't exist." unless anno_bases
34
+ end
35
+ if anno_bases.size > 1
36
+ msg = "There are many different annotations bearing the name #{anno_class}. "
37
+ msg << "Please specify one of the following base classes as second parameter to disambiguate: "
38
+ msg << anno_bases.join(',')
39
+ raise msg
40
+ else
41
+ base_class = anno_bases[0]
42
+ end
43
+ url = "edu.stanford.#{base_class}$#{anno_class}"
44
+ AnnotationBridge.getAnnotation(self, url)
45
+ end
46
+ end
47
+
21
48
  end
22
49
  end
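The lookup that `get` performs before calling the annotation bridge can be sketched without Rjb or the JVM. Everything below is an assumption made for illustration: `camel_case_sketch` stands in for `StanfordCoreNLP.camel_case` (whose body is not shown in this diff), and `sample_by_name` and `resolve_annotation_path` are hypothetical names. Unlike the code in the diff, this sketch also checks that the given base class actually lists the annotation.

```ruby
# Hypothetical stand-in for StanfordCoreNLP.camel_case:
# :part_of_speech becomes "PartOfSpeech".
def camel_case_sketch(name)
  name.to_s.split('_').map { |part| part.sub(/\A[a-z]/) { |c| c.upcase } }.join
end

# Hypothetical two-entry excerpt of the name => paths hash.
def sample_by_name
  {
    'TreeAnnotation' => ['nlp.trees.TreeCoreAnnotations'],
    'AdjectivalModifierGRAnnotation' => [
      'nlp.trees.international.pennchinese.ChineseGrammaticalRelations',
      'nlp.trees.EnglishGrammaticalRelations'
    ]
  }
end

# Resolve an annotation name (and optional base class) to a Java class path.
def resolve_annotation_path(annotation, anno_base = nil, by_name = sample_by_name)
  anno_class = "#{camel_case_sketch(annotation)}Annotation"
  bases = by_name[anno_class]
  raise ArgumentError, "The annotation #{anno_class} doesn't exist." unless bases

  if anno_base
    # Extra check, not in the diff: the base must define this annotation.
    unless bases.include?(anno_base)
      raise ArgumentError, "The path #{anno_base} doesn't define #{anno_class}."
    end
    bases = [anno_base]
  end

  if bases.size > 1
    raise ArgumentError, "#{anno_class} is ambiguous; pass one of: #{bases.join(', ')}"
  end

  # Inner Java classes are addressed as Outer$Inner.
  "edu.stanford.#{bases.first}$#{anno_class}"
end
```

Under these assumptions, `resolve_annotation_path(:tree)` resolves on its own, while `resolve_annotation_path(:adjectival_modifier_GR)` raises until a base class is passed as the second argument.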
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: stanford-core-nlp
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.4
4
+ version: 0.1.5
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,11 +9,11 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2012-01-31 00:00:00.000000000 Z
12
+ date: 2012-02-04 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rjb
16
- requirement: &70226234873780 !ruby/object:Gem::Requirement
16
+ requirement: &70191057037760 !ruby/object:Gem::Requirement
17
17
  none: false
18
18
  requirements:
19
19
  - - ! '>='
@@ -21,7 +21,7 @@ dependencies:
21
21
  version: '0'
22
22
  type: :runtime
23
23
  prerelease: false
24
- version_requirements: *70226234873780
24
+ version_requirements: *70191057037760
25
25
  description: ! " High-level Ruby bindings to the Stanford CoreNLP package, a set of natural
26
26
  language processing \ntools for English, including tokenization, part-of-speech
27
27
  tagging, lemmatization, named entity recognition,\nparsing, and coreference resolution. "
@@ -31,9 +31,9 @@ executables: []
31
31
  extensions: []
32
32
  extra_rdoc_files: []
33
33
  files:
34
+ - lib/stanford-core-nlp/config.rb
34
35
  - lib/stanford-core-nlp/jar_loader.rb
35
36
  - lib/stanford-core-nlp/java_wrapper.rb
36
- - lib/stanford-core-nlp/stanford_annotations.rb
37
37
  - lib/stanford-core-nlp.rb
38
38
  - bin/bridge.jar
39
39
  - bin/INFO
@@ -1,401 +0,0 @@
1
- module StanfordCoreNLP
2
-
3
- # @private
4
- Annotations = {
5
-
6
- 'nlp.trees.international.pennchinese.ChineseGrammaticalRelations' => [
7
- 'AdjectivalModifierGRAnnotation',
8
- 'AdverbialModifierGRAnnotation',
9
- 'ArgumentGRAnnotation',
10
- 'AspectMarkerGRAnnotation',
11
- 'AssociativeMarkerGRAnnotation',
12
- 'AssociativeModifierGRAnnotation',
13
- 'AttributiveGRAnnotation',
14
- 'AuxModifierGRAnnotation',
15
- 'AuxPassiveGRAnnotation',
16
- 'BaGRAnnotation',
17
- 'ClausalComplementGRAnnotation',
18
- 'ClausalSubjectGRAnnotation',
19
- 'ClauseModifierGRAnnotation',
20
- 'ComplementGRAnnotation',
21
- 'ComplementizerGRAnnotation',
22
- 'ControllingSubjectGRAnnotation',
23
- 'CoordinationGRAnnotation',
24
- 'DeterminerGRAnnotation',
25
- 'DirectObjectGRAnnotation',
26
- 'DvpMarkerGRAnnotation',
27
- 'DvpModifierGRAnnotation',
28
- 'EtcGRAnnotation',
29
- 'LocalizerComplementGRAnnotation',
30
- 'ModalGRAnnotation',
31
- 'ModifierGRAnnotation',
32
- 'NegationModifierGRAnnotation',
33
- 'NominalPassiveSubjectGRAnnotation',
34
- 'NominalSubjectGRAnnotation',
35
- 'NounCompoundModifierGRAnnotation',
36
- 'NumberModifierGRAnnotation',
37
- 'NumericModifierGRAnnotation',
38
- 'ObjectGRAnnotation',
39
- 'OrdNumberGRAnnotation',
40
- 'ParentheticalGRAnnotation',
41
- 'ParticipialModifierGRAnnotation',
42
- 'PreconjunctGRAnnotation',
43
- 'PrepositionalLocalizerModifierGRAnnotation',
44
- 'PrepositionalModifierGRAnnotation',
45
- 'PrepositionalObjectGRAnnotation',
46
- 'PunctuationGRAnnotation',
47
- 'RangeGRAnnotation',
48
- 'RelativeClauseModifierGRAnnotation',
49
- 'ResultativeComplementGRAnnotation',
50
- 'SemanticDependentGRAnnotation',
51
- 'SubjectGRAnnotation',
52
- 'TemporalClauseGRAnnotation',
53
- 'TemporalGRAnnotation',
54
- 'TimePostpositionGRAnnotation',
55
- 'TopicGRAnnotation',
56
- 'VerbCompoundGRAnnotation',
57
- 'VerbModifierGRAnnotation',
58
- 'XClausalComplementGRAnnotation'
59
- ],
60
-
61
- 'nlp.dcoref.CoNLL2011DocumentReader' => [
62
- 'CorefMentionAnnotation',
63
- 'NamedEntityAnnotation'
64
- ],
65
-
66
- 'nlp.ling.CoreAnnotations' => [
67
-
68
- 'AbbrAnnotation',
69
- 'AbgeneAnnotation',
70
- 'AbstrAnnotation',
71
- 'AfterAnnotation',
72
- 'AnswerAnnotation',
73
- 'AnswerObjectAnnotation',
74
- 'AntecedentAnnotation',
75
- 'ArgDescendentAnnotation',
76
- 'ArgumentAnnotation',
77
- 'BagOfWordsAnnotation',
78
- 'BeAnnotation',
79
- 'BeforeAnnotation',
80
- 'BeginIndexAnnotation',
81
- 'BestCliquesAnnotation',
82
- 'BestFullAnnotation',
83
- 'CalendarAnnotation',
84
- 'CategoryAnnotation',
85
- 'CategoryFunctionalTagAnnotation',
86
- 'CharacterOffsetBeginAnnotation',
87
- 'CharacterOffsetEndAnnotation',
88
- 'CharAnnotation',
89
- 'ChineseCharAnnotation',
90
- 'ChineseIsSegmentedAnnotation',
91
- 'ChineseOrigSegAnnotation',
92
- 'ChineseSegAnnotation',
93
- 'ChunkAnnotation',
94
- 'CoarseTagAnnotation',
95
- 'CommonWordsAnnotation',
96
- 'CoNLLDepAnnotation',
97
- 'CoNLLDepParentIndexAnnotation',
98
- 'CoNLLDepTypeAnnotation',
99
- 'CoNLLPredicateAnnotation',
100
- 'CoNLLSRLAnnotation',
101
- 'ContextsAnnotation',
102
- 'CopyAnnotation',
103
- 'CostMagnificationAnnotation',
104
- 'CovertIDAnnotation',
105
- 'D2_LBeginAnnotation',
106
- 'D2_LEndAnnotation',
107
- 'D2_LMiddleAnnotation',
108
- 'DayAnnotation',
109
- 'DependentsAnnotation',
110
- 'DictAnnotation',
111
- 'DistSimAnnotation',
112
- 'DoAnnotation',
113
- 'DocDateAnnotation',
114
- 'DocIDAnnotation',
115
- 'DomainAnnotation',
116
- 'EndIndexAnnotation',
117
- 'EntityClassAnnotation',
118
- 'EntityRuleAnnotation',
119
- 'EntityTypeAnnotation',
120
- 'FeaturesAnnotation',
121
- 'FemaleGazAnnotation',
122
- 'FirstChildAnnotation',
123
- 'ForcedSentenceEndAnnotation',
124
- 'FreqAnnotation',
125
- 'GazAnnotation',
126
- 'GazetteerAnnotation',
127
- 'GenericTokensAnnotation',
128
- 'GeniaAnnotation',
129
- 'GoldAnswerAnnotation',
130
- 'GovernorAnnotation',
131
- 'GrandparentAnnotation',
132
- 'HaveAnnotation',
133
- 'HeadWordStringAnnotation',
134
- 'HeightAnnotation',
135
- 'IDAnnotation',
136
- 'IDFAnnotation',
137
- 'INAnnotation',
138
- 'IndexAnnotation',
139
- 'InterpretationAnnotation',
140
- 'IsDateRangeAnnotation',
141
- 'IsURLAnnotation',
142
- 'LabelAnnotation',
143
- 'LastGazAnnotation',
144
- 'LastTaggedAnnotation',
145
- 'LBeginAnnotation',
146
- 'LeftChildrenNodeAnnotation',
147
- 'LeftTermAnnotation',
148
- 'LemmaAnnotation',
149
- 'LEndAnnotation',
150
- 'LengthAnnotation',
151
- 'LMiddleAnnotation',
152
- 'MaleGazAnnotation',
153
- 'MarkingAnnotation',
154
- 'MonthAnnotation',
155
- 'MorphoCaseAnnotation',
156
- 'MorphoGenAnnotation',
157
- 'MorphoNumAnnotation',
158
- 'MorphoPersAnnotation',
159
- 'NamedEntityTagAnnotation',
160
- 'NeighborsAnnotation',
161
- 'NERIDAnnotation',
162
- 'NormalizedNamedEntityTagAnnotation',
163
- 'NotAnnotation',
164
- 'NumericCompositeObjectAnnotation',
165
- 'NumericCompositeTypeAnnotation',
166
- 'NumericCompositeValueAnnotation',
167
- 'NumericObjectAnnotation',
168
- 'NumericTypeAnnotation',
169
- 'NumericValueAnnotation',
170
- 'NumerizedTokensAnnotation',
171
- 'NumTxtSentencesAnnotation',
172
- 'OriginalAnswerAnnotation',
173
- 'OriginalCharAnnotation',
174
- 'OriginalTextAnnotation',
175
- 'ParagraphAnnotation',
176
- 'ParagraphsAnnotation',
177
- 'ParaPositionAnnotation',
178
- 'ParentAnnotation',
179
- 'PartOfSpeechAnnotation',
180
- 'PercentAnnotation',
181
- 'PhraseWordsAnnotation',
182
- 'PhraseWordsTagAnnotation',
183
- 'PolarityAnnotation',
184
- 'PositionAnnotation',
185
- 'PossibleAnswersAnnotation',
186
- 'PredictedAnswerAnnotation',
187
- 'PrevChildAnnotation',
188
- 'PriorAnnotation',
189
- 'ProjectedCategoryAnnotation',
190
- 'ProtoAnnotation',
191
- 'RoleAnnotation',
192
- 'SectionAnnotation',
193
- 'SemanticHeadTagAnnotation',
194
- 'SemanticHeadWordAnnotation',
195
- 'SemanticTagAnnotation',
196
- 'SemanticWordAnnotation',
197
- 'SentenceIDAnnotation',
198
- 'SentenceIndexAnnotation',
199
- 'SentencePositionAnnotation',
200
- 'SentencesAnnotation',
201
- 'ShapeAnnotation',
202
- 'SpaceBeforeAnnotation',
203
- 'SpanAnnotation',
204
- 'SpeakerAnnotation',
205
- 'SRL_ID',
206
- 'SRLIDAnnotation',
207
- 'SRLInstancesAnnotation',
208
- 'StackedNamedEntityTagAnnotation',
209
- 'StateAnnotation',
210
- 'StemAnnotation',
211
- 'SubcategorizationAnnotation',
212
- 'TagLabelAnnotation',
213
- 'TextAnnotation',
214
- 'TokenBeginAnnotation',
215
- 'TokenEndAnnotation',
216
- 'TokensAnnotation',
217
- 'TopicAnnotation',
218
- 'TrueCaseAnnotation',
219
- 'TrueCaseTextAnnotation',
220
- 'TrueTagAnnotation',
221
- 'UBlockAnnotation',
222
- 'UnaryAnnotation',
223
- 'UnknownAnnotation',
224
- 'UtteranceAnnotation',
225
- 'UTypeAnnotation',
226
- 'ValueAnnotation',
227
- 'VerbSenseAnnotation',
228
- 'WebAnnotation',
229
- 'WordFormAnnotation',
230
- 'WordnetSynAnnotation',
231
- 'WordPositionAnnotation',
232
- 'WordSenseAnnotation',
233
- 'XmlContextAnnotation',
234
- 'XmlElementAnnotation',
235
- 'YearAnnotation'
236
- ],
237
-
238
- 'nlp.dcoref.CorefCoreAnnotations' => [
239
-
240
- 'CorefAnnotation',
241
- 'CorefChainAnnotation',
242
- 'CorefClusterAnnotation',
243
- 'CorefClusterIdAnnotation',
244
- 'CorefDestAnnotation',
245
- 'CorefGraphAnnotation'
246
- ],
247
-
248
- 'nlp.ling.CoreLabel' => [
249
- 'GenericAnnotation'
250
- ],
251
-
252
- 'nlp.trees.EnglishGrammaticalRelations' => [
253
- 'AbbreviationModifierGRAnnotation',
254
- 'AdjectivalComplementGRAnnotation',
255
- 'AdjectivalModifierGRAnnotation',
256
- 'AdvClauseModifierGRAnnotation',
257
- 'AdverbialModifierGRAnnotation',
258
- 'AgentGRAnnotation',
259
- 'AppositionalModifierGRAnnotation',
260
- 'ArgumentGRAnnotation',
261
- 'AttributiveGRAnnotation',
262
- 'AuxModifierGRAnnotation',
263
- 'AuxPassiveGRAnnotation',
264
- 'ClausalComplementGRAnnotation',
265
- 'ClausalPassiveSubjectGRAnnotation',
266
- 'ClausalSubjectGRAnnotation',
267
- 'ComplementGRAnnotation',
268
- 'ComplementizerGRAnnotation',
269
- 'ConjunctGRAnnotation',
270
- 'ControllingSubjectGRAnnotation',
271
- 'CoordinationGRAnnotation',
272
- 'CopulaGRAnnotation',
273
- 'DeterminerGRAnnotation',
274
- 'DirectObjectGRAnnotation',
275
- 'ExpletiveGRAnnotation',
276
- 'IndirectObjectGRAnnotation',
277
- 'InfinitivalModifierGRAnnotation',
278
- 'MarkerGRAnnotation',
279
- 'ModifierGRAnnotation',
280
- 'MultiWordExpressionGRAnnotation',
281
- 'NegationModifierGRAnnotation',
282
- 'NominalPassiveSubjectGRAnnotation',
283
- 'NominalSubjectGRAnnotation',
284
- 'NounCompoundModifierGRAnnotation',
285
- 'NpAdverbialModifierGRAnnotation',
286
- 'NumberModifierGRAnnotation',
287
- 'NumericModifierGRAnnotation',
288
- 'ObjectGRAnnotation',
289
- 'ParataxisGRAnnotation',
290
- 'ParticipialModifierGRAnnotation',
291
- 'PhrasalVerbParticleGRAnnotation',
292
- 'PossessionModifierGRAnnotation',
293
- 'PossessiveModifierGRAnnotation',
294
- 'PreconjunctGRAnnotation',
295
- 'PredeterminerGRAnnotation',
296
- 'PredicateGRAnnotation',
297
- 'PrepositionalComplementGRAnnotation',
298
- 'PrepositionalModifierGRAnnotation',
299
- 'PrepositionalObjectGRAnnotation',
300
- 'PunctuationGRAnnotation',
301
- 'PurposeClauseModifierGRAnnotation',
302
- 'QuantifierModifierGRAnnotation',
303
- 'ReferentGRAnnotation',
304
- 'RelativeClauseModifierGRAnnotation',
305
- 'RelativeGRAnnotation',
306
- 'SemanticDependentGRAnnotation',
307
- 'SubjectGRAnnotation',
308
- 'TemporalModifierGRAnnotation',
309
- 'XClausalComplementGRAnnotation'
310
- ],
311
-
312
- 'nlp.trees.GrammaticalRelation' => [
313
- 'DependentGRAnnotation',
314
- 'GovernorGRAnnotation',
315
- 'GrammaticalRelationAnnotation',
316
- 'KillGRAnnotation',
317
- 'Language',
318
- 'RootGRAnnotation'
319
- ],
320
-
321
- 'nlp.ie.machinereading.structure.MachineReadingAnnotations' => [
322
- 'DependencyAnnotation',
323
- 'DocumentDirectoryAnnotation',
324
- 'DocumentIdAnnotation',
325
- 'EntityMentionsAnnotation',
326
- 'EventMentionsAnnotation',
327
- 'GenderAnnotation',
328
- 'RelationMentionsAnnotation',
329
- 'TriggerAnnotation'
330
- ],
331
-
332
- 'nlp.parser.lexparser.ParserAnnotations' => [
333
- 'ConstraintAnnotation'
334
- ],
335
-
336
- 'nlp.trees.semgraph.SemanticGraphCoreAnnotations' => [
337
- 'BasicDependenciesAnnotation',
338
- 'CollapsedCCProcessedDependenciesAnnotation',
339
- 'CollapsedDependenciesAnnotation'
340
- ],
341
-
342
- 'nlp.time.TimeAnnotations' => [
343
- 'TimexAnnotation',
344
- 'TimexAnnotations'
345
- ],
346
-
347
- 'nlp.time.TimeExpression' => [
348
- 'Annotation',
349
- 'ChildrenAnnotation'
350
- ],
351
-
352
- 'nlp.trees.TreeCoreAnnotations' => [
353
- 'TreeHeadTagAnnotation',
354
- 'TreeHeadWordAnnotation',
355
- 'TreeAnnotation'
356
- ]
357
- }
358
-
359
- annotations_by_name = {}
360
- Annotations.each do |base_class, annotation_classes|
361
- annotation_classes.each do |annotation_class|
362
- annotations_by_name[annotation_class] ||= []
363
- annotations_by_name[annotation_class] << base_class
364
- end
365
- end
366
-
367
- AnnotationsByName = annotations_by_name
368
-
369
- # Modify the Rjb JavaProxy class to add our own method to get annotations.
370
- Rjb::Rjb_JavaProxy.class_eval do
371
-
372
- # Dynamically defined on all proxied annotation classes.
373
- # Get an annotation using the annotation bridge.
374
- def get(annotation, anno_base = nil)
375
- if !java_methods.include?('get(Ljava.lang.Class;)')
376
- raise'No annotation can be retrieved on this object.'
377
- else
378
- anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
379
- if anno_base
380
- raise "The path #{anno_base} doesn't exist." unless Annotations[anno_base]
381
- anno_bases = [anno_base]
382
- else
383
- anno_bases = AnnotationsByName[anno_class]
384
- raise "The annotation #{anno_class} doesn't exist." unless anno_bases
385
- end
386
- if anno_bases.size > 1
387
- msg = "There are many different annotations bearing the name #{anno_class}. "
388
- msg << "Please specify one of the following base classes as second parameter to disambiguate: "
389
- msg << anno_bases.join(',')
390
- raise msg
391
- else
392
- base_class = anno_bases[0]
393
- end
394
- url = "edu.stanford.#{base_class}$#{anno_class}"
395
- AnnotationBridge.getAnnotation(self, url)
396
- end
397
- end
398
-
399
- end
400
-
401
- end