stanford-core-nlp 0.1.4 → 0.1.5

data/README.markdown CHANGED
@@ -1,12 +1,12 @@
1
1
  **About**
2
2
 
3
- This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set natural language processing tools for English, including tokenization, part-of-speech tagging, lemmatization, named entity recognition, parsing, and coreference resolution.
3
+ This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set of natural language processing tools that features tokenization, part-of-speech tagging, lemmatization, and parsing for five languages (English, French, German, Arabic and Chinese), as well as named entity recognition and coreference resolution for English.
4
4
 
5
5
  **Installing**
6
6
 
7
7
  1. Install the gem: `gem install stanford-core-nlp`.
8
8
 
9
- 2. Download the Stanford Core NLP JAR and model files [here](http://louismullie.com/stanford-core-nlp-english.zip). Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (typically this is /usr/local/lib/ruby/gems/1.9.1/gems/stanford-core-nlp-0.x/bin/). This package only includes model files for English; see below for information on adding model files for other languages.
9
+ 2. Download the Stanford Core NLP JAR and model files. Two packages are available with the necessary files: a package for [English only](http://louismullie.com/stanford-core-nlp-english.zip), or a package with models for [all languages](http://louismullie.com/stanford-core-nlp-all.zip). Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (typically this is /usr/local/lib/ruby/gems/1.9.1/gems/stanford-core-nlp-0.x/bin/).
10
10
 
11
11
  **Configuration**
12
12
 
@@ -23,18 +23,12 @@ After installing and requiring the gem (`require 'stanford-core-nlp'`), you may
23
23
  # Redirect VM output to log.txt
24
24
  StanfordCoreNLP.log_file = 'log.txt'
25
25
 
26
- You may also want to load your own classes from the Stanford NLP to do more specific tasks. The gem provides an API to do this:
27
-
28
- # Default base class is edu.stanford.nlp.pipeline.
29
- StanfordCoreNLP.load('PTBTokenizerAnnotator')
30
- puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
31
- # => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>
32
-
33
- # Here, we specify another base class.
34
- StanfordCoreNLP.load('MaxentTagger', 'edu.stanford.nlp.tagger')
35
- puts StanfordCoreNLP::MaxentTagger.inspect
36
- # => <Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
26
+ # Use the model files for a different language than English.
27
+ StanfordCoreNLP.use(:french)
37
28
 
29
+ # Change a specific model file.
30
+ StanfordCoreNLP.set_model('pos.model', 'english-left3words-distsim.tagger')
31
+
38
32
  **Using the gem**
39
33
 
40
34
  text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
@@ -64,22 +58,27 @@ You may also want to load your own classes from the Stanford NLP to do more spec
64
58
  end
65
59
  end
66
60
 
67
- A good reference for names of annotations are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CoreCorefAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the 'stanford_annotations.rb' file inside the gem. The Ruby symbol (e.g. :named_entity_tag) corresponding ot a Java annotation class follows the simple un-camel-casing convention, with 'Annotation' at the end removed. For example, the annotation NamedEntityTagAnnotation translates to :named_entity_tag, PartOfSpeechAnnotation to :part_of_speech, etc.
61
+ > Note: You need to load the StanfordCoreNLP pipeline before using the StanfordCoreNLP::Text class.
68
62
 
69
- **Adding models for other languages for the parser and tagger**
63
+ Good references for the names of annotations are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CorefCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the 'config.rb' file inside the gem. The Ruby symbol (e.g. :named_entity_tag) corresponding to a Java annotation class follows a simple un-camel-casing convention, with the trailing 'Annotation' removed. For example, the annotation NamedEntityTagAnnotation translates to :named_entity_tag, PartOfSpeechAnnotation to :part_of_speech, etc.
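
The naming convention described above can be sketched in plain Ruby. This is a minimal illustration that runs without the gem or a JVM: `camel_case` mirrors the helper that appears in the gem's source, while `annotation_class_name` is a hypothetical name introduced here only for the example.

```ruby
# Sketch of the Ruby symbol -> Java annotation class name convention.
# `camel_case` mirrors the helper shown in the gem's source.
def camel_case(text)
  text.to_s.gsub(/^[a-z]|_[a-z]/) { |a| a.upcase }.gsub('_', '')
end

# Hypothetical helper (not part of the gem): append the 'Annotation'
# suffix that the Ruby symbol leaves out.
def annotation_class_name(symbol)
  "#{camel_case(symbol)}Annotation"
end

puts annotation_class_name(:named_entity_tag)
puts annotation_class_name(:part_of_speech)
```

Going the other way (class name to symbol) simply reverses these two steps: drop the suffix, then un-camel-case the remainder.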
70
64
 
71
- - For the Stanford Parser, download the [parser files](http://nlp.stanford.edu/software/lex-parser.shtml), and copy from the grammar/ directory the grammars you need into the gem's bin/grammar directory (e.g. /usr/local/lib/ruby/gems/1.9.1/gems/stanford-core-nlp-0.x/bin/grammar). Grammars are available for Arabic, Chinese, French, German and Xinhua.
72
- - For the Stanford Tagger, download the [tagger files](http://nlp.stanford.edu/software/tagger.shtml), and copy from the models/ directory the models you need into the gem's bin/models directory. Models are available for Arabic, Chinese, French and German.
65
+ **Loading specific classes**
73
66
 
74
- Then, configure the gem to use your newly added files, e.g.:
75
-
76
- StanfordCoreNLP.set_model('parser.model', '/path/to/gem/bin/grammar/chinesePCFG.ser.gz')
77
- StanfordCoreNLP.set_model('tagger.model', '/path/to/gem/bin/grammar/chinese.tagger')
78
- pipeline = StanfordCoreNLP.load(:ssplit, :tokenize, :pos, :parse)
67
+ You may also want to load your own classes from the Stanford NLP package to do more specific tasks. The gem provides an API to do this:
68
+
69
+ # Default base class is edu.stanford.nlp.pipeline.
70
+ StanfordCoreNLP.load_class('PTBTokenizerAnnotator')
71
+ puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
72
+ # => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>
73
+
74
+ # Here, we specify another base class.
75
+ StanfordCoreNLP.load_class('MaxentTagger', 'edu.stanford.nlp.tagger.maxent')
76
+ puts StanfordCoreNLP::MaxentTagger.inspect
77
+ # => #<Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
79
78
 
80
79
  **Current known issues**
81
80
 
82
- The models included with the gem for the NER system are missing two files: "edu/stanford/nlp/models/dcoref/countries" and "edu/stanford/nlp/models/dcoref/statesandprovinces", which I couldn't find anywhere. I will be very grateful if somebody could add/e-mail me these files.
81
+ The models included with the gem for the NER system are missing two files: "edu/stanford/nlp/models/dcoref/countries" and "edu/stanford/nlp/models/dcoref/statesandprovinces", which I couldn't find anywhere. I will be grateful if somebody could add/e-mail me these files.
83
82
 
84
83
  **Contributing**
85
84
 
data/bin/INFO CHANGED
@@ -1 +1 @@
1
- This is where you should put the JAR files.
1
+ This is where you should put the JAR files and the folders with the model files.
data/lib/stanford-core-nlp.rb CHANGED
@@ -1,81 +1,135 @@
1
1
  module StanfordCoreNLP
2
2
 
3
- VERSION = '0.1.4'
4
- require 'stanford-core-nlp/jar_loader.rb'
3
+ VERSION = '0.1.5'
4
+ require 'stanford-core-nlp/jar_loader'
5
5
  require 'stanford-core-nlp/java_wrapper'
6
- require 'stanford-core-nlp/stanford_annotations'
7
-
6
+ require 'stanford-core-nlp/config'
7
+
8
8
  class << self
9
- # The path in which to look for the Stanford JAR files.
10
- # This is passed to JarLoader.
9
+ # The path in which to look for the Stanford JAR files,
10
+ # with a trailing slash.
11
+ #
12
+ # The structure of the JAR folder must be as follows:
13
+ #
14
+ # Files:
15
+ #
16
+ # /stanford-core-nlp.jar
17
+ # /joda-time.jar
18
+ # /xom.jar
19
+ # /bridge.jar*
20
+ #
21
+ # Folders:
22
+ #
23
+ # /classifiers # Models for the NER system.
24
+ # /dcoref # Models for the coreference resolver.
25
+ # /taggers # Models for the POS tagger.
26
+ # /grammar # Models for the parser.
27
+ #
28
+ # *The file bridge.jar is a thin Java wrapper over the
29
+ # Stanford Core NLP get() function, which makes it possible to
30
+ # retrieve annotations using static classes as names.
31
+ # This works around one of the lacunae of Rjb.
11
32
  attr_accessor :jar_path
12
- # The flags for starting the JVM machine.
13
- # Parser and named entity recognizer are very memory consuming.
33
+ # The flags for starting the JVM. The parser and
34
+ # named entity recognizer are very memory-intensive.
14
35
  attr_accessor :jvm_args
15
36
  # A file to redirect JVM output to.
16
37
  attr_accessor :log_file
17
- # The model files. Use #set_model to modify these.
38
+ # The model files for a given language.
18
39
  attr_accessor :model_files
19
40
  end
20
41
 
21
42
  # The default JAR path is the gem's bin folder.
22
43
  self.jar_path = File.dirname(__FILE__) + '/../bin/'
23
- # Load the JVM with a minimum heap size of 512MB and a
44
+ # Load the JVM with a minimum heap size of 512MB and a
24
45
  # maximum heap size of 1024MB.
25
46
  self.jvm_args = ['-Xms512M', '-Xmx1024M']
26
47
  # Turn logging off by default.
27
48
  self.log_file = nil
28
49
 
29
- # Default model files.
30
- self.model_files = {
31
- 'pos.model' => 'taggers/english-left3words-distsim.tagger',
32
- 'ner.model.3class' => 'classifiers/all.3class.distsim.crf.ser.gz',
33
- 'ner.model.7class' => 'classifiers/muc.7class.distsim.crf.ser.gz',
34
- 'ner.model.MISCclass' => 'classifiers/conll.4class.distsim.crf.ser.gz',
35
- 'parser.model' => 'grammar/englishPCFG.ser.gz',
36
- 'dcoref.demonym' => 'dcoref/demonyms.txt',
37
- 'dcoref.animate' => 'dcoref/animate.unigrams.txt',
38
- 'dcoref.female' => 'dcoref/female.unigrams.txt',
39
- 'dcoref.inanimate' => 'dcoref/inanimate.unigrams.txt',
40
- 'dcoref.male' => 'dcoref/male.unigrams.txt',
41
- 'dcoref.neutral' => 'dcoref/neutral.unigrams.txt',
42
- 'dcoref.plural' => 'dcoref/plural.unigrams.txt',
43
- 'dcoref.singular' => 'dcoref/singular.unigrams.txt',
44
- 'dcoref.states' => 'dcoref/state-abbreviations.txt',
45
- 'dcoref.countries' => 'dcoref/unknown.txt', # Fix - can somebody provide this file?
46
- 'dcoref.states.provinces' => 'dcoref/unknown.txt', # Fix - can somebody provide this file?
47
- 'dcoref.extra.gender' => 'dcoref/namegender.combine.txt'
48
- }
49
50
 
50
- # Whether the classes are initialized or not.
51
- @@initialized = false
52
- # Whether the jars are loaded or not.
53
- @@loaded = false
51
+ # Use models for a given language. Language can be
52
+ # supplied as full-length, or ISO-639 2 or 3 letter
53
+ # code (e.g. :english, :eng or :en will work).
54
+ def self.use(language)
55
+ lang = nil
56
+ self.model_files = {}
57
+ Config::LanguageCodes.each do |l,codes|
58
+ lang = codes[2] if codes.include?(language)
59
+ end
60
+ Config::Models.each do |n, languages|
61
+ models = languages[lang]
62
+ folder = Config::ModelFolders[n]
63
+ if models.is_a?(Hash)
64
+ n = n.to_s
65
+ n += '.model' if n == 'ner'
66
+ models.each do |m, file|
67
+ self.model_files["#{n}.#{m}"] =
68
+ folder + file
69
+ end
70
+ elsif models.is_a?(String)
71
+ self.model_files["#{n}.model"] =
72
+ folder + models
73
+ end
74
+ end
75
+ end
76
+
77
+ # Use English by default.
78
+ self.use(:english)
54
79
 
55
- # Set a model file.
80
+ # Set a model file. Here are the default models for English:
81
+ #
82
+ # 'pos.model' => 'english-left3words-distsim.tagger',
83
+ # 'ner.model.3class' => 'all.3class.distsim.crf.ser.gz',
84
+ # 'ner.model.7class' => 'muc.7class.distsim.crf.ser.gz',
85
+ # 'ner.model.MISCclass' => 'conll.4class.distsim.crf.ser.gz',
86
+ # 'parser.model' => 'englishPCFG.ser.gz',
87
+ # 'dcoref.demonym' => 'demonyms.txt',
88
+ # 'dcoref.animate' => 'animate.unigrams.txt',
89
+ # 'dcoref.female' => 'female.unigrams.txt',
90
+ # 'dcoref.inanimate' => 'inanimate.unigrams.txt',
91
+ # 'dcoref.male' => 'male.unigrams.txt',
92
+ # 'dcoref.neutral' => 'neutral.unigrams.txt',
93
+ # 'dcoref.plural' => 'plural.unigrams.txt',
94
+ # 'dcoref.singular' => 'singular.unigrams.txt',
95
+ # 'dcoref.states' => 'state-abbreviations.txt',
96
+ # 'dcoref.extra.gender' => 'namegender.combine.txt'
97
+ #
56
98
  def self.set_model(name, file)
57
- unless File.readable?(self.jar_path + file)
58
- raise "JAR file #{self.jar_path + file} could not be found." +
59
- "You may need to download this file manually and/or set paths properly."
60
- end
61
- self.model_files[name] = file
99
+ n = name.split('.')[0].intern
100
+ self.model_files[name] =
101
+ Config::ModelFolders[n] + file
62
102
  end
63
103
 
104
+ # Whether the classes are initialized or not.
105
+ @@initialized = false
106
+ # Whether the JAR files are loaded or not.
107
+ @@loaded = false
108
+
64
109
  # Load the JARs, create the classes.
65
110
  def self.init
66
111
  self.load_jars unless @@loaded
67
112
  self.create_classes
68
113
  @@initialized = true
69
114
  end
70
-
71
- # Load a StanfordCoreNLP pipeline with the specified JVM flags and
72
- # StanfordCoreNLP properties (hash of property => values).
115
+
116
+ # Load a StanfordCoreNLP pipeline with the
117
+ # specified JVM flags and StanfordCoreNLP
118
+ # properties.
73
119
  def self.load(*annotators)
74
120
  self.init unless @@initialized
75
121
  # Prepend the JAR path to the model files.
76
122
  properties = {}
77
- self.model_files.each { |k,v| properties[k] = self.jar_path + v }
78
- properties['annotators'] =
123
+ self.model_files.each do |k,v|
124
+ f = self.jar_path + v
125
+ unless File.readable?(f)
126
+ raise "Model file #{f} could not be found. " +
127
+ "You may need to download this file manually and/or set paths properly."
128
+ else
129
+ properties[k] = f
130
+ end
131
+ end
132
+ properties['annotators'] =
79
133
  annotators.map { |x| x.to_s }.join(', ')
80
134
  CoreNLP.new(get_properties(properties))
81
135
  end
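
The property-building step in `load` above can be sketched as a stand-alone function. This is an illustration under stated assumptions, not the gem's code: `build_properties` is a hypothetical name, it uses `File.join` where the gem concatenates a `jar_path` that ends in a slash, and it returns a plain Ruby hash instead of a `java.util.Properties` object.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-alone version of the property-building step.
# Each model path is resolved against jar_path and must be readable;
# the annotator names are joined into one comma-separated string.
def build_properties(jar_path, model_files, annotators)
  properties = {}
  model_files.each do |name, relative_path|
    full_path = File.join(jar_path, relative_path)
    unless File.readable?(full_path)
      raise "Model file #{full_path} could not be found."
    end
    properties[name] = full_path
  end
  properties['annotators'] = annotators.map(&:to_s).join(', ')
  properties
end

# Usage with a throwaway directory standing in for the gem's bin/ folder.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'taggers'))
  FileUtils.touch(File.join(dir, 'taggers', 'french.tagger'))
  props = build_properties(dir, { 'pos.model' => 'taggers/french.tagger' },
                           [:tokenize, :ssplit, :pos])
  puts props['annotators']
end
```

Checking readability at load time rather than in `set_model` means a missing file is reported when the pipeline is built, after all model paths have been configured.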
@@ -101,17 +155,37 @@ module StanfordCoreNLP
101
155
  const_set(:Properties, Rjb::import('java.util.Properties'))
102
156
  const_set(:AnnotationBridge, Rjb::import('AnnotationBridge'))
103
157
  end
104
-
158
+
105
159
  # Load a class (e.g. PTBTokenizerAnnotator) in a specific
106
160
  # class path (default is 'edu.stanford.nlp.pipeline').
107
161
  # The class is then accessible under the StanfordCoreNLP
108
162
  # namespace, e.g. StanfordCoreNLP::PTBTokenizerAnnotator.
163
+ #
164
+ # List of annotators:
165
+ #
166
+ # - PTBTokenizerAnnotator - tokenizes the text following Penn Treebank conventions.
167
+ # - WordToSentenceAnnotator - splits a sequence of words into a sequence of sentences.
168
+ # - POSTaggerAnnotator - annotates the text with part-of-speech tags.
169
+ # - MorphaAnnotator - morphological normalizer (generates lemmas).
170
+ # - NERAnnotator - annotates the text with named-entity labels.
171
+ # - NERCombinerAnnotator - combines several NER models (use this instead of NERAnnotator!).
172
+ # - TrueCaseAnnotator - detects the true case of words in free text (useful for all upper or lower case text).
173
+ # - ParserAnnotator - generates constituent and dependency trees.
174
+ # - NumberAnnotator - recognizes numerical entities such as numbers, money, times, and dates.
175
+ # - TimeWordAnnotator - recognizes common temporal expressions, such as "teatime".
176
+ # - QuantifiableEntityNormalizingAnnotator - normalizes the content of all numerical entities.
177
+ # - SRLAnnotator - annotates predicates and their semantic roles.
178
+ # - CorefAnnotator - implements pronominal anaphora resolution using a statistical model (deprecated!).
179
+ # - DeterministicCorefAnnotator - implements anaphora resolution using a deterministic model (newer model, use this!).
180
+ # - NFLAnnotator - implements entity and relation mention extraction for the NFL domain.
109
181
  def self.load_class(klass, base = 'edu.stanford.nlp.pipeline')
110
182
  self.load_jars unless @@loaded
111
183
  const_set(klass.intern, Rjb::import("#{base}.#{klass}"))
112
184
  end
113
-
114
- # Create a java.util.Properties object from a hash.
185
+
186
+ # Private helper functions.
187
+ private
188
+ # Create a java.util.Properties object from a hash.
115
189
  def self.get_properties(properties)
116
190
  props = Properties.new
117
191
  properties.each do |property, value|
@@ -119,10 +193,10 @@ module StanfordCoreNLP
119
193
  end
120
194
  props
121
195
  end
122
-
123
- # Helper function: under_case -> CamelCase.
196
+
197
+ # Under_case -> CamelCase.
124
198
  def self.camel_case(text)
125
199
  text.to_s.gsub(/^[a-z]|_[a-z]/) { |a| a.upcase }.gsub('_', '')
126
200
  end
127
-
128
- end
201
+
202
+ end
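
To make the language lookup in `use` easier to follow, here is a minimal sketch of the same resolution logic in plain Ruby. The three tables are trimmed, illustrative copies of the gem's configuration (two languages only), `resolve_models` is a hypothetical name, and raising on an unknown language code is an assumption added here: the method shown in the diff does not raise in that case.

```ruby
# Trimmed, illustrative copies of the gem's configuration tables.
LANGUAGE_CODES = {
  :english => [:en, :eng, :english],
  :french  => [:fr, :fre, :french]
}
MODEL_FOLDERS = {
  :pos => 'taggers/', :parser => 'grammar/', :ner => 'classifiers/'
}
MODELS = {
  :pos    => { :english => 'english-left3words-distsim.tagger',
               :french  => 'french.tagger' },
  :parser => { :english => 'englishPCFG.ser.gz',
               :french  => 'frenchFactored.ser.gz' },
  :ner    => { :english => { '3class' => 'all.3class.distsim.crf.ser.gz' },
               :french  => {} }
}

# Hypothetical helper: map a language name or code to a hash of
# property name => model path relative to the JAR folder.
def resolve_models(language)
  match = LANGUAGE_CODES.find { |_name, codes| codes.include?(language) }
  # Assumption: fail loudly on an unknown code.
  raise ArgumentError, "Unknown language: #{language.inspect}" unless match
  lang = match[0]
  files = {}
  MODELS.each do |component, by_language|
    models = by_language[lang]
    folder = MODEL_FOLDERS[component]
    if models.is_a?(Hash)
      # NER properties are named 'ner.model.<key>'.
      prefix = component == :ner ? 'ner.model' : component.to_s
      models.each { |key, file| files["#{prefix}.#{key}"] = folder + file }
    elsif models.is_a?(String)
      files["#{component}.model"] = folder + models
    end
  end
  files
end

resolve_models(:fr).each { |name, path| puts "#{name} => #{path}" }
```

A language with no models for a component (an empty hash or `nil`) simply contributes no properties, which is why French yields only the tagger and parser entries.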
data/lib/stanford-core-nlp/config.rb ADDED
@@ -0,0 +1,453 @@
1
+ module StanfordCoreNLP
2
+
3
+ class Config
4
+
5
+ # A hash mapping each language to its 2-letter code,
6
+ # its 3-letter code and its full name.
7
+ LanguageCodes = {
8
+ :english => [:en, :eng, :english],
9
+ :german => [:de, :ger, :german],
10
+ :french => [:fr, :fre, :french],
11
+ :arabic => [:ar, :ara, :arabic],
12
+ :chinese => [:ch, :chi, :chinese],
13
+ :xinhua => [:xi, :xin, :xinhua]
14
+ }
15
+
16
+ # Folders inside the JAR path for the models.
17
+ ModelFolders = {
18
+ :pos => 'taggers/',
19
+ :parser => 'grammar/',
20
+ :ner => 'classifiers/',
21
+ :dcoref => 'dcoref/'
22
+ }
23
+
24
+ # Default models for all languages.
25
+ Models = {
26
+ :pos => {
27
+ :english => 'english-left3words-distsim.tagger',
28
+ :german => 'german-fast.tagger',
29
+ :french => 'french.tagger',
30
+ :arabic => 'arabic-fast.tagger',
31
+ :chinese => 'chinese.tagger',
32
+ :xinhua => nil
33
+ },
34
+ :parser => {
35
+ :english => 'englishPCFG.ser.gz',
36
+ :german => 'germanPCFG.ser.gz',
37
+ :french => 'frenchFactored.ser.gz',
38
+ :arabic => 'arabicFactored.ser.gz',
39
+ :chinese => 'chinesePCFG.ser.gz',
40
+ :xinhua => 'xinhuaPCFG.ser.gz'
41
+ },
42
+ :ner => {
43
+ :english => {
44
+ '3class' => 'all.3class.distsim.crf.ser.gz',
45
+ '7class' => 'muc.7class.distsim.crf.ser.gz',
46
+ 'MISCclass' => 'conll.4class.distsim.crf.ser.gz'
47
+ },
48
+ :german => {},
49
+ :french => {},
50
+ :arabic => {},
51
+ :chinese => {},
52
+ :xinhua => {}
53
+ },
54
+ :dcoref => {
55
+ :english => {
56
+ 'demonym' => 'demonyms.txt',
57
+ 'animate' => 'animate.unigrams.txt',
58
+ 'female' => 'female.unigrams.txt',
59
+ 'inanimate' => 'inanimate.unigrams.txt',
60
+ 'male' => 'male.unigrams.txt',
61
+ 'neutral' => 'neutral.unigrams.txt',
62
+ 'plural' => 'plural.unigrams.txt',
63
+ 'singular' => 'singular.unigrams.txt',
64
+ 'states' => 'state-abbreviations.txt',
65
+ 'countries' => 'unknown.txt', # Fix - can somebody provide this file?
66
+ 'states.provinces' => 'unknown.txt', # Fix - can somebody provide this file?
67
+ 'extra.gender' => 'namegender.combine.txt'
68
+ },
69
+ :german => {},
70
+ :french => {},
71
+ :arabic => {},
72
+ :chinese => {},
73
+ :xinhua => {}
74
+ }
75
+ # Models to add.
76
+
77
+ #"truecase.model" - path towards the true-casing model; default: StanfordCoreNLPModels/truecase/noUN.ser.gz
78
+ #"truecase.bias" - class bias of the true case model; default: INIT_UPPER:-0.7,UPPER:-0.7,O:0
79
+ #"truecase.mixedcasefile" - path towards the mixed case file; default: StanfordCoreNLPModels/truecase/MixDisambiguation.list
80
+ #"nfl.gazetteer" - path towards the gazetteer for the NFL domain
81
+ #"nfl.relation.model" - path towards the NFL relation extraction model
82
+ }
83
+
84
+ # List of annotations by JAVA class path.
85
+ Annotations = {
86
+
87
+ 'nlp.trees.international.pennchinese.ChineseGrammaticalRelations' => [
88
+ 'AdjectivalModifierGRAnnotation',
89
+ 'AdverbialModifierGRAnnotation',
90
+ 'ArgumentGRAnnotation',
91
+ 'AspectMarkerGRAnnotation',
92
+ 'AssociativeMarkerGRAnnotation',
93
+ 'AssociativeModifierGRAnnotation',
94
+ 'AttributiveGRAnnotation',
95
+ 'AuxModifierGRAnnotation',
96
+ 'AuxPassiveGRAnnotation',
97
+ 'BaGRAnnotation',
98
+ 'ClausalComplementGRAnnotation',
99
+ 'ClausalSubjectGRAnnotation',
100
+ 'ClauseModifierGRAnnotation',
101
+ 'ComplementGRAnnotation',
102
+ 'ComplementizerGRAnnotation',
103
+ 'ControllingSubjectGRAnnotation',
104
+ 'CoordinationGRAnnotation',
105
+ 'DeterminerGRAnnotation',
106
+ 'DirectObjectGRAnnotation',
107
+ 'DvpMarkerGRAnnotation',
108
+ 'DvpModifierGRAnnotation',
109
+ 'EtcGRAnnotation',
110
+ 'LocalizerComplementGRAnnotation',
111
+ 'ModalGRAnnotation',
112
+ 'ModifierGRAnnotation',
113
+ 'NegationModifierGRAnnotation',
114
+ 'NominalPassiveSubjectGRAnnotation',
115
+ 'NominalSubjectGRAnnotation',
116
+ 'NounCompoundModifierGRAnnotation',
117
+ 'NumberModifierGRAnnotation',
118
+ 'NumericModifierGRAnnotation',
119
+ 'ObjectGRAnnotation',
120
+ 'OrdNumberGRAnnotation',
121
+ 'ParentheticalGRAnnotation',
122
+ 'ParticipialModifierGRAnnotation',
123
+ 'PreconjunctGRAnnotation',
124
+ 'PrepositionalLocalizerModifierGRAnnotation',
125
+ 'PrepositionalModifierGRAnnotation',
126
+ 'PrepositionalObjectGRAnnotation',
127
+ 'PunctuationGRAnnotation',
128
+ 'RangeGRAnnotation',
129
+ 'RelativeClauseModifierGRAnnotation',
130
+ 'ResultativeComplementGRAnnotation',
131
+ 'SemanticDependentGRAnnotation',
132
+ 'SubjectGRAnnotation',
133
+ 'TemporalClauseGRAnnotation',
134
+ 'TemporalGRAnnotation',
135
+ 'TimePostpositionGRAnnotation',
136
+ 'TopicGRAnnotation',
137
+ 'VerbCompoundGRAnnotation',
138
+ 'VerbModifierGRAnnotation',
139
+ 'XClausalComplementGRAnnotation'
140
+ ],
141
+
142
+ 'nlp.dcoref.CoNLL2011DocumentReader' => [
143
+ 'CorefMentionAnnotation',
144
+ 'NamedEntityAnnotation'
145
+ ],
146
+
147
+ 'nlp.ling.CoreAnnotations' => [
148
+
149
+ 'AbbrAnnotation',
150
+ 'AbgeneAnnotation',
151
+ 'AbstrAnnotation',
152
+ 'AfterAnnotation',
153
+ 'AnswerAnnotation',
154
+ 'AnswerObjectAnnotation',
155
+ 'AntecedentAnnotation',
156
+ 'ArgDescendentAnnotation',
157
+ 'ArgumentAnnotation',
158
+ 'BagOfWordsAnnotation',
159
+ 'BeAnnotation',
160
+ 'BeforeAnnotation',
161
+ 'BeginIndexAnnotation',
162
+ 'BestCliquesAnnotation',
163
+ 'BestFullAnnotation',
164
+ 'CalendarAnnotation',
165
+ 'CategoryAnnotation',
166
+ 'CategoryFunctionalTagAnnotation',
167
+ 'CharacterOffsetBeginAnnotation',
168
+ 'CharacterOffsetEndAnnotation',
169
+ 'CharAnnotation',
170
+ 'ChineseCharAnnotation',
171
+ 'ChineseIsSegmentedAnnotation',
172
+ 'ChineseOrigSegAnnotation',
173
+ 'ChineseSegAnnotation',
174
+ 'ChunkAnnotation',
175
+ 'CoarseTagAnnotation',
176
+ 'CommonWordsAnnotation',
177
+ 'CoNLLDepAnnotation',
178
+ 'CoNLLDepParentIndexAnnotation',
179
+ 'CoNLLDepTypeAnnotation',
180
+ 'CoNLLPredicateAnnotation',
181
+ 'CoNLLSRLAnnotation',
182
+ 'ContextsAnnotation',
183
+ 'CopyAnnotation',
184
+ 'CostMagnificationAnnotation',
185
+ 'CovertIDAnnotation',
186
+ 'D2_LBeginAnnotation',
187
+ 'D2_LEndAnnotation',
188
+ 'D2_LMiddleAnnotation',
189
+ 'DayAnnotation',
190
+ 'DependentsAnnotation',
191
+ 'DictAnnotation',
192
+ 'DistSimAnnotation',
193
+ 'DoAnnotation',
194
+ 'DocDateAnnotation',
195
+ 'DocIDAnnotation',
196
+ 'DomainAnnotation',
197
+ 'EndIndexAnnotation',
198
+ 'EntityClassAnnotation',
199
+ 'EntityRuleAnnotation',
200
+ 'EntityTypeAnnotation',
201
+ 'FeaturesAnnotation',
202
+ 'FemaleGazAnnotation',
203
+ 'FirstChildAnnotation',
204
+ 'ForcedSentenceEndAnnotation',
205
+ 'FreqAnnotation',
206
+ 'GazAnnotation',
207
+ 'GazetteerAnnotation',
208
+ 'GenericTokensAnnotation',
209
+ 'GeniaAnnotation',
210
+ 'GoldAnswerAnnotation',
211
+ 'GovernorAnnotation',
212
+ 'GrandparentAnnotation',
213
+ 'HaveAnnotation',
214
+ 'HeadWordStringAnnotation',
215
+ 'HeightAnnotation',
216
+ 'IDAnnotation',
217
+ 'IDFAnnotation',
218
+ 'INAnnotation',
219
+ 'IndexAnnotation',
220
+ 'InterpretationAnnotation',
221
+ 'IsDateRangeAnnotation',
222
+ 'IsURLAnnotation',
223
+ 'LabelAnnotation',
224
+ 'LastGazAnnotation',
225
+ 'LastTaggedAnnotation',
226
+ 'LBeginAnnotation',
227
+ 'LeftChildrenNodeAnnotation',
228
+ 'LeftTermAnnotation',
229
+ 'LemmaAnnotation',
230
+ 'LEndAnnotation',
231
+ 'LengthAnnotation',
232
+ 'LMiddleAnnotation',
233
+ 'MaleGazAnnotation',
234
+ 'MarkingAnnotation',
235
+ 'MonthAnnotation',
236
+ 'MorphoCaseAnnotation',
237
+ 'MorphoGenAnnotation',
238
+ 'MorphoNumAnnotation',
239
+ 'MorphoPersAnnotation',
240
+ 'NamedEntityTagAnnotation',
241
+ 'NeighborsAnnotation',
242
+ 'NERIDAnnotation',
243
+ 'NormalizedNamedEntityTagAnnotation',
244
+ 'NotAnnotation',
245
+ 'NumericCompositeObjectAnnotation',
246
+ 'NumericCompositeTypeAnnotation',
247
+ 'NumericCompositeValueAnnotation',
248
+ 'NumericObjectAnnotation',
249
+ 'NumericTypeAnnotation',
250
+ 'NumericValueAnnotation',
251
+ 'NumerizedTokensAnnotation',
252
+ 'NumTxtSentencesAnnotation',
253
+ 'OriginalAnswerAnnotation',
254
+ 'OriginalCharAnnotation',
255
+ 'OriginalTextAnnotation',
256
+ 'ParagraphAnnotation',
257
+ 'ParagraphsAnnotation',
258
+ 'ParaPositionAnnotation',
259
+ 'ParentAnnotation',
260
+ 'PartOfSpeechAnnotation',
261
+ 'PercentAnnotation',
262
+ 'PhraseWordsAnnotation',
263
+ 'PhraseWordsTagAnnotation',
264
+ 'PolarityAnnotation',
265
+ 'PositionAnnotation',
266
+ 'PossibleAnswersAnnotation',
267
+ 'PredictedAnswerAnnotation',
268
+ 'PrevChildAnnotation',
269
+ 'PriorAnnotation',
270
+ 'ProjectedCategoryAnnotation',
271
+ 'ProtoAnnotation',
272
+ 'RoleAnnotation',
273
+ 'SectionAnnotation',
274
+ 'SemanticHeadTagAnnotation',
275
+ 'SemanticHeadWordAnnotation',
276
+ 'SemanticTagAnnotation',
277
+ 'SemanticWordAnnotation',
278
+ 'SentenceIDAnnotation',
279
+ 'SentenceIndexAnnotation',
280
+ 'SentencePositionAnnotation',
281
+ 'SentencesAnnotation',
282
+ 'ShapeAnnotation',
283
+ 'SpaceBeforeAnnotation',
284
+ 'SpanAnnotation',
285
+ 'SpeakerAnnotation',
286
+ 'SRL_ID',
287
+ 'SRLIDAnnotation',
288
+ 'SRLInstancesAnnotation',
289
+ 'StackedNamedEntityTagAnnotation',
290
+ 'StateAnnotation',
291
+ 'StemAnnotation',
292
+ 'SubcategorizationAnnotation',
293
+ 'TagLabelAnnotation',
294
+ 'TextAnnotation',
295
+ 'TokenBeginAnnotation',
296
+ 'TokenEndAnnotation',
297
+ 'TokensAnnotation',
298
+ 'TopicAnnotation',
299
+ 'TrueCaseAnnotation',
300
+ 'TrueCaseTextAnnotation',
301
+ 'TrueTagAnnotation',
302
+ 'UBlockAnnotation',
303
+ 'UnaryAnnotation',
304
+ 'UnknownAnnotation',
305
+ 'UtteranceAnnotation',
306
+ 'UTypeAnnotation',
307
+ 'ValueAnnotation',
308
+ 'VerbSenseAnnotation',
309
+ 'WebAnnotation',
310
+ 'WordFormAnnotation',
311
+ 'WordnetSynAnnotation',
312
+ 'WordPositionAnnotation',
313
+ 'WordSenseAnnotation',
314
+ 'XmlContextAnnotation',
315
+ 'XmlElementAnnotation',
316
+ 'YearAnnotation'
317
+ ],
318
+
319
+ 'nlp.dcoref.CorefCoreAnnotations' => [
320
+
321
+ 'CorefAnnotation',
322
+ 'CorefChainAnnotation',
323
+ 'CorefClusterAnnotation',
324
+ 'CorefClusterIdAnnotation',
325
+ 'CorefDestAnnotation',
326
+ 'CorefGraphAnnotation'
327
+ ],
328
+
329
+ 'nlp.ling.CoreLabel' => [
330
+ 'GenericAnnotation'
331
+ ],
332
+
333
+ 'nlp.trees.EnglishGrammaticalRelations' => [
334
+ 'AbbreviationModifierGRAnnotation',
335
+ 'AdjectivalComplementGRAnnotation',
336
+ 'AdjectivalModifierGRAnnotation',
337
+ 'AdvClauseModifierGRAnnotation',
338
+ 'AdverbialModifierGRAnnotation',
339
+ 'AgentGRAnnotation',
340
+ 'AppositionalModifierGRAnnotation',
341
+ 'ArgumentGRAnnotation',
342
+ 'AttributiveGRAnnotation',
343
+ 'AuxModifierGRAnnotation',
344
+ 'AuxPassiveGRAnnotation',
345
+ 'ClausalComplementGRAnnotation',
346
+ 'ClausalPassiveSubjectGRAnnotation',
347
+ 'ClausalSubjectGRAnnotation',
348
+ 'ComplementGRAnnotation',
349
+ 'ComplementizerGRAnnotation',
350
+ 'ConjunctGRAnnotation',
351
+ 'ControllingSubjectGRAnnotation',
352
+ 'CoordinationGRAnnotation',
353
+ 'CopulaGRAnnotation',
354
+ 'DeterminerGRAnnotation',
355
+ 'DirectObjectGRAnnotation',
356
+ 'ExpletiveGRAnnotation',
357
+ 'IndirectObjectGRAnnotation',
358
+ 'InfinitivalModifierGRAnnotation',
359
+ 'MarkerGRAnnotation',
360
+ 'ModifierGRAnnotation',
361
+ 'MultiWordExpressionGRAnnotation',
362
+ 'NegationModifierGRAnnotation',
363
+ 'NominalPassiveSubjectGRAnnotation',
364
+ 'NominalSubjectGRAnnotation',
365
+ 'NounCompoundModifierGRAnnotation',
366
+ 'NpAdverbialModifierGRAnnotation',
367
+ 'NumberModifierGRAnnotation',
368
+ 'NumericModifierGRAnnotation',
369
+ 'ObjectGRAnnotation',
370
+ 'ParataxisGRAnnotation',
371
+ 'ParticipialModifierGRAnnotation',
372
+ 'PhrasalVerbParticleGRAnnotation',
373
+ 'PossessionModifierGRAnnotation',
374
+ 'PossessiveModifierGRAnnotation',
375
+ 'PreconjunctGRAnnotation',
376
+ 'PredeterminerGRAnnotation',
377
+ 'PredicateGRAnnotation',
378
+ 'PrepositionalComplementGRAnnotation',
379
+ 'PrepositionalModifierGRAnnotation',
380
+ 'PrepositionalObjectGRAnnotation',
381
+ 'PunctuationGRAnnotation',
382
+ 'PurposeClauseModifierGRAnnotation',
383
+ 'QuantifierModifierGRAnnotation',
384
+ 'ReferentGRAnnotation',
385
+ 'RelativeClauseModifierGRAnnotation',
386
+ 'RelativeGRAnnotation',
387
+ 'SemanticDependentGRAnnotation',
388
+ 'SubjectGRAnnotation',
389
+ 'TemporalModifierGRAnnotation',
390
+ 'XClausalComplementGRAnnotation'
391
+ ],
392
+
393
+ 'nlp.trees.GrammaticalRelation' => [
394
+ 'DependentGRAnnotation',
395
+ 'GovernorGRAnnotation',
396
+ 'GrammaticalRelationAnnotation',
397
+ 'KillGRAnnotation',
398
+ 'Language',
399
+ 'RootGRAnnotation'
400
+ ],
401
+
402
+ 'nlp.ie.machinereading.structure.MachineReadingAnnotations' => [
403
+ 'DependencyAnnotation',
404
+ 'DocumentDirectoryAnnotation',
405
+ 'DocumentIdAnnotation',
406
+ 'EntityMentionsAnnotation',
407
+ 'EventMentionsAnnotation',
408
+ 'GenderAnnotation',
409
+ 'RelationMentionsAnnotation',
410
+ 'TriggerAnnotation'
411
+ ],
412
+
413
+ 'nlp.parser.lexparser.ParserAnnotations' => [
414
+ 'ConstraintAnnotation'
415
+ ],
416
+
417
+ 'nlp.trees.semgraph.SemanticGraphCoreAnnotations' => [
418
+ 'BasicDependenciesAnnotation',
419
+ 'CollapsedCCProcessedDependenciesAnnotation',
420
+ 'CollapsedDependenciesAnnotation'
421
+ ],
422
+
423
+ 'nlp.time.TimeAnnotations' => [
424
+ 'TimexAnnotation',
425
+ 'TimexAnnotations'
426
+ ],
427
+
428
+ 'nlp.time.TimeExpression' => [
429
+ 'Annotation',
430
+ 'ChildrenAnnotation'
431
+ ],
432
+
433
+ 'nlp.trees.TreeCoreAnnotations' => [
434
+ 'TreeHeadTagAnnotation',
435
+ 'TreeHeadWordAnnotation',
436
+ 'TreeAnnotation'
437
+ ]
438
+ }
439
+
440
+ # Create a list of annotation names => paths.
441
+ annotations_by_name = {}
442
+ Annotations.each do |base_class, annotation_classes|
443
+ annotation_classes.each do |annotation_class|
444
+ annotations_by_name[annotation_class] ||= []
445
+ annotations_by_name[annotation_class] << base_class
446
+ end
447
+ end
448
+
449
+ # Hash of name => path.
450
+ AnnotationsByName = annotations_by_name
451
+
452
+ end
453
+ end
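
The inversion that produces `AnnotationsByName` can be sketched on a small sample. The table below is a trimmed, illustrative subset of the annotation list above, and `invert_annotations` is a hypothetical name used only for this example.

```ruby
# Illustrative subset of the base class => annotation names table.
SAMPLE_ANNOTATIONS = {
  'nlp.trees.international.pennchinese.ChineseGrammaticalRelations' => [
    'AdjectivalModifierGRAnnotation', 'BaGRAnnotation'
  ],
  'nlp.trees.EnglishGrammaticalRelations' => [
    'AdjectivalModifierGRAnnotation', 'AgentGRAnnotation'
  ],
  'nlp.ling.CoreAnnotations' => [
    'LemmaAnnotation', 'PartOfSpeechAnnotation'
  ]
}

# Hypothetical helper: build annotation name => list of base classes.
def invert_annotations(annotations)
  by_name = {}
  annotations.each do |base_class, names|
    names.each do |name|
      by_name[name] ||= []
      by_name[name] << base_class
    end
  end
  by_name
end

by_name = invert_annotations(SAMPLE_ANNOTATIONS)
by_name.each { |name, bases| puts "#{name}: #{bases.size} base class(es)" }
```

Keeping a list of base classes per name, rather than a single value, is what lets a caller detect that a name such as AdjectivalModifierGRAnnotation is defined under more than one base class.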
data/lib/stanford-core-nlp/java_wrapper.rb CHANGED
@@ -18,5 +18,32 @@ module StanfordCoreNLP
18
18
  end
19
19
  end
20
20
 
21
+ # Dynamically defined on all proxied annotation classes.
22
+ # Get an annotation using the annotation bridge.
23
+ def get(annotation, anno_base = nil)
24
+ if !java_methods.include?('get(Ljava.lang.Class;)')
25
+ raise 'No annotation can be retrieved on this object.'
26
+ else
27
+ anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
28
+ if anno_base
29
+ raise "The path #{anno_base} doesn't exist." unless Config::Annotations[anno_base]
30
+ anno_bases = [anno_base]
31
+ else
32
+ anno_bases = Config::AnnotationsByName[anno_class]
33
+ raise "The annotation #{anno_class} doesn't exist." unless anno_bases
34
+ end
35
+ if anno_bases.size > 1
36
+ msg = "There are many different annotations bearing the name #{anno_class}. "
37
+ msg << "Please specify one of the following base classes as second parameter to disambiguate: "
38
+ msg << anno_bases.join(',')
39
+ raise msg
40
+ else
41
+ base_class = anno_bases[0]
42
+ end
43
+ url = "edu.stanford.#{base_class}$#{anno_class}"
44
+ AnnotationBridge.getAnnotation(self, url)
45
+ end
46
+ end
47
+
21
48
  end
22
49
  end
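The `get` method added in this hunk resolves a short annotation name to a nested Java class path in three steps: build the class name, pick the base class (explicit or looked up by name), and refuse to guess when the name is ambiguous. The sketch below reproduces only that resolution logic, without Rjb or the Java bridge. `resolve_annotation` and the sample index are hypothetical, and `camel_case` is an assumed stand-in for `StanfordCoreNLP.camel_case`, whose real implementation is not shown in this diff. The sketch raises `ArgumentError`, whereas the diffed code raises plain strings.

```ruby
# Assumed stand-in for StanfordCoreNLP.camel_case: upcases the first letter
# of each underscore-separated part and leaves the rest untouched.
def camel_case(name)
  name.to_s.split('_').map { |part| part.sub(/\A[a-z]/) { |c| c.upcase } }.join
end

# Hypothetical helper: returns the Java class path that would be handed
# to the annotation bridge.
def resolve_annotation(by_name, known_bases, annotation, anno_base = nil)
  anno_class = "#{camel_case(annotation)}Annotation"
  bases =
    if anno_base
      unless known_bases.include?(anno_base)
        raise ArgumentError, "The path #{anno_base} doesn't exist."
      end
      [anno_base]
    else
      by_name.fetch(anno_class) do
        raise ArgumentError, "The annotation #{anno_class} doesn't exist."
      end
    end
  if bases.size > 1
    raise ArgumentError,
          "#{anno_class} is ambiguous; pass one of: #{bases.join(', ')}"
  end
  "edu.stanford.#{bases.first}$#{anno_class}"
end

# Hypothetical sample data, shaped like Config::AnnotationsByName.
english = 'nlp.trees.EnglishGrammaticalRelations'
chinese = 'nlp.trees.international.pennchinese.ChineseGrammaticalRelations'
by_name = {
  'TokensAnnotation' => ['nlp.ling.CoreAnnotations'],
  'SubjectGRAnnotation' => [english, chinese]
}
known_bases = ['nlp.ling.CoreAnnotations', english, chinese]

puts resolve_annotation(by_name, known_bases, :tokens)
puts resolve_annotation(by_name, known_bases, :subject_GR, english)
```

Calling `resolve_annotation(by_name, known_bases, :subject_GR)` without the fourth argument raises, because the name exists under both grammatical relation classes. This matches the disambiguation error in the diffed method.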
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: stanford-core-nlp
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.4
4
+ version: 0.1.5
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,11 +9,11 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2012-01-31 00:00:00.000000000 Z
12
+ date: 2012-02-04 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rjb
16
- requirement: &70226234873780 !ruby/object:Gem::Requirement
16
+ requirement: &70191057037760 !ruby/object:Gem::Requirement
17
17
  none: false
18
18
  requirements:
19
19
  - - ! '>='
@@ -21,7 +21,7 @@ dependencies:
21
21
  version: '0'
22
22
  type: :runtime
23
23
  prerelease: false
24
- version_requirements: *70226234873780
24
+ version_requirements: *70191057037760
25
25
  description: ! " High-level Ruby bindings to the Stanford CoreNLP package, a set of natural
26
26
  language processing \ntools for English, including tokenization, part-of-speech
27
27
  tagging, lemmatization, named entity recognition,\nparsing, and coreference resolution. "
@@ -31,9 +31,9 @@ executables: []
31
31
  extensions: []
32
32
  extra_rdoc_files: []
33
33
  files:
34
+ - lib/stanford-core-nlp/config.rb
34
35
  - lib/stanford-core-nlp/jar_loader.rb
35
36
  - lib/stanford-core-nlp/java_wrapper.rb
36
- - lib/stanford-core-nlp/stanford_annotations.rb
37
37
  - lib/stanford-core-nlp.rb
38
38
  - bin/bridge.jar
39
39
  - bin/INFO
@@ -1,401 +0,0 @@
1
- module StanfordCoreNLP
2
-
3
- # @private
4
- Annotations = {
5
-
6
- 'nlp.trees.international.pennchinese.ChineseGrammaticalRelations' => [
7
- 'AdjectivalModifierGRAnnotation',
8
- 'AdverbialModifierGRAnnotation',
9
- 'ArgumentGRAnnotation',
10
- 'AspectMarkerGRAnnotation',
11
- 'AssociativeMarkerGRAnnotation',
12
- 'AssociativeModifierGRAnnotation',
13
- 'AttributiveGRAnnotation',
14
- 'AuxModifierGRAnnotation',
15
- 'AuxPassiveGRAnnotation',
16
- 'BaGRAnnotation',
17
- 'ClausalComplementGRAnnotation',
18
- 'ClausalSubjectGRAnnotation',
19
- 'ClauseModifierGRAnnotation',
20
- 'ComplementGRAnnotation',
21
- 'ComplementizerGRAnnotation',
22
- 'ControllingSubjectGRAnnotation',
23
- 'CoordinationGRAnnotation',
24
- 'DeterminerGRAnnotation',
25
- 'DirectObjectGRAnnotation',
26
- 'DvpMarkerGRAnnotation',
27
- 'DvpModifierGRAnnotation',
28
- 'EtcGRAnnotation',
29
- 'LocalizerComplementGRAnnotation',
30
- 'ModalGRAnnotation',
31
- 'ModifierGRAnnotation',
32
- 'NegationModifierGRAnnotation',
33
- 'NominalPassiveSubjectGRAnnotation',
34
- 'NominalSubjectGRAnnotation',
35
- 'NounCompoundModifierGRAnnotation',
36
- 'NumberModifierGRAnnotation',
37
- 'NumericModifierGRAnnotation',
38
- 'ObjectGRAnnotation',
39
- 'OrdNumberGRAnnotation',
40
- 'ParentheticalGRAnnotation',
41
- 'ParticipialModifierGRAnnotation',
42
- 'PreconjunctGRAnnotation',
43
- 'PrepositionalLocalizerModifierGRAnnotation',
44
- 'PrepositionalModifierGRAnnotation',
45
- 'PrepositionalObjectGRAnnotation',
46
- 'PunctuationGRAnnotation',
47
- 'RangeGRAnnotation',
48
- 'RelativeClauseModifierGRAnnotation',
49
- 'ResultativeComplementGRAnnotation',
50
- 'SemanticDependentGRAnnotation',
51
- 'SubjectGRAnnotation',
52
- 'TemporalClauseGRAnnotation',
53
- 'TemporalGRAnnotation',
54
- 'TimePostpositionGRAnnotation',
55
- 'TopicGRAnnotation',
56
- 'VerbCompoundGRAnnotation',
57
- 'VerbModifierGRAnnotation',
58
- 'XClausalComplementGRAnnotation'
59
- ],
60
-
61
- 'nlp.dcoref.CoNLL2011DocumentReader' => [
62
- 'CorefMentionAnnotation',
63
- 'NamedEntityAnnotation'
64
- ],
65
-
66
- 'nlp.ling.CoreAnnotations' => [
67
-
68
- 'AbbrAnnotation',
69
- 'AbgeneAnnotation',
70
- 'AbstrAnnotation',
71
- 'AfterAnnotation',
72
- 'AnswerAnnotation',
73
- 'AnswerObjectAnnotation',
74
- 'AntecedentAnnotation',
75
- 'ArgDescendentAnnotation',
76
- 'ArgumentAnnotation',
77
- 'BagOfWordsAnnotation',
78
- 'BeAnnotation',
79
- 'BeforeAnnotation',
80
- 'BeginIndexAnnotation',
81
- 'BestCliquesAnnotation',
82
- 'BestFullAnnotation',
83
- 'CalendarAnnotation',
84
- 'CategoryAnnotation',
85
- 'CategoryFunctionalTagAnnotation',
86
- 'CharacterOffsetBeginAnnotation',
87
- 'CharacterOffsetEndAnnotation',
88
- 'CharAnnotation',
89
- 'ChineseCharAnnotation',
90
- 'ChineseIsSegmentedAnnotation',
91
- 'ChineseOrigSegAnnotation',
92
- 'ChineseSegAnnotation',
93
- 'ChunkAnnotation',
94
- 'CoarseTagAnnotation',
95
- 'CommonWordsAnnotation',
96
- 'CoNLLDepAnnotation',
97
- 'CoNLLDepParentIndexAnnotation',
98
- 'CoNLLDepTypeAnnotation',
99
- 'CoNLLPredicateAnnotation',
100
- 'CoNLLSRLAnnotation',
101
- 'ContextsAnnotation',
102
- 'CopyAnnotation',
103
- 'CostMagnificationAnnotation',
104
- 'CovertIDAnnotation',
105
- 'D2_LBeginAnnotation',
106
- 'D2_LEndAnnotation',
107
- 'D2_LMiddleAnnotation',
108
- 'DayAnnotation',
109
- 'DependentsAnnotation',
110
- 'DictAnnotation',
111
- 'DistSimAnnotation',
112
- 'DoAnnotation',
113
- 'DocDateAnnotation',
114
- 'DocIDAnnotation',
115
- 'DomainAnnotation',
116
- 'EndIndexAnnotation',
117
- 'EntityClassAnnotation',
118
- 'EntityRuleAnnotation',
119
- 'EntityTypeAnnotation',
120
- 'FeaturesAnnotation',
121
- 'FemaleGazAnnotation',
122
- 'FirstChildAnnotation',
123
- 'ForcedSentenceEndAnnotation',
124
- 'FreqAnnotation',
125
- 'GazAnnotation',
126
- 'GazetteerAnnotation',
127
- 'GenericTokensAnnotation',
128
- 'GeniaAnnotation',
129
- 'GoldAnswerAnnotation',
130
- 'GovernorAnnotation',
131
- 'GrandparentAnnotation',
132
- 'HaveAnnotation',
133
- 'HeadWordStringAnnotation',
134
- 'HeightAnnotation',
135
- 'IDAnnotation',
136
- 'IDFAnnotation',
137
- 'INAnnotation',
138
- 'IndexAnnotation',
139
- 'InterpretationAnnotation',
140
- 'IsDateRangeAnnotation',
141
- 'IsURLAnnotation',
142
- 'LabelAnnotation',
143
- 'LastGazAnnotation',
144
- 'LastTaggedAnnotation',
145
- 'LBeginAnnotation',
146
- 'LeftChildrenNodeAnnotation',
147
- 'LeftTermAnnotation',
148
- 'LemmaAnnotation',
149
- 'LEndAnnotation',
150
- 'LengthAnnotation',
151
- 'LMiddleAnnotation',
152
- 'MaleGazAnnotation',
153
- 'MarkingAnnotation',
154
- 'MonthAnnotation',
155
- 'MorphoCaseAnnotation',
156
- 'MorphoGenAnnotation',
157
- 'MorphoNumAnnotation',
158
- 'MorphoPersAnnotation',
159
- 'NamedEntityTagAnnotation',
160
- 'NeighborsAnnotation',
161
- 'NERIDAnnotation',
162
- 'NormalizedNamedEntityTagAnnotation',
163
- 'NotAnnotation',
164
- 'NumericCompositeObjectAnnotation',
165
- 'NumericCompositeTypeAnnotation',
166
- 'NumericCompositeValueAnnotation',
167
- 'NumericObjectAnnotation',
168
- 'NumericTypeAnnotation',
169
- 'NumericValueAnnotation',
170
- 'NumerizedTokensAnnotation',
171
- 'NumTxtSentencesAnnotation',
172
- 'OriginalAnswerAnnotation',
173
- 'OriginalCharAnnotation',
174
- 'OriginalTextAnnotation',
175
- 'ParagraphAnnotation',
176
- 'ParagraphsAnnotation',
177
- 'ParaPositionAnnotation',
178
- 'ParentAnnotation',
179
- 'PartOfSpeechAnnotation',
180
- 'PercentAnnotation',
181
- 'PhraseWordsAnnotation',
182
- 'PhraseWordsTagAnnotation',
183
- 'PolarityAnnotation',
184
- 'PositionAnnotation',
185
- 'PossibleAnswersAnnotation',
186
- 'PredictedAnswerAnnotation',
187
- 'PrevChildAnnotation',
188
- 'PriorAnnotation',
189
- 'ProjectedCategoryAnnotation',
190
- 'ProtoAnnotation',
191
- 'RoleAnnotation',
192
- 'SectionAnnotation',
193
- 'SemanticHeadTagAnnotation',
194
- 'SemanticHeadWordAnnotation',
195
- 'SemanticTagAnnotation',
196
- 'SemanticWordAnnotation',
197
- 'SentenceIDAnnotation',
198
- 'SentenceIndexAnnotation',
199
- 'SentencePositionAnnotation',
200
- 'SentencesAnnotation',
201
- 'ShapeAnnotation',
202
- 'SpaceBeforeAnnotation',
203
- 'SpanAnnotation',
204
- 'SpeakerAnnotation',
205
- 'SRL_ID',
206
- 'SRLIDAnnotation',
207
- 'SRLInstancesAnnotation',
208
- 'StackedNamedEntityTagAnnotation',
209
- 'StateAnnotation',
210
- 'StemAnnotation',
211
- 'SubcategorizationAnnotation',
212
- 'TagLabelAnnotation',
213
- 'TextAnnotation',
214
- 'TokenBeginAnnotation',
215
- 'TokenEndAnnotation',
216
- 'TokensAnnotation',
217
- 'TopicAnnotation',
218
- 'TrueCaseAnnotation',
219
- 'TrueCaseTextAnnotation',
220
- 'TrueTagAnnotation',
221
- 'UBlockAnnotation',
222
- 'UnaryAnnotation',
223
- 'UnknownAnnotation',
224
- 'UtteranceAnnotation',
225
- 'UTypeAnnotation',
226
- 'ValueAnnotation',
227
- 'VerbSenseAnnotation',
228
- 'WebAnnotation',
229
- 'WordFormAnnotation',
230
- 'WordnetSynAnnotation',
231
- 'WordPositionAnnotation',
232
- 'WordSenseAnnotation',
233
- 'XmlContextAnnotation',
234
- 'XmlElementAnnotation',
235
- 'YearAnnotation'
236
- ],
237
-
238
- 'nlp.dcoref.CorefCoreAnnotations' => [
239
-
240
- 'CorefAnnotation',
241
- 'CorefChainAnnotation',
242
- 'CorefClusterAnnotation',
243
- 'CorefClusterIdAnnotation',
244
- 'CorefDestAnnotation',
245
- 'CorefGraphAnnotation'
246
- ],
247
-
248
- 'nlp.ling.CoreLabel' => [
249
- 'GenericAnnotation'
250
- ],
251
-
252
- 'nlp.trees.EnglishGrammaticalRelations' => [
253
- 'AbbreviationModifierGRAnnotation',
254
- 'AdjectivalComplementGRAnnotation',
255
- 'AdjectivalModifierGRAnnotation',
256
- 'AdvClauseModifierGRAnnotation',
257
- 'AdverbialModifierGRAnnotation',
258
- 'AgentGRAnnotation',
259
- 'AppositionalModifierGRAnnotation',
260
- 'ArgumentGRAnnotation',
261
- 'AttributiveGRAnnotation',
262
- 'AuxModifierGRAnnotation',
263
- 'AuxPassiveGRAnnotation',
264
- 'ClausalComplementGRAnnotation',
265
- 'ClausalPassiveSubjectGRAnnotation',
266
- 'ClausalSubjectGRAnnotation',
267
- 'ComplementGRAnnotation',
268
- 'ComplementizerGRAnnotation',
269
- 'ConjunctGRAnnotation',
270
- 'ControllingSubjectGRAnnotation',
271
- 'CoordinationGRAnnotation',
272
- 'CopulaGRAnnotation',
273
- 'DeterminerGRAnnotation',
274
- 'DirectObjectGRAnnotation',
275
- 'ExpletiveGRAnnotation',
276
- 'IndirectObjectGRAnnotation',
277
- 'InfinitivalModifierGRAnnotation',
278
- 'MarkerGRAnnotation',
279
- 'ModifierGRAnnotation',
280
- 'MultiWordExpressionGRAnnotation',
281
- 'NegationModifierGRAnnotation',
282
- 'NominalPassiveSubjectGRAnnotation',
283
- 'NominalSubjectGRAnnotation',
284
- 'NounCompoundModifierGRAnnotation',
285
- 'NpAdverbialModifierGRAnnotation',
286
- 'NumberModifierGRAnnotation',
287
- 'NumericModifierGRAnnotation',
288
- 'ObjectGRAnnotation',
289
- 'ParataxisGRAnnotation',
290
- 'ParticipialModifierGRAnnotation',
291
- 'PhrasalVerbParticleGRAnnotation',
292
- 'PossessionModifierGRAnnotation',
293
- 'PossessiveModifierGRAnnotation',
294
- 'PreconjunctGRAnnotation',
295
- 'PredeterminerGRAnnotation',
296
- 'PredicateGRAnnotation',
297
- 'PrepositionalComplementGRAnnotation',
298
- 'PrepositionalModifierGRAnnotation',
299
- 'PrepositionalObjectGRAnnotation',
300
- 'PunctuationGRAnnotation',
301
- 'PurposeClauseModifierGRAnnotation',
302
- 'QuantifierModifierGRAnnotation',
303
- 'ReferentGRAnnotation',
304
- 'RelativeClauseModifierGRAnnotation',
305
- 'RelativeGRAnnotation',
306
- 'SemanticDependentGRAnnotation',
307
- 'SubjectGRAnnotation',
308
- 'TemporalModifierGRAnnotation',
309
- 'XClausalComplementGRAnnotation'
310
- ],
311
-
312
- 'nlp.trees.GrammaticalRelation' => [
313
- 'DependentGRAnnotation',
314
- 'GovernorGRAnnotation',
315
- 'GrammaticalRelationAnnotation',
316
- 'KillGRAnnotation',
317
- 'Language',
318
- 'RootGRAnnotation'
319
- ],
320
-
321
- 'nlp.ie.machinereading.structure.MachineReadingAnnotations' => [
322
- 'DependencyAnnotation',
323
- 'DocumentDirectoryAnnotation',
324
- 'DocumentIdAnnotation',
325
- 'EntityMentionsAnnotation',
326
- 'EventMentionsAnnotation',
327
- 'GenderAnnotation',
328
- 'RelationMentionsAnnotation',
329
- 'TriggerAnnotation'
330
- ],
331
-
332
- 'nlp.parser.lexparser.ParserAnnotations' => [
333
- 'ConstraintAnnotation'
334
- ],
335
-
336
- 'nlp.trees.semgraph.SemanticGraphCoreAnnotations' => [
337
- 'BasicDependenciesAnnotation',
338
- 'CollapsedCCProcessedDependenciesAnnotation',
339
- 'CollapsedDependenciesAnnotation'
340
- ],
341
-
342
- 'nlp.time.TimeAnnotations' => [
343
- 'TimexAnnotation',
344
- 'TimexAnnotations'
345
- ],
346
-
347
- 'nlp.time.TimeExpression' => [
348
- 'Annotation',
349
- 'ChildrenAnnotation'
350
- ],
351
-
352
- 'nlp.trees.TreeCoreAnnotations' => [
353
- 'TreeHeadTagAnnotation',
354
- 'TreeHeadWordAnnotation',
355
- 'TreeAnnotation'
356
- ]
357
- }
358
-
359
- annotations_by_name = {}
360
- Annotations.each do |base_class, annotation_classes|
361
- annotation_classes.each do |annotation_class|
362
- annotations_by_name[annotation_class] ||= []
363
- annotations_by_name[annotation_class] << base_class
364
- end
365
- end
366
-
367
- AnnotationsByName = annotations_by_name
368
-
369
- # Modify the Rjb JavaProxy class to add our own method to get annotations.
370
- Rjb::Rjb_JavaProxy.class_eval do
371
-
372
- # Dynamically defined on all proxied annotation classes.
373
- # Get an annotation using the annotation bridge.
374
- def get(annotation, anno_base = nil)
375
- if !java_methods.include?('get(Ljava.lang.Class;)')
376
- raise'No annotation can be retrieved on this object.'
377
- else
378
- anno_class = "#{StanfordCoreNLP.camel_case(annotation)}Annotation"
379
- if anno_base
380
- raise "The path #{anno_base} doesn't exist." unless Annotations[anno_base]
381
- anno_bases = [anno_base]
382
- else
383
- anno_bases = AnnotationsByName[anno_class]
384
- raise "The annotation #{anno_class} doesn't exist." unless anno_bases
385
- end
386
- if anno_bases.size > 1
387
- msg = "There are many different annotations bearing the name #{anno_class}. "
388
- msg << "Please specify one of the following base classes as second parameter to disambiguate: "
389
- msg << anno_bases.join(',')
390
- raise msg
391
- else
392
- base_class = anno_bases[0]
393
- end
394
- url = "edu.stanford.#{base_class}$#{anno_class}"
395
- AnnotationBridge.getAnnotation(self, url)
396
- end
397
- end
398
-
399
- end
400
-
401
- end