stanford-core-nlp 0.5.0 → 0.5.1
- data/README.md +17 -12
- data/lib/stanford-core-nlp.rb +16 -8
- metadata +32 -28
data/README.md
CHANGED
@@ -8,7 +8,7 @@ This gem is compatible with Ruby 1.9.2 and 1.9.3 as well as JRuby 1.7.1. It is t
 
 **Installing**
 
-First, install the gem: `gem install stanford-core-nlp`. Then, download the Stanford Core NLP JAR and model files.
+First, install the gem: `gem install stanford-core-nlp`. Then, download the Stanford Core NLP JAR and model files. Two packages are available:
 
 * A [minimal package](http://louismullie.com/treat/stanford-core-nlp-minimal.zip) with the default tagger and parser models for English, French and German.
 * A [full package](http://louismullie.com/treat/stanford-core-nlp-full.zip), with all of the tagger and parser models for English, French and German, as well as named entity and coreference resolution models for English.
@@ -17,7 +17,7 @@ Place the contents of the extracted archive inside the /bin/ folder of the stanf
 
 **Configuration**
 
-
+You may want to set some optional configuration options. Here are some examples:
 
 ```ruby
 # Set an alternative path to look for the JAR files
@@ -36,9 +36,6 @@ StanfordCoreNLP.jvm_args = ['-option1', '-option2']
 # Redirect VM output to log.txt
 StanfordCoreNLP.log_file = 'log.txt'
 
-# Use the model files for a different language than English.
-StanfordCoreNLP.use(:french) # or :german
-
 # Change a specific model file.
 StanfordCoreNLP.set_model('pos.model', 'english-left3words-distsim.tagger')
 ```
@@ -46,6 +43,9 @@ StanfordCoreNLP.set_model('pos.model', 'english-left3words-distsim.tagger')
 **Using the gem**
 
 ```ruby
+# Use the model files for a different language than English.
+StanfordCoreNLP.use :french # or :german
+
 text = 'Angela Merkel met Nicolas Sarkozy on January 25th in ' +
 'Berlin to discuss a new austerity package. Sarkozy ' +
 'looked pleased, but Merkel was dismayed.'
@@ -71,18 +71,22 @@ text.get(:sentences).each do |sentence|
 puts token.get(:named_entity_tag).to_s
 # Coreference
 puts token.get(:coref_cluster_id).to_s
-# Also of interest: coref, coref_chain,
+# Also of interest: coref, coref_chain,
+# coref_cluster, coref_dest, coref_graph.
 end
 end
 ```
 
 > Important: You need to load the StanfordCoreNLP pipeline before using the StanfordCoreNLP::Annotation class.
 
-
+The Ruby symbol (e.g. `:named_entity_tag`) corresponding to a Java annotation class is the `snake_case` of the class name, with 'Annotation' at the end removed. For example, `NamedEntityTagAnnotation` translates to `:named_entity_tag`, `PartOfSpeechAnnotation` to `:part_of_speech`, etc.
+
+A good reference for names of annotations are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CoreCorefAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the `config.rb` file inside the gem.
+
 
 **Loading specific classes**
 
-You may
+You may want to load additional Java classes (including any class from the Stanford NLP packages). The gem provides an API for this:
 
 ```ruby
 # Default base class is edu.stanford.nlp.pipeline.
@@ -120,9 +124,7 @@ Here is a full list of annotator classes provided by the Stanford Core NLP packa
 Here is a full list of the default models for the Stanford Core NLP pipeline. You can change these models individually using `StanfordCoreNLP.set_model` (see above).
 
 * 'pos.model' - 'english-left3words-distsim.tagger'
-* 'ner.model
-* 'ner.model.7class' - 'muc.7class.distsim.crf.ser.gz'
-* 'ner.model.MISCclass' -- 'conll.4class.distsim.crf.ser.gz'
+* 'ner.model' - 'all.3class.distsim.crf.ser.gz'
 * 'parse.model' - 'englishPCFG.ser.gz'
 * 'dcoref.demonym' - 'demonyms.txt'
 * 'dcoref.animate' - 'animate.unigrams.txt'
@@ -137,4 +139,7 @@ Here is a full list of the default models for the Stanford Core NLP pipeline. Yo
 
 **Contributing**
 
-
+Simple.
+
+1. Fork the project.
+2. Send me a pull request!
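The naming rule added to the README in this release (a Java annotation class name maps to its `snake_case` form with the trailing 'Annotation' removed) can be sketched in plain Ruby. The helper `annotation_symbol` below is hypothetical and written only for illustration; it is not part of the gem, and the gem's own conversion may behave differently, for instance on class names that contain acronyms.

```ruby
# Hypothetical helper (not part of the gem's API): derive the Ruby symbol
# for a Java annotation class name, following the rule stated in the README.
def annotation_symbol(java_class_name)
  # Drop the trailing 'Annotation' suffix, if present.
  base = java_class_name.sub(/Annotation\z/, '')
  # Insert an underscore at each lowercase/digit-to-uppercase boundary,
  # then lowercase the whole string.
  base.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase.to_sym
end

puts annotation_symbol('NamedEntityTagAnnotation').inspect  # :named_entity_tag
puts annotation_symbol('PartOfSpeechAnnotation').inspect    # :part_of_speech
```

The two calls reproduce the examples given in the README text; any other class name should be checked against the `config.rb` list the README points to.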
data/lib/stanford-core-nlp.rb
CHANGED
@@ -2,7 +2,7 @@ require 'stanford-core-nlp/config'
 
 module StanfordCoreNLP
 
-VERSION = '0.5.
+VERSION = '0.5.1'
 
 require 'bind-it'
 extend BindIt::Binding
@@ -44,6 +44,8 @@ module StanfordCoreNLP
 ['CoreLabel', 'edu.stanford.nlp.ling'],
 ['MaxentTagger', 'edu.stanford.nlp.tagger.maxent'],
 ['CRFClassifier', 'edu.stanford.nlp.ie.crf'],
+['LexicalizedParser', 'edu.stanford.nlp.parser.lexparser'],
+['Options', 'edu.stanford.nlp.parser.lexparser'],
 ['Properties', 'java.util'],
 ['ArrayList', 'java.util'],
 ['AnnotationBridge', '']
@@ -111,11 +113,8 @@ module StanfordCoreNLP
 # Public API methods #
 # ########################### #
 
-
-
-# properties.
-def self.load(*annotators)
-
+def self.bind
+
 # Take care of Windows users.
 if self.running_on_windows?
 self.jar_path.gsub!('/', '\\')
@@ -123,14 +122,23 @@ module StanfordCoreNLP
 end
 
 # Make the bindings.
-
+super
 
 # Bind annotation bridge.
 self.default_classes.each do |info|
 klass = const_get(info.first)
 self.inject_get_method(klass)
 end
-
+
+end
+
+# Load a StanfordCoreNLP pipeline with the
+# specified JVM flags and StanfordCoreNLP
+# properties.
+def self.load(*annotators)
+
+self.bind unless self.bound
+
 # Prepend the JAR path to the model files.
 properties = {}
 self.model_files.each do |k,v|
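The change to this file moves the one-time binding work out of `load` into a separate `bind` method, and `load` now guards it with `self.bind unless self.bound`. A minimal, self-contained sketch of that guard pattern follows. The `LazyBinding` module and its `bind_count` counter are hypothetical stand-ins; in the gem, `bind` calls `super` into `BindIt::Binding`, and `bound` is assumed here to be provided by that module, since the diff does not show where it is defined.

```ruby
# Illustrative stand-in module; NOT the gem's actual code.
module LazyBinding
  class << self
    attr_reader :bound, :bind_count
  end

  @bound = false
  @bind_count = 0

  # Stand-in for the expensive one-time setup (in the gem: JVM start-up
  # and Java class binding). Here it only records that it ran.
  def self.bind
    @bind_count += 1
    @bound = true
  end

  # Entry point: performs the binding only if it has not happened yet,
  # then continues with the per-call work (here: returning its arguments).
  def self.load(*annotators)
    bind unless bound
    annotators
  end
end

LazyBinding.load(:tokenize, :ssplit)
LazyBinding.load(:pos)
puts LazyBinding.bind_count  # 1
```

Splitting the methods this way lets a caller bind explicitly ahead of time, while repeated calls to `load` do not repeat the setup.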
metadata
CHANGED
@@ -1,87 +1,91 @@
 --- !ruby/object:Gem::Specification
 name: stanford-core-nlp
 version: !ruby/object:Gem::Version
-version: 0.5.
-prerelease:
+version: 0.5.1
+prerelease:
 platform: ruby
 authors:
 - Louis Mullie
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2013-01-07 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: bind-it
-
-none: false
+version_requirements: !ruby/object:Gem::Requirement
 requirements:
 - - ~>
 - !ruby/object:Gem::Version
 version: 0.2.5
-type: :runtime
-prerelease: false
-version_requirements: !ruby/object:Gem::Requirement
 none: false
+requirement: !ruby/object:Gem::Requirement
 requirements:
 - - ~>
 - !ruby/object:Gem::Version
 version: 0.2.5
+none: false
+prerelease: false
+type: :runtime
 - !ruby/object:Gem::Dependency
 name: rspec
-
-none: false
+version_requirements: !ruby/object:Gem::Requirement
 requirements:
 - - ! '>='
 - !ruby/object:Gem::Version
-version:
-
-prerelease: false
-version_requirements: !ruby/object:Gem::Requirement
+version: !binary |-
+MA==
 none: false
+requirement: !ruby/object:Gem::Requirement
 requirements:
 - - ! '>='
 - !ruby/object:Gem::Version
-version:
-
-
-
-
+version: !binary |-
+MA==
+none: false
+prerelease: false
+type: :development
+description: " High-level Ruby bindings to the Stanford CoreNLP package, a set natural\
+\ language processing \ntools that provides tokenization, part-of-speech tagging\
+\ and parsing for several languages, as well as named entity \nrecognition and coreference\
+\ resolution for English. "
 email:
 - louis.mullie@gmail.com
 executables: []
 extensions: []
 extra_rdoc_files: []
 files:
+- lib/stanford-core-nlp.rb
 - lib/stanford-core-nlp/bridge.rb
 - lib/stanford-core-nlp/config.rb
-- lib/stanford-core-nlp.rb
 - bin/AnnotationBridge.java
 - bin/bridge.jar
 - README.md
 - LICENSE
 homepage: https://github.com/louismullie/stanford-core-nlp
 licenses: []
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
-none: false
 requirements:
 - - ! '>='
 - !ruby/object:Gem::Version
-version:
-
+version: !binary |-
+MA==
 none: false
+required_rubygems_version: !ruby/object:Gem::Requirement
 requirements:
 - - ! '>='
 - !ruby/object:Gem::Version
-version:
+version: !binary |-
+MA==
+none: false
 requirements: []
-rubyforge_project:
+rubyforge_project:
 rubygems_version: 1.8.24
-signing_key:
+signing_key:
 specification_version: 3
 summary: Ruby bindings to the Stanford Core NLP tools.
 test_files: []
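In the new metadata, the versions under the `>=` requirements are serialized as `!binary |-` followed by `MA==`. That payload is ordinary Base64, so it can be decoded with core Ruby to see what the requirement says. The variable name `decoded` is illustrative only.

```ruby
# 'm' is the Base64 directive of String#unpack (core Ruby, no require needed).
decoded = 'MA=='.unpack('m').first
puts decoded  # 0
```

Read this way, those entries are the open-ended requirement `>= 0`, written in a binary-tagged YAML form.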