corenlp 0.0.4 → 0.0.5
- checksums.yaml +4 -4
- data/README.md +2 -1
- data/lib/corenlp.rb +3 -3
- data/lib/corenlp/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 13fb6d3e676a78f59359715c392e857d4836198e
+  data.tar.gz: ee58cbc6af1c899a1ef7cb070e2bad0c939c3765
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 649e14c7fc8936da85e307bde80eaf1b3615c87faa61b2137a8a673aafd8dfea18b8ca300851fb6107b55519edb7dd42782213272e9585553d77878a5674f53b
+  data.tar.gz: c0fc54d340e2d2c03067066989d9a9b70e866c49f5f5250c1512735ab2b8bc742d0d894b8c7edc3504edd4f00d7b947135e0e9f5e8998a19e21e1b63df81278e
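The `checksums.yaml` entries above are hex digests of the two archives packed inside the `.gem` file. As a rough sketch, digests of this shape could be recomputed with Ruby's standard `Digest` library. The `checksums_for` helper and the sample input are hypothetical and not part of the gem:

```ruby
require "digest"

# Hypothetical helper: returns the SHA1 and SHA512 hex digests of a blob,
# mirroring the two sections of checksums.yaml.
def checksums_for(bytes)
  {
    "SHA1"   => Digest::SHA1.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

# Placeholder input; in practice this would be the bytes of data.tar.gz
# or metadata.gz extracted from the .gem archive.
sums = checksums_for("example archive bytes")
puts sums["SHA1"].length    # 40 hex characters
puts sums["SHA512"].length  # 128 hex characters
```

Comparing the recomputed strings with the values recorded in `checksums.yaml` is one way to confirm that an unpacked archive matches the published release.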
data/README.md
CHANGED
@@ -38,11 +38,12 @@ The following code will build up a treebank structure for the raw text "Put the
 
 ## Options
 
-The Treebank object can be
+The Treebank object can be initialized with various options.
 
 * `java_max_memory` - set to 3GB by default. This can be customized via the Treebank initializer to be `-Xmx2g`, which would use a max of 2GB of memory, for example.
 * `threads_to_use` - number of threads Stanford CoreNLP uses to parse text. This is set to 4 by default. This option is passed to the Java executable.
 * `output_directory` - by default this is `./tmp/language_processing`, which already exists. This is where Stanford CoreNLP XML files are placed. These XML files represented the structured parser output.
+* `deps_dir` - the directory where the Stanford CoreNLP dependencies files are. By default this is './lib/ext`.
 
 ## Tests
 
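The option defaults listed in the README hunk above can be illustrated with a small stand-in class. This is a sketch only: `TreebankOptions` is a hypothetical name that copies the documented defaults, it is not `Corenlp::Treebank` itself, and the `/opt/corenlp` path is made up for the example.

```ruby
# Hypothetical stand-in that mirrors only the option defaults documented
# in the README; the real Corenlp::Treebank also parses text and runs Java.
class TreebankOptions
  attr_accessor :java_max_memory, :threads_to_use, :output_directory, :deps_dir

  def initialize(attrs = {})
    self.java_max_memory  = attrs[:java_max_memory]  || "-Xmx3g"
    self.threads_to_use   = attrs[:threads_to_use]   || 4
    self.output_directory = attrs[:output_directory] || "./tmp/language_processing"
    self.deps_dir         = attrs[:deps_dir]         || "./lib/ext"
  end
end

defaults = TreebankOptions.new
# Override two options; anything not passed keeps its default.
custom = TreebankOptions.new(java_max_memory: "-Xmx2g", deps_dir: "/opt/corenlp")

puts defaults.deps_dir       # ./lib/ext
puts custom.java_max_memory  # -Xmx2g
puts custom.threads_to_use   # 4
```

Because each option falls back with `||`, passing `nil` or `false` for an option also selects the default.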
data/lib/corenlp.rb
CHANGED
@@ -4,7 +4,7 @@ Bundler.require
 
 module Corenlp
   class Treebank
-    attr_accessor :raw_text, :filenames, :output_directory, :summary_file, :threads_to_use, :java_max_memory, :sentences
+    attr_accessor :raw_text, :filenames, :output_directory, :summary_file, :threads_to_use, :java_max_memory, :sentences, :deps_dir
 
     def initialize(attrs = {})
       self.raw_text = attrs[:raw_text] || ""
@@ -15,6 +15,7 @@ module Corenlp
       self.threads_to_use = attrs[:threads_to_use] || 4
       self.java_max_memory = attrs[:java_max_memory] || "-Xmx3g"
       self.sentences = []
+      self.deps_dir = attrs[:deps_dir] || "./lib/ext"
     end
 
     def write_output_file_and_summary_file
@@ -25,8 +26,7 @@ module Corenlp
     end
 
     def process_files_with_stanford_corenlp
-
-      classpath = "#{deps}/stanford-corenlp-3.4.jar:#{deps}/stanford-corenlp-3.4-models.jar:#{deps}/xom.jar:#{deps}/joda-time.jar:#{deps}/jollyday.jar:#{deps}/ejml-0.23.jar"
+      classpath = "#{deps_dir}/stanford-corenlp-3.4.jar:#{deps_dir}/stanford-corenlp-3.4-models.jar:#{deps_dir}/xom.jar:#{deps_dir}/joda-time.jar:#{deps_dir}/jollyday.jar:#{deps_dir}/ejml-0.23.jar"
       stanford_bin = "edu.stanford.nlp.pipeline.StanfordCoreNLP"
       annotators = "tokenize,ssplit,pos,lemma,parse,ner"
 
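The change above makes the classpath depend on the configurable `deps_dir` rather than a local variable. The sketch below shows one possible way to build the same colon-separated string from a list of jar names. The `JARS` constant and `classpath_for` helper are hypothetical and do not appear in the gem, which interpolates the string inline:

```ruby
# Jar file names taken from the classpath string in the diff.
JARS = %w[
  stanford-corenlp-3.4.jar
  stanford-corenlp-3.4-models.jar
  xom.jar
  joda-time.jar
  jollyday.jar
  ejml-0.23.jar
].freeze

# Hypothetical helper: prefixes every jar with deps_dir and joins the
# entries with ":", the separator used in the gem's classpath string.
def classpath_for(deps_dir)
  JARS.map { |jar| "#{deps_dir}/#{jar}" }.join(":")
end

puts classpath_for("./lib/ext")
```

With the default `./lib/ext`, the result is the same six-entry string the diff builds by interpolation. Note that `:` is the classpath separator on Unix-like systems only.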
data/lib/corenlp/version.rb
CHANGED