corenlp 0.0.4 → 0.0.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +2 -1
- data/lib/corenlp.rb +3 -3
- data/lib/corenlp/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 13fb6d3e676a78f59359715c392e857d4836198e
+  data.tar.gz: ee58cbc6af1c899a1ef7cb070e2bad0c939c3765
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 649e14c7fc8936da85e307bde80eaf1b3615c87faa61b2137a8a673aafd8dfea18b8ca300851fb6107b55519edb7dd42782213272e9585553d77878a5674f53b
+  data.tar.gz: c0fc54d340e2d2c03067066989d9a9b70e866c49f5f5250c1512735ab2b8bc742d0d894b8c7edc3504edd4f00d7b947135e0e9f5e8998a19e21e1b63df81278e
```
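checksums.yaml records SHA1 and SHA512 digests of the two archives packed inside the .gem file, `metadata.gz` and `data.tar.gz`. A minimal sketch of checking them with Ruby's standard `digest` and `yaml` libraries follows. It assumes the .gem (a plain tar archive) has already been extracted and its `checksums.yaml.gz` decompressed; `verify_checksums` is a hypothetical helper name, not part of RubyGems.

```ruby
require "digest/sha1"
require "digest/sha2"
require "yaml"

# Map the algorithm names used as top-level keys in checksums.yaml
# to the Digest classes that compute them.
ALGORITHMS = { "SHA1" => Digest::SHA1, "SHA512" => Digest::SHA512 }.freeze

# Hypothetical helper: returns { "SHA1 metadata.gz" => true, ... } where each
# value says whether the file in `dir` matches the recorded hex digest.
def verify_checksums(checksums_yaml, dir)
  expected = YAML.safe_load(checksums_yaml)
  results = {}
  expected.each do |algo, files|
    digest_class = ALGORITHMS.fetch(algo)
    files.each do |name, hex|
      actual = digest_class.file(File.join(dir, name)).hexdigest
      results["#{algo} #{name}"] = (actual == hex.to_s)
    end
  end
  results
end

# Usage, assuming the gem was unpacked into ./corenlp-0.0.5:
#   results = verify_checksums(File.read("corenlp-0.0.5/checksums.yaml"), "corenlp-0.0.5")
#   results.each { |entry, ok| puts "#{entry}: #{ok ? 'OK' : 'MISMATCH'}" }
```

Checking both algorithms mirrors what the file stores; in practice the SHA512 entries are the ones worth trusting, since SHA1 is no longer collision-resistant.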
data/README.md
CHANGED
```diff
@@ -38,11 +38,12 @@ The following code will build up a treebank structure for the raw text "Put the
 
 ## Options
 
-The Treebank object can be
+The Treebank object can be initialized with various options.
 
 * `java_max_memory` - set to 3GB by default. This can be customized via the Treebank initializer to be `-Xmx2g`, which would use a max of 2GB of memory, for example.
 * `threads_to_use` - number of threads Stanford CoreNLP uses to parse text. This is set to 4 by default. This option is passed to the Java executable.
 * `output_directory` - by default this is `./tmp/language_processing`, which already exists. This is where Stanford CoreNLP XML files are placed. These XML files represented the structured parser output.
+* `deps_dir` - the directory where the Stanford CoreNLP dependencies files are. By default this is './lib/ext`.
 
 ## Tests
 
```
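The options listed in the README are plain hash keys passed to `Corenlp::Treebank.new`, each falling back to a default when omitted. To show how the defaults resolve without loading the gem or its Java dependencies, here is a minimal sketch using a hypothetical `TreebankOptions` class (not part of the gem). It copies the defaults from this release's initializer, plus the `output_directory` default stated in the README; the `/opt/corenlp/jars` path is only illustrative.

```ruby
# Hypothetical stand-in for Corenlp::Treebank's option handling in 0.0.5.
class TreebankOptions
  DEFAULTS = {
    java_max_memory: "-Xmx3g",
    threads_to_use: 4,
    output_directory: "./tmp/language_processing",
    deps_dir: "./lib/ext"
  }.freeze

  attr_reader(*DEFAULTS.keys)

  def initialize(attrs = {})
    # Same `attrs[:key] || default` pattern the gem's initializer uses,
    # so passing nil (or false) also falls back to the default.
    DEFAULTS.each do |key, default|
      instance_variable_set("@#{key}", attrs[key] || default)
    end
  end
end

# With the real gem, the same hash would be passed to Corenlp::Treebank.new.
opts = TreebankOptions.new(java_max_memory: "-Xmx2g", deps_dir: "/opt/corenlp/jars")
puts opts.java_max_memory  # prints -Xmx2g
puts opts.threads_to_use   # prints 4
puts opts.deps_dir         # prints /opt/corenlp/jars
```

The new `deps_dir` option matters mostly when the Stanford CoreNLP jars live outside the project, for example in a shared system directory, since 0.0.4 offered no way to point the classpath elsewhere.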
data/lib/corenlp.rb
CHANGED
```diff
@@ -4,7 +4,7 @@ Bundler.require
 
 module Corenlp
   class Treebank
-    attr_accessor :raw_text, :filenames, :output_directory, :summary_file, :threads_to_use, :java_max_memory, :sentences
+    attr_accessor :raw_text, :filenames, :output_directory, :summary_file, :threads_to_use, :java_max_memory, :sentences, :deps_dir
 
     def initialize(attrs = {})
       self.raw_text = attrs[:raw_text] || ""
@@ -15,6 +15,7 @@ module Corenlp
       self.threads_to_use = attrs[:threads_to_use] || 4
       self.java_max_memory = attrs[:java_max_memory] || "-Xmx3g"
       self.sentences = []
+      self.deps_dir = attrs[:deps_dir] || "./lib/ext"
     end
 
     def write_output_file_and_summary_file
@@ -25,8 +26,7 @@ module Corenlp
     end
 
     def process_files_with_stanford_corenlp
-
-      classpath = "#{deps}/stanford-corenlp-3.4.jar:#{deps}/stanford-corenlp-3.4-models.jar:#{deps}/xom.jar:#{deps}/joda-time.jar:#{deps}/jollyday.jar:#{deps}/ejml-0.23.jar"
+      classpath = "#{deps_dir}/stanford-corenlp-3.4.jar:#{deps_dir}/stanford-corenlp-3.4-models.jar:#{deps_dir}/xom.jar:#{deps_dir}/joda-time.jar:#{deps_dir}/jollyday.jar:#{deps_dir}/ejml-0.23.jar"
       stanford_bin = "edu.stanford.nlp.pipeline.StanfordCoreNLP"
       annotators = "tokenize,ssplit,pos,lemma,parse,ner"
 
```
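In 0.0.5 the classpath is built from the configurable `deps_dir` accessor. The sketch below shows one way to assemble that classpath and a matching Java command line. The jar names, main class, annotators, and defaults are copied from the diff. The `corenlp_classpath` and `corenlp_command` helper names are hypothetical, and the flags after the main class (`-annotators`, `-threads`, `-outputDirectory`, `-filelist`) are assumptions based on the Stanford CoreNLP command-line interface, not code shown in this gem.

```ruby
# Jar names copied from the 0.0.5 classpath line.
JARS = %w[
  stanford-corenlp-3.4.jar
  stanford-corenlp-3.4-models.jar
  xom.jar
  joda-time.jar
  jollyday.jar
  ejml-0.23.jar
].freeze

# Join each jar under deps_dir with the platform path separator
# (":" on Unix, ";" on Windows); the gem itself hardcodes ":".
def corenlp_classpath(deps_dir)
  JARS.map { |jar| File.join(deps_dir, jar) }.join(File::PATH_SEPARATOR)
end

# Hypothetical helper returning an argv array suitable for system(*cmd).
def corenlp_command(deps_dir:, filelist:, java_max_memory: "-Xmx3g",
                    threads_to_use: 4, output_directory: "./tmp/language_processing")
  [
    "java", java_max_memory,
    "-cp", corenlp_classpath(deps_dir),
    "edu.stanford.nlp.pipeline.StanfordCoreNLP",
    "-annotators", "tokenize,ssplit,pos,lemma,parse,ner",
    "-threads", threads_to_use.to_s,
    "-outputDirectory", output_directory,
    "-filelist", filelist
  ]
end

cmd = corenlp_command(deps_dir: "./lib/ext", filelist: "files.txt")
puts cmd.join(" ")
# system(*cmd) would launch Java; it only succeeds if the jars exist under deps_dir.
```

Returning an array rather than a single shell string lets `system(*cmd)` run Java without a shell, so a `deps_dir` containing spaces needs no extra quoting.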
data/lib/corenlp/version.rb
CHANGED