corenlp 0.0.4 → 0.0.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: a88d19d7dc8eae9e7df59d4fe9b1e0c492aa194f
- data.tar.gz: 5c6a32994f6720210b7a909c5b839503530793b3
+ metadata.gz: 13fb6d3e676a78f59359715c392e857d4836198e
+ data.tar.gz: ee58cbc6af1c899a1ef7cb070e2bad0c939c3765
  SHA512:
- metadata.gz: b5f185cda3feb604e97e5682440a01f763773e80631bf5a68aa19a1a35e2874999770dd8707af2d74cb9baf630491219fdba498284f37a784813510d48c6549f
- data.tar.gz: 99cfc0054a47e92c517b6ba9025e063d6fb316a2788acf9255079ab828a7f046ff82057fee9b2b1a154127f7825ce6abcef0eae52a08e59175c0d94bcb58233b
+ metadata.gz: 649e14c7fc8936da85e307bde80eaf1b3615c87faa61b2137a8a673aafd8dfea18b8ca300851fb6107b55519edb7dd42782213272e9585553d77878a5674f53b
+ data.tar.gz: c0fc54d340e2d2c03067066989d9a9b70e866c49f5f5250c1512735ab2b8bc742d0d894b8c7edc3504edd4f00d7b947135e0e9f5e8998a19e21e1b63df81278e
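checksums.yaml records SHA1 and SHA512 digests of the two archives packed inside the .gem file (metadata.gz and data.tar.gz), which is why all four entries change with any release. The sketch below recomputes such digests with Ruby's standard Digest library. It assumes the archives have already been extracted from the .gem (a .gem is a plain tar file, so `tar -xf corenlp-0.0.5.gem` works); the helper name `gem_digests` is hypothetical and not part of RubyGems.

```ruby
require "digest"

# Hypothetical helper: compute the SHA1 and SHA512 hex digests, keyed by
# file name, in the same shape checksums.yaml uses.
def gem_digests(paths)
  {
    "SHA1" => paths.map { |p| [File.basename(p), Digest::SHA1.file(p).hexdigest] }.to_h,
    "SHA512" => paths.map { |p| [File.basename(p), Digest::SHA512.file(p).hexdigest] }.to_h
  }
end

# Only runs if the archives were extracted into the current directory.
if File.exist?("metadata.gz") && File.exist?("data.tar.gz")
  puts gem_digests(["metadata.gz", "data.tar.gz"])["SHA512"]["data.tar.gz"]
end
```

If the printed SHA512 for data.tar.gz matches the `+` line above, the extracted archive is the one published as 0.0.5.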
data/README.md CHANGED
@@ -38,11 +38,12 @@ The following code will build up a treebank structure for the raw text "Put the
 
  ## Options
 
- The Treebank object can be initialize with various options.
+ The Treebank object can be initialized with various options.
 
  * `java_max_memory` - set to 3GB by default. This can be customized via the Treebank initializer to be `-Xmx2g`, which would use a max of 2GB of memory, for example.
  * `threads_to_use` - number of threads Stanford CoreNLP uses to parse text. This is set to 4 by default. This option is passed to the Java executable.
  * `output_directory` - by default this is `./tmp/language_processing`, which already exists. This is where Stanford CoreNLP XML files are placed. These XML files represented the structured parser output.
+ * `deps_dir` - the directory where the Stanford CoreNLP dependencies files are. By default this is './lib/ext`.
 
  ## Tests
 
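Each option in the README list is a key in the hash passed to the Treebank initializer, and each falls back to its default through `||` (see the `initialize` hunks below). The sketch mirrors that defaulting pattern with a hypothetical stand-in class, `TreebankOptions`, so it runs without Java or the CoreNLP JARs; the default values are the ones stated in the README and the corenlp.rb diff, and the `/opt/corenlp` path is only an example.

```ruby
# Hypothetical stand-in for Corenlp::Treebank's option handling: each
# option is read from the attrs hash and falls back to a default via ||.
class TreebankOptions
  attr_reader :output_directory, :threads_to_use, :java_max_memory, :deps_dir

  def initialize(attrs = {})
    @output_directory = attrs[:output_directory] || "./tmp/language_processing"
    @threads_to_use   = attrs[:threads_to_use] || 4
    @java_max_memory  = attrs[:java_max_memory] || "-Xmx3g"
    @deps_dir         = attrs[:deps_dir] || "./lib/ext"
  end
end

defaults = TreebankOptions.new
custom   = TreebankOptions.new(java_max_memory: "-Xmx2g", deps_dir: "/opt/corenlp")
puts defaults.deps_dir       # prints ./lib/ext
puts custom.java_max_memory  # prints -Xmx2g
```

With the gem installed, the same keys would go to `Corenlp::Treebank.new(raw_text: "...", deps_dir: "/opt/corenlp")`. Because of `||`, passing `nil` or `false` for an option silently selects its default.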
data/lib/corenlp.rb CHANGED
@@ -4,7 +4,7 @@ Bundler.require
 
  module Corenlp
  class Treebank
- attr_accessor :raw_text, :filenames, :output_directory, :summary_file, :threads_to_use, :java_max_memory, :sentences
+ attr_accessor :raw_text, :filenames, :output_directory, :summary_file, :threads_to_use, :java_max_memory, :sentences, :deps_dir
 
  def initialize(attrs = {})
  self.raw_text = attrs[:raw_text] || ""
@@ -15,6 +15,7 @@ module Corenlp
  self.threads_to_use = attrs[:threads_to_use] || 4
  self.java_max_memory = attrs[:java_max_memory] || "-Xmx3g"
  self.sentences = []
+ self.deps_dir = attrs[:deps_dir] || "./lib/ext"
  end
 
  def write_output_file_and_summary_file
@@ -25,8 +26,7 @@ module Corenlp
  end
 
  def process_files_with_stanford_corenlp
- deps = "./lib/ext" # dependencies directory: JARs, model files, taggers, etc.
- classpath = "#{deps}/stanford-corenlp-3.4.jar:#{deps}/stanford-corenlp-3.4-models.jar:#{deps}/xom.jar:#{deps}/joda-time.jar:#{deps}/jollyday.jar:#{deps}/ejml-0.23.jar"
+ classpath = "#{deps_dir}/stanford-corenlp-3.4.jar:#{deps_dir}/stanford-corenlp-3.4-models.jar:#{deps_dir}/xom.jar:#{deps_dir}/joda-time.jar:#{deps_dir}/jollyday.jar:#{deps_dir}/ejml-0.23.jar"
  stanford_bin = "edu.stanford.nlp.pipeline.StanfordCoreNLP"
  annotators = "tokenize,ssplit,pos,lemma,parse,ner"
 
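In 0.0.5 the hard-coded `deps` local is gone and the classpath is built from the configurable `deps_dir`. The sketch below builds the same string from a list of the six JAR names. The helper `corenlp_classpath` and constant `CORENLP_JARS` are hypothetical. One deliberate difference: the gem hard-codes `:` as the separator, which Java accepts only on Unix-like systems, so the sketch uses `File::PATH_SEPARATOR` (`:` on Unix, `;` on Windows).

```ruby
# JAR files listed in the 0.0.5 classpath string, in the same order.
CORENLP_JARS = %w[
  stanford-corenlp-3.4.jar
  stanford-corenlp-3.4-models.jar
  xom.jar
  joda-time.jar
  jollyday.jar
  ejml-0.23.jar
].freeze

# Hypothetical helper: join each JAR onto deps_dir and separate the
# entries with the platform's path separator.
def corenlp_classpath(deps_dir = "./lib/ext")
  CORENLP_JARS.map { |jar| File.join(deps_dir, jar) }.join(File::PATH_SEPARATOR)
end

puts corenlp_classpath("/opt/corenlp")
```

On Unix, `corenlp_classpath` with the default directory produces exactly the string the 0.0.4 code built from `deps`. `File.join` also tolerates a trailing slash on `deps_dir`, which plain string interpolation would turn into `//`.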
@@ -1,3 +1,3 @@
  module Corenlp
- VERSION = "0.0.4"
+ VERSION = "0.0.5"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: corenlp
  version: !ruby/object:Gem::Version
- version: 0.0.4
+ version: 0.0.5
  platform: ruby
  authors:
  - Lengio Corporation
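The gemspec metadata serializes the version as a `Gem::Version` object rather than a bare string, so RubyGems compares releases segment by segment. This short illustration uses RubyGems' own `Gem::Version` and `Gem::Requirement` classes, which ship with standard Ruby. The `0.0.10` and `~> 0.0.4` values are illustrative only.

```ruby
require "rubygems"

old_version = Gem::Version.new("0.0.4")
new_version = Gem::Version.new("0.0.5")

# Segments compare numerically, unlike plain string comparison.
puts new_version > old_version                    # prints true
puts Gem::Version.new("0.0.10") > new_version     # prints true
puts "0.0.10" > "0.0.5"                           # prints false

# A pessimistic constraint on 0.0.4 (>= 0.0.4, < 0.1) accepts this release.
puts Gem::Requirement.new("~> 0.0.4").satisfied_by?(new_version)  # prints true
```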