opener-tokenizer 1.0.2 → 1.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 68417ce5d0cd433b5d46037849fe28f2d9672352
- data.tar.gz: 481f30d8f1a16a929665b6895a7fe6d4689035b9
+ metadata.gz: 24743835d9c215785f3fcf7d17cd46dc1f9cdaba
+ data.tar.gz: 97944541373e69d385eb6f2ee1fa603c9041cbf9
  SHA512:
- metadata.gz: f6ade2200023fe2a04cb89f490abdcd87254df45b53124a6965336e3a11309c01b01593f497c783ead7a213d87d07c0adc315e6236944bbdc632c5186f2149b0
- data.tar.gz: 72f1d3c07c0fece79fb507cc7ad73d6cb0adab54982f1acceae57ddbabeaf6704d3a70975df184c414ef8ce00823486849f84f458a0064e0827e54edc5a24a64
+ metadata.gz: 13915496abaea92c1b5cd0f60d6de07c3e5428875e43bb1d948ad772251306049eb0ca1da6553cbe6cb3e10d92e5528473b49d088b8cc7303ea0b6d4e4bfaaeb
+ data.tar.gz: 8b9319a115bd2896a277ba0e845137c1574a6121c4bd3ce3f2ddde03266ed488a92cf68e84985e37bc3a9463adec0100c08122e9701b083833184ade5147c361
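The digests above cover the two archives packed inside the .gem file, so they can be recomputed locally. A minimal sketch, assuming the gem has already been unpacked so that `metadata.gz` and `data.tar.gz` sit on disk; the helper name `digest_matches?` is hypothetical and not part of RubyGems:

```ruby
require 'digest'

# Hypothetical helper: compare a file's SHA512 hex digest with the value
# recorded in checksums.yaml. Whitespace and letter case in the expected
# value are normalised before comparing.
def digest_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex.strip.downcase
end
```

Calling `digest_matches?('data.tar.gz', expected)` with the full SHA512 value from the `+ data.tar.gz` line should return true for an unmodified 1.0.3 download.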
data/README.md CHANGED
@@ -5,11 +5,7 @@ The tokenizer tokenizes a text into sentences and words.
 
  ### Confused by some terminology?
 
- This software is part of a larger collection of natural language processing
- tools known as "the OpeNER project". You can find more information about the
- project at [the OpeNER portal](http://opener-project.github.io). There you can
- also find references to terms like KAF (an XML standard to represent linguistic
- annotations in texts), component, cores, scenario's and pipelines.
+ This software is part of a larger collection of natural language processing tools known as "the OpeNER project". You can find more information about the project at [the OpeNER portal](http://opener-project.github.io). There you can also find references to terms like KAF (an XML standard to represent linguistic annotations in texts), component, cores, scenario's and pipelines.
 
  Quick Use Example
  -----------------
@@ -53,8 +49,7 @@ Will result in
 
  #### KAF input format
 
- The tokenizer is capable of taking KAF as input, and actually does so by
- default. You can do so like this:
+ The tokenizer is capable of taking KAF as input, and actually does so by default. You can do so like this:
 
  echo "<?xml version='1.0' encoding='UTF-8' standalone='no'?><KAF version='v1.opener' xml:lang='en'><raw>This is what I call, a test!</raw></KAF>" | tokenizer
 
@@ -85,34 +80,27 @@ You can launch a language identification webservice by executing:
 
  tokenizer-server
 
- This will launch a mini webserver with the webservice. It defaults to port 9292,
- so you can access it at <http://localhost:9292>.
+ This will launch a mini webserver with the webservice. It defaults to port 9292, so you can access it at <http://localhost:9292>.
 
- To launch it on a different port provide the `-p [port-number]` option like
- this:
+ To launch it on a different port provide the `-p [port-number]` option like this:
 
  tokenizer-server -p 1234
 
  It then launches at <http://localhost:1234>
 
- Documentation on the Webservice is provided by surfing to the urls provided
- above. For more information on how to launch a webservice run the command with
- the ```-h``` option.
+ Documentation on the Webservice is provided by surfing to the urls provided above. For more information on how to launch a webservice run the command with the ```-h``` option.
 
 
  ### Daemon
 
- Last but not least the tokenizer comes shipped with a daemon that
- can read jobs (and write) jobs to and from Amazon SQS queues. For more
- information type:
+ Last but not least the tokenizer comes shipped with a daemon that can read jobs (and write) jobs to and from Amazon SQS queues. For more information type:
 
  tokenizer-daemon -h
 
  Description of dependencies
  ---------------------------
 
- This component runs best if you run it in an environment suited for OpeNER
- components. You can find an installation guide and helper tools in the [OpeNER installer](https://github.com/opener-project/opener-installer) and [an installation guide on the Opener Website](http://opener-project.github.io/getting-started/how-to/local-installation.html)
+ This component runs best if you run it in an environment suited for OpeNER components. You can find an installation guide and helper tools in the [OpeNER installer](https://github.com/opener-project/opener-installer) and [an installation guide on the Opener Website](http://opener-project.github.io/getting-started/how-to/local-installation.html)
 
  At least you need the following system setup:
 
@@ -147,8 +135,8 @@ Where to go from here
  Report problem/Get help
  -----------------------
 
- If you encounter problems, please email <support@opener-project.eu> or leave an
- issue in the [issue tracker](https://github.com/opener-project/tokenizer/issues).
+ If you encounter problems, please email <support@opener-project.eu> or leave an issue in the
+ [issue tracker](https://github.com/opener-project/tokenizer/issues).
 
 
  Contributing
data/lib/opener/tokenizer/cli.rb CHANGED
@@ -88,13 +88,7 @@ Sample KAF syntax:
 
    stdout, stderr, process = tokenizer.run(input)
 
-   if process.success?
-     puts stdout
-
-     STDERR.puts(stderr) unless stderr.empty?
-   else
-     abort stderr
-   end
+   puts stdout
  end
 
  private
data/lib/opener/tokenizer/error_layer.rb ADDED
@@ -0,0 +1,86 @@
+ module Opener
+   class Tokenizer
+     ##
+     # Add Error Layer to KAF file instead of throwing an error.
+     #
+     class ErrorLayer
+       attr_accessor :input, :document, :error, :klass
+
+       def initialize(input, error, klass)
+         @input = input.to_s
+         # Make sure there is always a document, even if it is empty.
+         @document = Nokogiri::XML(input) rescue Nokogiri::XML(nil)
+         @error = error
+         @klass = klass
+       end
+
+       def add
+         if is_xml?
+           unless has_errors_layer?
+             add_errors_layer
+           end
+         else
+           add_root
+           add_text
+           add_errors_layer
+         end
+         add_error
+
+         return document.to_xml
+       end
+
+       ##
+       # Check if the document is a valid XML file.
+       #
+       def is_xml?
+         !!document.root
+       end
+
+       ##
+       # Add root element to the XML file.
+       #
+       def add_root
+         root = Nokogiri::XML::Node.new "KAF", document
+         document.add_child(root)
+       end
+
+       ##
+       # Check if the document already has an errors layer.
+       #
+       def has_errors_layer?
+         !!document.at('errors')
+       end
+
+       ##
+       # Add errors element to the XML file.
+       #
+       def add_errors_layer
+         node = Nokogiri::XML::Node.new "errors", document
+         document.root.add_child(node)
+       end
+
+       ##
+       # Add the text file incase it is not a valid XML document. More
+       # info for debugging.
+       #
+       def add_text
+         node = Nokogiri::XML::Node.new "raw", document
+         node.inner_html = input
+         document.root.add_child(node)
+
+       end
+
+       ##
+       # Add the actual error to the errors layer.
+       #
+       def add_error
+         node = document.at('errors')
+         error_node = Nokogiri::XML::Node.new "error", node
+         error_node['class'] = "#{klass.to_s} #{klass::VERSION}"
+         error_node.inner_html = error
+         node.add_child(error_node)
+       end
+
+     end # ErrorLayer
+   end # Tokenizer
+ end # Opener
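The new class turns a failure into a document: valid XML input gains an `errors` element if it lacks one, while anything else is wrapped in a fresh `KAF` root with the original text under `raw`. The sketch below reproduces that shape so the resulting document is easier to picture. It is not the gem's code: it uses REXML from the Ruby distribution as a stand-in for Nokogiri, the function names `parse_or_nil` and `add_error_layer` are hypothetical, and it stores text as escaped character data (a choice of this sketch), whereas the class above assigns through `inner_html`.

```ruby
require 'rexml/document'

# Hypothetical helper: return a parsed document, or nil when the text has
# no root element. Any parse failure is treated as "not XML".
def parse_or_nil(text)
  doc = REXML::Document.new(text)
  doc.root ? doc : nil
rescue StandardError
  nil
end

# Hypothetical stand-in for ErrorLayer#add, built on REXML instead of Nokogiri.
def add_error_layer(input, message, source)
  doc = parse_or_nil(input.to_s)

  if doc.nil?
    # Not XML: wrap the raw text in a new KAF root so it stays available
    # for debugging.
    doc = REXML::Document.new
    root = doc.add_element('KAF')
    root.add_element('raw').text = input.to_s
  end

  # Reuse an existing errors layer, otherwise create one under the root.
  errors = doc.root.elements['errors'] || doc.root.add_element('errors')
  error = errors.add_element('error', 'class' => source)
  error.text = message

  doc.to_s
end
```

Calling `add_error_layer('plain text', 'boom', 'Opener::Tokenizer 1.0.3')` should yield a `KAF` document with one `raw` child and one `errors/error` child; feeding that result back in appends a second `error` to the same layer.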
data/lib/opener/tokenizer/version.rb CHANGED
@@ -1,5 +1,5 @@
  module Opener
    class Tokenizer
-     VERSION = "1.0.2"
+     VERSION = "1.0.3"
    end
  end
data/lib/opener/tokenizer.rb CHANGED
@@ -5,6 +5,7 @@ require 'optparse'
 
  require_relative 'tokenizer/version'
  require_relative 'tokenizer/cli'
+ require_relative 'tokenizer/error_layer'
 
  module Opener
    ##
@@ -57,20 +58,19 @@ module Opener
      # @return [Array]
      #
      def run(input)
-
-       if options[:kaf]
-         language, input = kaf_elements(input)
-       else
-         language = options[:language]
-       end
-
-       unless valid_language?(language)
-         raise ArgumentError, "The specified language (#{language}) is invalid"
+       begin
+         if options[:kaf]
+           language, input = kaf_elements(input)
+         else
+           language = options[:language]
+         end
+
+         kernel = language_constant(language).new(:args => options[:args])
+
+         return Open3.capture3(*kernel.command.split(" "), :stdin_data => input)
+       rescue Exception => error
+         return ErrorLayer.new(input, error.message, self.class).add
        end
-
-       kernel = language_constant(language).new(:args => options[:args])
-
-       return Open3.capture3(*kernel.command.split(" "), :stdin_data => input)
      end
 
      alias tokenize run
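After this hunk `run` has two return shapes: the three-element Array from `Open3.capture3` on success, and the String produced by `ErrorLayer#add` on failure. That matters for callers that destructure the result, such as the `stdout, stderr, process = tokenizer.run(input)` line in the CLI hunk. A minimal sketch of how Ruby's multiple assignment treats each shape; `run_or_report` is a hypothetical stand-in rather than the gem's method, and it rescues `StandardError`, which is narrower than the `rescue Exception` in the diff (that form also catches `Interrupt` and `SystemExit`):

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical stand-in for the new `run`: an Array on success, a String
# describing the failure otherwise.
def run_or_report(command, input)
  Open3.capture3(*command, :stdin_data => input)
rescue StandardError => error
  "error: #{error.message}"
end

# Success: capture3 returns [stdout, stderr, status], so all three
# variables are filled.
upcase_command = [RbConfig.ruby, '-e', 'print STDIN.read.upcase']
stdout, stderr, status = run_or_report(upcase_command, 'abc')

# Failure: a String is not an Array, so it lands in the first variable
# and the remaining two become nil.
failed_out, failed_err, failed_status = run_or_report(['no-such-command-for-this-sketch'], 'abc')
```

This is consistent with the CLI now printing only its first variable: on the error path the second and third variables are nil, so the old `process.success?` check would have raised `NoMethodError`.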
metadata CHANGED
@@ -1,171 +1,171 @@
  --- !ruby/object:Gem::Specification
  name: opener-tokenizer
  version: !ruby/object:Gem::Version
- version: 1.0.2
+ version: 1.0.3
  platform: ruby
  authors:
  - development@olery.com
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-05-23 00:00:00.000000000 Z
+ date: 2014-06-11 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: opener-tokenizer-base
- requirement: !ruby/object:Gem::Requirement
+ version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: 0.3.1
- type: :runtime
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
+ requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: 0.3.1
+ prerelease: false
+ type: :runtime
  - !ruby/object:Gem::Dependency
  name: opener-webservice
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - '>='
+ - !ruby/object:Gem::Version
+ version: '0'
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- type: :runtime
  prerelease: false
+ type: :runtime
+ - !ruby/object:Gem::Dependency
+ name: nokogiri
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- - !ruby/object:Gem::Dependency
- name: nokogiri
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- type: :runtime
  prerelease: false
+ type: :runtime
+ - !ruby/object:Gem::Dependency
+ name: sinatra
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - ~>
  - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: sinatra
+ version: 1.4.2
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - ~>
  - !ruby/object:Gem::Version
  version: 1.4.2
- type: :runtime
  prerelease: false
+ type: :runtime
+ - !ruby/object:Gem::Dependency
+ name: httpclient
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - "~>"
+ - - '>='
  - !ruby/object:Gem::Version
- version: 1.4.2
- - !ruby/object:Gem::Dependency
- name: httpclient
+ version: '0'
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- type: :runtime
  prerelease: false
+ type: :runtime
+ - !ruby/object:Gem::Dependency
+ name: opener-daemons
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- - !ruby/object:Gem::Dependency
- name: opener-daemons
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- type: :runtime
  prerelease: false
+ type: :runtime
+ - !ruby/object:Gem::Dependency
+ name: puma
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- - !ruby/object:Gem::Dependency
- name: puma
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- type: :runtime
  prerelease: false
+ type: :runtime
+ - !ruby/object:Gem::Dependency
+ name: rspec
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- - !ruby/object:Gem::Dependency
- name: rspec
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- type: :development
  prerelease: false
+ type: :development
+ - !ruby/object:Gem::Dependency
+ name: cucumber
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- - !ruby/object:Gem::Dependency
- name: cucumber
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- type: :development
  prerelease: false
+ type: :development
+ - !ruby/object:Gem::Dependency
+ name: pry
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- - !ruby/object:Gem::Dependency
- name: pry
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- type: :development
  prerelease: false
+ type: :development
+ - !ruby/object:Gem::Dependency
+ name: rake
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- - !ruby/object:Gem::Dependency
- name: rake
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- type: :development
  prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
+ type: :development
  description: Gem that wraps up the the tokenizer cores
- email:
+ email:
  executables:
  - tokenizer
  - tokenizer-daemon
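Apart from the version and date, most of this hunk looks like serialization churn rather than a dependency change: the quoting of the operators differs and the keys of each dependency appear in a different order. A plausible cause is that the two releases were built with different YAML emitters, but that is an assumption; the diff does not say. The sketch below checks the underlying claim on a simplified fragment, namely that both spellings load to the same data. The constants and the helper `same_data?` are hypothetical, and the `!ruby/object` tags and symbol values of the real metadata are left out so that `YAML.safe_load` accepts the input.

```ruby
require 'yaml'

# Simplified fragment in the 1.0.2 style: double-quoted operator,
# requirements listed before prerelease.
OLD_STYLE = <<~SPEC
  name: sinatra
  requirements:
  - - "~>"
    - 1.4.2
  prerelease: false
SPEC

# The same fragment in the 1.0.3 style: bare operator, keys reordered.
NEW_STYLE = <<~SPEC
  name: sinatra
  prerelease: false
  requirements:
  - - ~>
    - 1.4.2
SPEC

# Hypothetical helper: two YAML texts are equivalent when they load to
# equal Ruby data. Hash equality ignores key order.
def same_data?(left, right)
  YAML.safe_load(left) == YAML.safe_load(right)
end
```

`same_data?(OLD_STYLE, NEW_STYLE)` should be true, while changing `1.4.2` on either side makes it false, which is the kind of difference that would signal a real dependency change.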
@@ -181,6 +181,7 @@ files:
  - exec/tokenizer.rb
  - lib/opener/tokenizer.rb
  - lib/opener/tokenizer/cli.rb
+ - lib/opener/tokenizer/error_layer.rb
  - lib/opener/tokenizer/public/markdown.css
  - lib/opener/tokenizer/server.rb
  - lib/opener/tokenizer/version.rb
@@ -190,25 +191,24 @@ files:
  homepage: http://opener-project.github.com/
  licenses: []
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: 1.9.2
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
- - - ">="
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubyforge_project:
+ rubyforge_project:
  rubygems_version: 2.2.2
- signing_key:
+ signing_key:
  specification_version: 4
  summary: Gem that wraps up the the tokenizer cores
  test_files: []
- has_rdoc: yard