opener-tokenizer 1.0.2 → 1.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 68417ce5d0cd433b5d46037849fe28f2d9672352
4
- data.tar.gz: 481f30d8f1a16a929665b6895a7fe6d4689035b9
3
+ metadata.gz: 24743835d9c215785f3fcf7d17cd46dc1f9cdaba
4
+ data.tar.gz: 97944541373e69d385eb6f2ee1fa603c9041cbf9
5
5
  SHA512:
6
- metadata.gz: f6ade2200023fe2a04cb89f490abdcd87254df45b53124a6965336e3a11309c01b01593f497c783ead7a213d87d07c0adc315e6236944bbdc632c5186f2149b0
7
- data.tar.gz: 72f1d3c07c0fece79fb507cc7ad73d6cb0adab54982f1acceae57ddbabeaf6704d3a70975df184c414ef8ce00823486849f84f458a0064e0827e54edc5a24a64
6
+ metadata.gz: 13915496abaea92c1b5cd0f60d6de07c3e5428875e43bb1d948ad772251306049eb0ca1da6553cbe6cb3e10d92e5528473b49d088b8cc7303ea0b6d4e4bfaaeb
7
+ data.tar.gz: 8b9319a115bd2896a277ba0e845137c1574a6121c4bd3ce3f2ddde03266ed488a92cf68e84985e37bc3a9463adec0100c08122e9701b083833184ade5147c361
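The checksum change above can be checked locally. A minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been unpacked from the `.gem` archive; the helper name `sha512_matches?` is hypothetical and not part of the gem:

```ruby
require "digest"

# Hypothetical helper (not part of opener-tokenizer): returns true when the
# SHA512 hex digest of the file at `path` equals the `expected` hex string.
def sha512_matches?(path, expected)
  Digest::SHA512.file(path).hexdigest == expected.strip.downcase
end

# Usage, with placeholder values:
#   sha512_matches?("data.tar.gz", expected_hex_from_checksums_yaml)
```

The same pattern should work for the SHA1 entries by swapping in `Digest::SHA1`.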
data/README.md CHANGED
@@ -5,11 +5,7 @@ The tokenizer tokenizes a text into sentences and words.
5
5
 
6
6
  ### Confused by some terminology?
7
7
 
8
- This software is part of a larger collection of natural language processing
9
- tools known as "the OpeNER project". You can find more information about the
10
- project at [the OpeNER portal](http://opener-project.github.io). There you can
11
- also find references to terms like KAF (an XML standard to represent linguistic
12
- annotations in texts), component, cores, scenario's and pipelines.
8
+ This software is part of a larger collection of natural language processing tools known as "the OpeNER project". You can find more information about the project at [the OpeNER portal](http://opener-project.github.io). There you can also find references to terms like KAF (an XML standard to represent linguistic annotations in texts), component, cores, scenario's and pipelines.
13
9
 
14
10
  Quick Use Example
15
11
  -----------------
@@ -53,8 +49,7 @@ Will result in
53
49
 
54
50
  #### KAF input format
55
51
 
56
- The tokenizer is capable of taking KAF as input, and actually does so by
57
- default. You can do so like this:
52
+ The tokenizer is capable of taking KAF as input, and actually does so by default. You can do so like this:
58
53
 
59
54
  echo "<?xml version='1.0' encoding='UTF-8' standalone='no'?><KAF version='v1.opener' xml:lang='en'><raw>This is what I call, a test!</raw></KAF>" | tokenizer
60
55
 
@@ -85,34 +80,27 @@ You can launch a language identification webservice by executing:
85
80
 
86
81
  tokenizer-server
87
82
 
88
- This will launch a mini webserver with the webservice. It defaults to port 9292,
89
- so you can access it at <http://localhost:9292>.
83
+ This will launch a mini webserver with the webservice. It defaults to port 9292, so you can access it at <http://localhost:9292>.
90
84
 
91
- To launch it on a different port provide the `-p [port-number]` option like
92
- this:
85
+ To launch it on a different port provide the `-p [port-number]` option like this:
93
86
 
94
87
  tokenizer-server -p 1234
95
88
 
96
89
  It then launches at <http://localhost:1234>
97
90
 
98
- Documentation on the Webservice is provided by surfing to the urls provided
99
- above. For more information on how to launch a webservice run the command with
100
- the ```-h``` option.
91
+ Documentation on the Webservice is provided by surfing to the urls provided above. For more information on how to launch a webservice run the command with the ```-h``` option.
101
92
 
102
93
 
103
94
  ### Daemon
104
95
 
105
- Last but not least the tokenizer comes shipped with a daemon that
106
- can read jobs (and write) jobs to and from Amazon SQS queues. For more
107
- information type:
96
+ Last but not least the tokenizer comes shipped with a daemon that can read jobs (and write) jobs to and from Amazon SQS queues. For more information type:
108
97
 
109
98
  tokenizer-daemon -h
110
99
 
111
100
  Description of dependencies
112
101
  ---------------------------
113
102
 
114
- This component runs best if you run it in an environment suited for OpeNER
115
- components. You can find an installation guide and helper tools in the [OpeNER installer](https://github.com/opener-project/opener-installer) and [an installation guide on the Opener Website](http://opener-project.github.io/getting-started/how-to/local-installation.html)
103
+ This component runs best if you run it in an environment suited for OpeNER components. You can find an installation guide and helper tools in the [OpeNER installer](https://github.com/opener-project/opener-installer) and [an installation guide on the Opener Website](http://opener-project.github.io/getting-started/how-to/local-installation.html)
116
104
 
117
105
  At least you need the following system setup:
118
106
 
@@ -147,8 +135,8 @@ Where to go from here
147
135
  Report problem/Get help
148
136
  -----------------------
149
137
 
150
- If you encounter problems, please email <support@opener-project.eu> or leave an
151
- issue in the [issue tracker](https://github.com/opener-project/tokenizer/issues).
138
+ If you encounter problems, please email <support@opener-project.eu> or leave an issue in the
139
+ [issue tracker](https://github.com/opener-project/tokenizer/issues).
152
140
 
153
141
 
154
142
  Contributing
data/lib/opener/tokenizer/cli.rb CHANGED
@@ -88,13 +88,7 @@ Sample KAF syntax:
88
88
 
89
89
  stdout, stderr, process = tokenizer.run(input)
90
90
 
91
- if process.success?
92
- puts stdout
93
-
94
- STDERR.puts(stderr) unless stderr.empty?
95
- else
96
- abort stderr
97
- end
91
+ puts stdout
98
92
  end
99
93
 
100
94
  private
data/lib/opener/tokenizer/error_layer.rb ADDED
@@ -0,0 +1,86 @@
1
+ module Opener
2
+ class Tokenizer
3
+ ##
4
+ # Add Error Layer to KAF file instead of throwing an error.
5
+ #
6
+ class ErrorLayer
7
+ attr_accessor :input, :document, :error, :klass
8
+
9
+ def initialize(input, error, klass)
10
+ @input = input.to_s
11
+ # Make sure there is always a document, even if it is empty.
12
+ @document = Nokogiri::XML(input) rescue Nokogiri::XML(nil)
13
+ @error = error
14
+ @klass = klass
15
+ end
16
+
17
+ def add
18
+ if is_xml?
19
+ unless has_errors_layer?
20
+ add_errors_layer
21
+ end
22
+ else
23
+ add_root
24
+ add_text
25
+ add_errors_layer
26
+ end
27
+ add_error
28
+
29
+ return document.to_xml
30
+ end
31
+
32
+ ##
33
+ # Check if the document is a valid XML file.
34
+ #
35
+ def is_xml?
36
+ !!document.root
37
+ end
38
+
39
+ ##
40
+ # Add root element to the XML file.
41
+ #
42
+ def add_root
43
+ root = Nokogiri::XML::Node.new "KAF", document
44
+ document.add_child(root)
45
+ end
46
+
47
+ ##
48
+ # Check if the document already has an errors layer.
49
+ #
50
+ def has_errors_layer?
51
+ !!document.at('errors')
52
+ end
53
+
54
+ ##
55
+ # Add errors element to the XML file.
56
+ #
57
+ def add_errors_layer
58
+ node = Nokogiri::XML::Node.new "errors", document
59
+ document.root.add_child(node)
60
+ end
61
+
62
+ ##
63
+ # Add the text file incase it is not a valid XML document. More
64
+ # info for debugging.
65
+ #
66
+ def add_text
67
+ node = Nokogiri::XML::Node.new "raw", document
68
+ node.inner_html = input
69
+ document.root.add_child(node)
70
+
71
+ end
72
+
73
+ ##
74
+ # Add the actual error to the errors layer.
75
+ #
76
+ def add_error
77
+ node = document.at('errors')
78
+ error_node = Nokogiri::XML::Node.new "error", node
79
+ error_node['class'] = "#{klass.to_s} #{klass::VERSION}"
80
+ error_node.inner_html = error
81
+ node.add_child(error_node)
82
+ end
83
+
84
+ end # ErrorLayer
85
+ end # Tokenizer
86
+ end # Opener
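To illustrate the non-XML branch of `ErrorLayer#add` (root, raw text, errors layer, then the error itself), here is a rough sketch. It swaps Nokogiri for the REXML library so it runs without extra gems, and `wrap_error` is a hypothetical name, not something the gem defines. Unlike `inner_html=`, REXML's `text=` escapes markup in the input, so the output can differ for inputs containing `<` or `&`.

```ruby
require "rexml/document"

# Hypothetical stand-in for ErrorLayer#add when the input is not XML.
# Builds <KAF><raw>input</raw><errors><error class="...">message</error></errors></KAF>.
def wrap_error(input, message, klass_label)
  doc    = REXML::Document.new
  root   = doc.add_element("KAF")               # add_root
  raw    = root.add_element("raw")              # add_text
  raw.text = input.to_s
  errors = root.add_element("errors")           # add_errors_layer
  error  = errors.add_element("error", "class" => klass_label) # add_error
  error.text = message
  doc.to_s
end

puts wrap_error("plain text", "Invalid language", "Opener::Tokenizer 1.0.3")
```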
data/lib/opener/tokenizer/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  module Opener
2
2
  class Tokenizer
3
- VERSION = "1.0.2"
3
+ VERSION = "1.0.3"
4
4
  end
5
5
  end
data/lib/opener/tokenizer.rb CHANGED
@@ -5,6 +5,7 @@ require 'optparse'
5
5
 
6
6
  require_relative 'tokenizer/version'
7
7
  require_relative 'tokenizer/cli'
8
+ require_relative 'tokenizer/error_layer'
8
9
 
9
10
  module Opener
10
11
  ##
@@ -57,20 +58,19 @@ module Opener
57
58
  # @return [Array]
58
59
  #
59
60
  def run(input)
60
-
61
- if options[:kaf]
62
- language, input = kaf_elements(input)
63
- else
64
- language = options[:language]
65
- end
66
-
67
- unless valid_language?(language)
68
- raise ArgumentError, "The specified language (#{language}) is invalid"
61
+ begin
62
+ if options[:kaf]
63
+ language, input = kaf_elements(input)
64
+ else
65
+ language = options[:language]
66
+ end
67
+
68
+ kernel = language_constant(language).new(:args => options[:args])
69
+
70
+ return Open3.capture3(*kernel.command.split(" "), :stdin_data => input)
71
+ rescue Exception => error
72
+ return ErrorLayer.new(input, error.message, self.class).add
69
73
  end
70
-
71
- kernel = language_constant(language).new(:args => options[:args])
72
-
73
- return Open3.capture3(*kernel.command.split(" "), :stdin_data => input)
74
74
  end
75
75
 
76
76
  alias tokenize run
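One consequence of this hunk is that `run` can now return two different shapes: the three-element array from `Open3.capture3` on success, or a single XML string from `ErrorLayer#add` on failure. The caller's `stdout, stderr, process = tokenizer.run(input)` copes with both because Ruby's multiple assignment leaves the extra variables `nil` when the right-hand side is not an array. A small sketch of that behaviour, using made-up stand-in values rather than real tokenizer output:

```ruby
# Hypothetical helper: performs the same multiple assignment as the CLI and
# returns the three variables so the result can be inspected.
def unpack(result)
  stdout, stderr, process = result
  [stdout, stderr, process]
end

success = ["<KAF/>", "", :status]      # same shape as Open3.capture3
failure = "<KAF><errors/></KAF>"       # same shape as ErrorLayer#add

p unpack(success)  # all three slots filled
p unpack(failure)  # the string lands in stdout; stderr and process are nil
```

This is presumably why the CLI hunk can reduce to a plain `puts stdout`, at the cost of no longer distinguishing failure via `process.success?`.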
metadata CHANGED
@@ -1,171 +1,171 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: opener-tokenizer
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.2
4
+ version: 1.0.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - development@olery.com
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-05-23 00:00:00.000000000 Z
11
+ date: 2014-06-11 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: opener-tokenizer-base
15
- requirement: !ruby/object:Gem::Requirement
15
+ version_requirements: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - ">="
17
+ - - '>='
18
18
  - !ruby/object:Gem::Version
19
19
  version: 0.3.1
20
- type: :runtime
21
- prerelease: false
22
- version_requirements: !ruby/object:Gem::Requirement
20
+ requirement: !ruby/object:Gem::Requirement
23
21
  requirements:
24
- - - ">="
22
+ - - '>='
25
23
  - !ruby/object:Gem::Version
26
24
  version: 0.3.1
25
+ prerelease: false
26
+ type: :runtime
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: opener-webservice
29
+ version_requirements: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - '>='
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
29
34
  requirement: !ruby/object:Gem::Requirement
30
35
  requirements:
31
- - - ">="
36
+ - - '>='
32
37
  - !ruby/object:Gem::Version
33
38
  version: '0'
34
- type: :runtime
35
39
  prerelease: false
40
+ type: :runtime
41
+ - !ruby/object:Gem::Dependency
42
+ name: nokogiri
36
43
  version_requirements: !ruby/object:Gem::Requirement
37
44
  requirements:
38
- - - ">="
45
+ - - '>='
39
46
  - !ruby/object:Gem::Version
40
47
  version: '0'
41
- - !ruby/object:Gem::Dependency
42
- name: nokogiri
43
48
  requirement: !ruby/object:Gem::Requirement
44
49
  requirements:
45
- - - ">="
50
+ - - '>='
46
51
  - !ruby/object:Gem::Version
47
52
  version: '0'
48
- type: :runtime
49
53
  prerelease: false
54
+ type: :runtime
55
+ - !ruby/object:Gem::Dependency
56
+ name: sinatra
50
57
  version_requirements: !ruby/object:Gem::Requirement
51
58
  requirements:
52
- - - ">="
59
+ - - ~>
53
60
  - !ruby/object:Gem::Version
54
- version: '0'
55
- - !ruby/object:Gem::Dependency
56
- name: sinatra
61
+ version: 1.4.2
57
62
  requirement: !ruby/object:Gem::Requirement
58
63
  requirements:
59
- - - "~>"
64
+ - - ~>
60
65
  - !ruby/object:Gem::Version
61
66
  version: 1.4.2
62
- type: :runtime
63
67
  prerelease: false
68
+ type: :runtime
69
+ - !ruby/object:Gem::Dependency
70
+ name: httpclient
64
71
  version_requirements: !ruby/object:Gem::Requirement
65
72
  requirements:
66
- - - "~>"
73
+ - - '>='
67
74
  - !ruby/object:Gem::Version
68
- version: 1.4.2
69
- - !ruby/object:Gem::Dependency
70
- name: httpclient
75
+ version: '0'
71
76
  requirement: !ruby/object:Gem::Requirement
72
77
  requirements:
73
- - - ">="
78
+ - - '>='
74
79
  - !ruby/object:Gem::Version
75
80
  version: '0'
76
- type: :runtime
77
81
  prerelease: false
82
+ type: :runtime
83
+ - !ruby/object:Gem::Dependency
84
+ name: opener-daemons
78
85
  version_requirements: !ruby/object:Gem::Requirement
79
86
  requirements:
80
- - - ">="
87
+ - - '>='
81
88
  - !ruby/object:Gem::Version
82
89
  version: '0'
83
- - !ruby/object:Gem::Dependency
84
- name: opener-daemons
85
90
  requirement: !ruby/object:Gem::Requirement
86
91
  requirements:
87
- - - ">="
92
+ - - '>='
88
93
  - !ruby/object:Gem::Version
89
94
  version: '0'
90
- type: :runtime
91
95
  prerelease: false
96
+ type: :runtime
97
+ - !ruby/object:Gem::Dependency
98
+ name: puma
92
99
  version_requirements: !ruby/object:Gem::Requirement
93
100
  requirements:
94
- - - ">="
101
+ - - '>='
95
102
  - !ruby/object:Gem::Version
96
103
  version: '0'
97
- - !ruby/object:Gem::Dependency
98
- name: puma
99
104
  requirement: !ruby/object:Gem::Requirement
100
105
  requirements:
101
- - - ">="
106
+ - - '>='
102
107
  - !ruby/object:Gem::Version
103
108
  version: '0'
104
- type: :runtime
105
109
  prerelease: false
110
+ type: :runtime
111
+ - !ruby/object:Gem::Dependency
112
+ name: rspec
106
113
  version_requirements: !ruby/object:Gem::Requirement
107
114
  requirements:
108
- - - ">="
115
+ - - '>='
109
116
  - !ruby/object:Gem::Version
110
117
  version: '0'
111
- - !ruby/object:Gem::Dependency
112
- name: rspec
113
118
  requirement: !ruby/object:Gem::Requirement
114
119
  requirements:
115
- - - ">="
120
+ - - '>='
116
121
  - !ruby/object:Gem::Version
117
122
  version: '0'
118
- type: :development
119
123
  prerelease: false
124
+ type: :development
125
+ - !ruby/object:Gem::Dependency
126
+ name: cucumber
120
127
  version_requirements: !ruby/object:Gem::Requirement
121
128
  requirements:
122
- - - ">="
129
+ - - '>='
123
130
  - !ruby/object:Gem::Version
124
131
  version: '0'
125
- - !ruby/object:Gem::Dependency
126
- name: cucumber
127
132
  requirement: !ruby/object:Gem::Requirement
128
133
  requirements:
129
- - - ">="
134
+ - - '>='
130
135
  - !ruby/object:Gem::Version
131
136
  version: '0'
132
- type: :development
133
137
  prerelease: false
138
+ type: :development
139
+ - !ruby/object:Gem::Dependency
140
+ name: pry
134
141
  version_requirements: !ruby/object:Gem::Requirement
135
142
  requirements:
136
- - - ">="
143
+ - - '>='
137
144
  - !ruby/object:Gem::Version
138
145
  version: '0'
139
- - !ruby/object:Gem::Dependency
140
- name: pry
141
146
  requirement: !ruby/object:Gem::Requirement
142
147
  requirements:
143
- - - ">="
148
+ - - '>='
144
149
  - !ruby/object:Gem::Version
145
150
  version: '0'
146
- type: :development
147
151
  prerelease: false
152
+ type: :development
153
+ - !ruby/object:Gem::Dependency
154
+ name: rake
148
155
  version_requirements: !ruby/object:Gem::Requirement
149
156
  requirements:
150
- - - ">="
157
+ - - '>='
151
158
  - !ruby/object:Gem::Version
152
159
  version: '0'
153
- - !ruby/object:Gem::Dependency
154
- name: rake
155
160
  requirement: !ruby/object:Gem::Requirement
156
161
  requirements:
157
- - - ">="
162
+ - - '>='
158
163
  - !ruby/object:Gem::Version
159
164
  version: '0'
160
- type: :development
161
165
  prerelease: false
162
- version_requirements: !ruby/object:Gem::Requirement
163
- requirements:
164
- - - ">="
165
- - !ruby/object:Gem::Version
166
- version: '0'
166
+ type: :development
167
167
  description: Gem that wraps up the the tokenizer cores
168
- email:
168
+ email:
169
169
  executables:
170
170
  - tokenizer
171
171
  - tokenizer-daemon
@@ -181,6 +181,7 @@ files:
181
181
  - exec/tokenizer.rb
182
182
  - lib/opener/tokenizer.rb
183
183
  - lib/opener/tokenizer/cli.rb
184
+ - lib/opener/tokenizer/error_layer.rb
184
185
  - lib/opener/tokenizer/public/markdown.css
185
186
  - lib/opener/tokenizer/server.rb
186
187
  - lib/opener/tokenizer/version.rb
@@ -190,25 +191,24 @@ files:
190
191
  homepage: http://opener-project.github.com/
191
192
  licenses: []
192
193
  metadata: {}
193
- post_install_message:
194
+ post_install_message:
194
195
  rdoc_options: []
195
196
  require_paths:
196
197
  - lib
197
198
  required_ruby_version: !ruby/object:Gem::Requirement
198
199
  requirements:
199
- - - ">="
200
+ - - '>='
200
201
  - !ruby/object:Gem::Version
201
202
  version: 1.9.2
202
203
  required_rubygems_version: !ruby/object:Gem::Requirement
203
204
  requirements:
204
- - - ">="
205
+ - - '>='
205
206
  - !ruby/object:Gem::Version
206
207
  version: '0'
207
208
  requirements: []
208
- rubyforge_project:
209
+ rubyforge_project:
209
210
  rubygems_version: 2.2.2
210
- signing_key:
211
+ signing_key:
211
212
  specification_version: 4
212
213
  summary: Gem that wraps up the the tokenizer cores
213
214
  test_files: []
214
- has_rdoc: yard