opener-language-identifier 3.0.3 → 3.0.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: dcf50a5847ebd111af672d41d6f2a40dfc6e5c2b
- data.tar.gz: de82ac2bcc8485d334cfd9a8b84c373781abda89
+ metadata.gz: ce2775b5964868c6ad0e00519dde29c5dc1654a4
+ data.tar.gz: be35d8f78c6a32b39a77e41b9ba673c70e37c728
  SHA512:
- metadata.gz: 45a785402359135646b38000ad9502ebbe44389d38f91c92eed73b315d2397524a3522b9d8fa9f2a6a42626bad9ae30971ed90f6301af38d0f843b197beaca58
- data.tar.gz: 571cbb5c6a46acabedbb572832a46c3cfe565fdc0d2ceeb78168d7e58843b88659f764fcfa810bb6f418d3b054b21b3d9f5aa9556b50d3a9430e2fac704e3e2e
+ metadata.gz: 527280005269de7dadc0e7a4c8169c8f9da4f922e596e04e2765f8e69b0d5873c96e1f2c5f4d902ff7ca5fd4fa85fa9758c693f6fa1eb0cc5a905c5b41365337
+ data.tar.gz: d4cf50e2110aa86c9c908068e9fc339e4124523edeef301fcd49b42c0776cc41d3e5100681b2f12f6e480ebef4a2abb743036ac430f7abe86882a27fe38f1586
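The digests in `checksums.yaml` cover the two members of the `.gem` archive, `metadata.gz` and `data.tar.gz`. A minimal sketch for recomputing them, assuming the `.gem` is read as a plain tar archive with RubyGems' own `Gem::Package::TarReader` (the helper name `gem_checksums` and the file name in the usage comment are hypothetical):

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'rubygems/package'

# Members of a .gem archive whose digests checksums.yaml records.
CHECKSUMMED_MEMBERS = %w[metadata.gz data.tar.gz].freeze

# Hypothetical helper: walk the tar entries of a .gem and return a hash
# shaped like checksums.yaml: { 'SHA1' => {...}, 'SHA512' => {...} }.
def gem_checksums(io)
  sums = { 'SHA1' => {}, 'SHA512' => {} }
  Gem::Package::TarReader.new(io).each do |entry|
    next unless CHECKSUMMED_MEMBERS.include?(entry.full_name)

    data = entry.read || '' # read may return nil for a zero-byte member
    sums['SHA1'][entry.full_name] = Digest::SHA1.hexdigest(data)
    sums['SHA512'][entry.full_name] = Digest::SHA512.hexdigest(data)
  end
  sums
end

# Usage (file name is illustrative):
# File.open('opener-language-identifier-3.0.4.gem', 'rb') { |f| p gem_checksums(f) }
```

Comparing the returned hex strings with the `+` lines above is one way to confirm that a downloaded 3.0.4 package matches what was published.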
data/README.md CHANGED
@@ -2,41 +2,48 @@

  # Language Identifier

- The language identifier takes raw text and tries to figure out what language it was written in. The output can either be a plain-text i18n language code or a basic KAF document containing the language and raw input text.
+ The language identifier takes raw text and tries to figure out what language it
+ was written in. The output can either be a plain-text i18n language code or a
+ basic KAF document containing the language and raw input text.

- The output of the language identifier can then be used to drive further text analysis of for example sentiments and or entities.
+ The output of the language identifier can then be used to drive further text
+ analysis of for example sentiments and or entities.

- ### Confused by some terminology?
+ ## Confused by some terminology?

- This software is part of a larger collection of natural language processing tools known as "the OpeNER project". You can find more information about the project at [the OpeNER portal](http://opener-project.github.io). There you can also find references to terms like KAF (an XML standard to represent linguistic annotations in texts), component, cores, scenario's and pipelines.
+ This software is part of a larger collection of natural language processing
+ tools known as "the OpeNER project". You can find more information about the
+ project at [the OpeNER portal](http://opener-project.github.io). There you can
+ also find references to terms like KAF (an XML standard to represent linguistic
+ annotations in texts), component, cores, scenario's and pipelines.

- Quick Use Example
- -----------------
+ ## Quick Use Example

  Install the Gem:

  gem install opener-language-identifier

- Make sure you run ```jruby``` since the language-identifier uses Java.
+ Make sure you run `jruby` since the language-identifier uses Java.

  ### Command line interface

- You should now be able to call the language indentifier as a regular shell command: by its name. Once installed the gem normally sits in your path so you can call it directly from anywhere.
+ You should now be able to call the language indentifier as a regular shell
+ command: by its name. Once installed the gem normally sits in your path so you
+ can call it directly from anywhere.

- This aplication reads a text from standard input in order to identify the language.
+ This aplication reads a text from standard input in order to identify the
+ language.

  echo "This is an English text." | language-identifier

  This will output:

- ~~~~
- <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
- <KAF xml:lang="en" version="2.1">
- <raw>This is an English text.</raw>
- </KAF>
- ~~~~
+ <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+ <KAF xml:lang="en" version="2.1">
+ <raw>This is an English text.</raw>
+ </KAF>

- If you just want the language code returned add the ```--no-kaf``` option like this
+ If you just want the language code returned add the `--no-kaf` option like this

  echo "This is an English text." | language-identifier --no-kaf

@@ -50,7 +57,8 @@ You can launch a language identification webservice by executing:

  $ language-identifier-server

- This will launch a mini webserver with the webservice. It defaults to port 9292, so you can access it at <http://localhost:9292/>.
+ This will launch a mini webserver with the webservice. It defaults to port
+ 9292, so you can access it at <http://localhost:9292/>.

  To launch it on a different port provide the `-p [port-number]` option like
  this:
@@ -61,61 +69,58 @@ It then launches at <http://localhost:1234/>

  Documentation on the Webservice is provided by surfing to the urls provided
  above. For more information on how to launch a webservice run the command with
- the ```-h``` option.
+ the `-h` option.

  ### Daemon

- Last but not least the language identifier comes shipped with a daemon that
- can read jobs (and write) jobs to and from Amazon SQS queues. For more
- information type:
+ Last but not least the language identifier comes shipped with a daemon that can
+ read jobs (and write) jobs to and from Amazon SQS queues. For more information
+ type:

  $ language-identifier-daemon -h

- Description of dependencies
- ---------------------------
+ ## Description of dependencies

  This component runs best if you run it in an environment suited for OpeNER
- components. You can find an installation guide and helper tools in the [OpeNER installer](https://github.com/opener-project/opener-installer) and [an installation guide on the OpenerWebsite](http://opener-project.github.io/getting-started/how-to/local-installation.html)
+ components. You can find an installation guide and helper tools in the
+ [OpeNER installer](https://github.com/opener-project/opener-installer) and
+ [an installation guide on the OpenerWebsite](http://opener-project.github.io/getting-started/how-to/local-installation.html).

  At least you need the following system setup:

  ### Dependencies for normal use:

- * Python 2.6 - PIP, possibly VirtualEnv
- * Jruby
- * Java 1.7 or newer (There are problems with encoding in older versions).
+ * JRuby 1.7 or newer
+ * Java 1.7 or newer (there are problems with encodings in older versions).

  ### Dependencies if you want to modify the component:

  * Maven (for building the Gem)

- Language Extension
- ------------------
+ ## Language Extension

- The internal library that actually performs the language identification already supports a lot of languages.
- For more information about how to extends it for more languages or functionalities, please, visit the website of the tool at <a href="https://code.google.com/p/language-detection/">https://code.google.com/p/language-detection/</a>
+ The internal library that actually performs the language identification already
+ supports a lot of languages. For more information about how to extends it for
+ more languages or functionalities, please, visit the website of the tool at
+ <https://code.google.com/p/language-detection/>.
+
+ ## The Core

- The Core
- --------
-
  The component is a fat wrapper around the actual language technology core.
  Written in Java. Checkout the core/src directory of the package to get to the
  actual working component.

- Where to go from here
- ---------------------
+ ## Where to go from here

  * [Check the project website](http://opener-project.github.io)
  * [Checkout the webservice](http://opener.olery.com/language-identifier)

- Report problem/Get help
- -----------------------
+ ## Report problem/Get help

  If you encounter problems, please email support@opener-project.eu or leave an
- issue in the [issue tracker](https://github.com/opener-project/language-identifier/issues).
+ issue in the [issue tracker](https://github.com/opener-project/language-identifier/issues).

- Contributing
- ------------
+ ## Contributing

  1. Fork it <http://github.com/opener-project/language-identifier/fork>
  2. Create your feature branch (`git checkout -b my-new-feature`)
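The README shows the KAF wrapper that the CLI prints by default, with the detected language in the root element's `xml:lang` attribute. A minimal sketch of consuming that output from Ruby, assuming the `language-identifier` executable is on the `PATH`; the helper names are hypothetical, and REXML (shipped with Ruby) is used instead of Nokogiri only to keep the sketch free of extra gems:

```ruby
require 'open3'
require 'rexml/document'

# Hypothetical helper: read the language code from a KAF document such as
# the one `language-identifier` prints by default.
def kaf_language(kaf)
  root = REXML::Document.new(kaf).root
  root && root.attributes['xml:lang']
end

# Hypothetical helper: pipe text through the CLI, as the README's `echo`
# examples do, and return its standard output.
def identify_with_cli(text, kaf: true)
  cmd = kaf ? ['language-identifier'] : ['language-identifier', '--no-kaf']
  out, status = Open3.capture2(*cmd, stdin_data: text)
  raise "language-identifier exited with #{status.exitstatus}" unless status.success?

  out.strip
end

# Usage (requires the gem installed under JRuby):
# kaf_language(identify_with_cli('This is an English text.'))
# identify_with_cli('This is an English text.', kaf: false)
```

With `--no-kaf` the CLI already returns the bare code, so `kaf_language` is only needed for the default KAF output.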
@@ -9,6 +9,7 @@ import 'org.vicomtech.opennlp.LanguageDetection.CybozuDetector'
  require_relative 'language_identifier/version'
  require_relative 'language_identifier/kaf_builder'
  require_relative 'language_identifier/cli'
+ require_relative 'language_identifier/error_layer'
  require_relative 'language_identifier/detector.rb'

  module Opener
@@ -57,14 +58,19 @@ module Opener
  # @return [Array]
  #
  def run(input)
- if options[:probs]
- output = @detector.probabilities(input)
- else
- output = @detector.detect(input)
- output = build_kaf(input, output) if @options[:kaf]
- end
+ begin
+ if options[:probs]
+ output = @detector.probabilities(input)
+ else
+ output = @detector.detect(input)
+ output = build_kaf(input, output) if @options[:kaf]
+ end

- return output
+ return output
+
+ rescue Exception => error
+ return ErrorLayer.new(input, error.message, self.class).add
+ end
  end

  alias identify run
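With this change `run` no longer lets a detector failure propagate to the caller; it returns an XML document carrying an error layer instead. A minimal, self-contained sketch of that control flow, with a hypothetical `StubDetector` standing in for the Java-backed detector and a simplified inline error payload in place of `ErrorLayer`:

```ruby
# Hypothetical stand-in for the Java-backed detector: fails on blank input.
class StubDetector
  def detect(text)
    raise ArgumentError, 'no features in text' if text.strip.empty?

    'en'
  end
end

# Hypothetical identifier mirroring the 3.0.4 control flow of #run.
class SketchIdentifier
  def initialize(detector)
    @detector = detector
  end

  # Returns the detected language, or an XML error payload describing the
  # failure instead of raising it to the caller.
  def run(input)
    @detector.detect(input)
  rescue StandardError => error
    # Escape the message so the payload stays well-formed XML.
    message = error.message.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
    "<KAF><errors><error class=\"#{self.class}\">#{message}</error></errors></KAF>"
  end
end
```

The sketch rescues `StandardError`. The released code rescues `Exception`, which also catches `Interrupt` and `SystemExit`; that broader net may be deliberate given the Java-backed detector, but it means even a shutdown signal raised during detection would be turned into an error document.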
@@ -0,0 +1,91 @@
+ require 'nokogiri'
+
+ module Opener
+ class LanguageIdentifier
+ ##
+ # Add Error Layer to KAF file instead of throwing an error.
+ #
+ class ErrorLayer
+ attr_accessor :input, :document, :error, :klass
+
+ def initialize(input, error, klass)
+ @input = input.to_s
+ # Make sure there is always a document, even if it is empty.
+ @document = Nokogiri::XML(input) rescue Nokogiri::XML(nil)
+ @error = error
+ @klass = klass
+ end
+
+ def add
+ if is_xml?
+ unless has_errors_layer?
+ add_errors_layer
+ end
+ else
+ add_root
+ add_text
+ add_errors_layer
+ end
+ add_error
+
+ xml = !!document.encoding ? document.to_xml : document.to_xml(:encoding => "UTF-8")
+
+ return xml
+ end
+
+ ##
+ # Check if the document is a valid XML file.
+ #
+ def is_xml?
+ !!document.root
+ end
+
+ ##
+ # Add root element to the XML file.
+ #
+ def add_root
+ root = Nokogiri::XML::Node.new "KAF", document
+ document.add_child(root)
+ end
+
+ ##
+ # Check if the document already has an errors layer.
+ #
+ def has_errors_layer?
+ !!document.at('errors')
+ end
+
+ ##
+ # Add errors element to the XML file.
+ #
+ def add_errors_layer
+ node = Nokogiri::XML::Node.new "errors", document
+ document.root.add_child(node)
+ end
+
+ ##
+ # Add the text file incase it is not a valid XML document. More
+ # info for debugging.
+ #
+ def add_text
+ node = Nokogiri::XML::Node.new "raw", document
+ node.inner_html = input
+ document.root.add_child(node)
+
+ end
+
+ ##
+ # Add the actual error to the errors layer.
+ #
+ def add_error
+ node = document.at('errors')
+ error_node = Nokogiri::XML::Node.new "error", node
+ error_node['class'] = klass.to_s
+ error_node['version'] = klass::VERSION
+ error_node.inner_html = error
+ node.add_child(error_node)
+ end
+
+ end # ErrorLayer
+ end # LanguageIdentifier
+ end # Opener
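`ErrorLayer#add` reuses the input document when it parses as XML, adding an `errors` element only if none exists; otherwise it gives the document a `KAF` root holding the input in a `raw` element. It then appends an `error` element tagged with the component's class name and `VERSION`. The same steps, sketched with REXML from Ruby's standard distribution rather than Nokogiri; the function names and the explicit `component`/`version` parameters are hypothetical:

```ruby
require 'rexml/document'

# Parse input as XML; return nil when it is not a document with a root element.
def parse_xml_or_nil(input)
  doc = REXML::Document.new(input.to_s)
  doc.root ? doc : nil
rescue StandardError # treat any parse failure as "not XML"
  nil
end

# Hypothetical REXML counterpart of ErrorLayer#add: always returns an XML
# string that carries the error instead of raising it.
def add_error_layer(input, message, component, version)
  doc = parse_xml_or_nil(input)
  unless doc
    # Not XML: wrap the raw input in a minimal KAF document for debugging.
    doc = REXML::Document.new
    doc.add_element('KAF').add_element('raw').add_text(input.to_s)
  end

  # Reuse an existing <errors> layer anywhere in the document, else add one.
  errors = REXML::XPath.first(doc, '//errors') || doc.root.add_element('errors')
  errors.add_element('error', 'class' => component, 'version' => version)
        .add_text(message)
  doc.to_s
end
```

One difference worth noting: `add_text` escapes the input, whereas the gem assigns it with `inner_html=`, which Nokogiri parses as markup.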
@@ -1,5 +1,5 @@
  module Opener
  class LanguageIdentifier
- VERSION = "3.0.3"
+ VERSION = "3.0.4"
  end
  end
@@ -12,7 +12,6 @@ Gem::Specification.new do |gem|

  gem.files = Dir.glob([
  'core/target/LanguageDetection-*.jar',
- 'core/target/classes/**/*.*',
  'core/target/classes/**/*',
  'exec/**/*',
  'lib/**/*',
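Dropping `'core/target/classes/**/*.*'` should not change which files get packaged: with the default flags, every path matched by `**/*.*` (names containing a dot) is also matched by `**/*`. A small sketch that checks this on a throwaway directory tree; the helper names and file names are hypothetical:

```ruby
require 'fileutils'
require 'tmpdir'

# Sorted, base-relative glob results for one or more patterns.
def glob_in(dir, *patterns)
  Dir.glob(patterns, base: dir).sort
end

# Hypothetical demo: build a tiny tree resembling core/target/classes and
# return [paths only matched by '**/*.*', paths only matched by '**/*'].
def compare_class_globs
  Dir.mktmpdir do |dir|
    pkg = File.join(dir, 'classes', 'org', 'example')
    FileUtils.mkdir_p(pkg)
    File.write(File.join(pkg, 'Detector.class'), '')
    File.write(File.join(pkg, 'profile'), '') # a file without an extension

    dotted = glob_in(dir, 'classes/**/*.*')
    everything = glob_in(dir, 'classes/**/*')
    [dotted - everything, everything - dotted]
  end
end

only_dotted, only_star = compare_class_globs
p only_dotted # => []
p only_star
```

The first list comes back empty, so the removed pattern never contributed a path that `'core/target/classes/**/*'` missed.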
@@ -28,11 +27,12 @@ Gem::Specification.new do |gem|
  gem.add_dependency 'sinatra', '~>1.4.2'
  gem.add_dependency 'httpclient'
  gem.add_dependency 'uuidtools'
- gem.add_dependency 'opener-build-tools'
  gem.add_dependency 'opener-webservice'
  gem.add_dependency 'opener-daemons'
+ gem.add_dependency 'nokogiri'

- gem.add_development_dependency 'rspec'
+ gem.add_development_dependency 'rspec', '~> 3.0'
  gem.add_development_dependency 'cucumber'
  gem.add_development_dependency 'rake'
+ gem.add_development_dependency 'cliver'
  end
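`rspec` is now constrained with the pessimistic operator (`'~> 3.0'`), while dependencies declared without a version, such as the new `nokogiri` and `cliver` entries, fall back to RubyGems' default requirement of `'>= 0'`. A quick way to see what those requirements accept, using RubyGems' own `Gem::Requirement` and `Gem::Version` (the `allows?` helper is hypothetical):

```ruby
require 'rubygems'

# Hypothetical helper: does the requirement string accept this version?
def allows?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

# '~> 3.0' means '>= 3.0' and '< 4.0'.
p allows?('~> 3.0', '3.0.0')  # => true
p allows?('~> 3.0', '3.99.1') # => true
p allows?('~> 3.0', '4.0.0')  # => false

# A third segment narrows the range: '~> 3.0.0' means '>= 3.0.0' and '< 3.1'.
p allows?('~> 3.0.0', '3.1.0') # => false

# An unconstrained dependency uses the default requirement, '>= 0'.
p Gem::Requirement.default.satisfied_by?(Gem::Version.new('1.6.2')) # => true
```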
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: opener-language-identifier
  version: !ruby/object:Gem::Version
- version: 3.0.3
+ version: 3.0.4
  platform: ruby
  authors:
  - development@olery.com
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-05-24 00:00:00.000000000 Z
+ date: 2014-06-12 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: builder
@@ -81,7 +81,7 @@ dependencies:
  prerelease: false
  type: :runtime
  - !ruby/object:Gem::Dependency
- name: opener-build-tools
+ name: opener-webservice
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - '>='
@@ -95,7 +95,7 @@ dependencies:
  prerelease: false
  type: :runtime
  - !ruby/object:Gem::Dependency
- name: opener-webservice
+ name: opener-daemons
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - '>='
@@ -109,7 +109,7 @@ dependencies:
  prerelease: false
  type: :runtime
  - !ruby/object:Gem::Dependency
- name: opener-daemons
+ name: nokogiri
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - '>='
@@ -124,6 +124,20 @@ dependencies:
  type: :runtime
  - !ruby/object:Gem::Dependency
  name: rspec
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ~>
+ - !ruby/object:Gem::Version
+ version: '3.0'
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ~>
+ - !ruby/object:Gem::Version
+ version: '3.0'
+ prerelease: false
+ type: :development
+ - !ruby/object:Gem::Dependency
+ name: cucumber
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - '>='
@@ -137,7 +151,7 @@ dependencies:
  prerelease: false
  type: :development
  - !ruby/object:Gem::Dependency
- name: cucumber
+ name: rake
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - '>='
@@ -151,7 +165,7 @@ dependencies:
  prerelease: false
  type: :development
  - !ruby/object:Gem::Dependency
- name: rake
+ name: cliver
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - '>='
@@ -253,6 +267,7 @@ files:
  - lib/opener/language_identifier.rb
  - lib/opener/language_identifier/cli.rb
  - lib/opener/language_identifier/detector.rb
+ - lib/opener/language_identifier/error_layer.rb
  - lib/opener/language_identifier/kaf_builder.rb
  - lib/opener/language_identifier/public/markdown.css
  - lib/opener/language_identifier/server.rb