opener-language-identifier 3.0.3 → 3.0.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: dcf50a5847ebd111af672d41d6f2a40dfc6e5c2b
4
- data.tar.gz: de82ac2bcc8485d334cfd9a8b84c373781abda89
3
+ metadata.gz: ce2775b5964868c6ad0e00519dde29c5dc1654a4
4
+ data.tar.gz: be35d8f78c6a32b39a77e41b9ba673c70e37c728
5
5
  SHA512:
6
- metadata.gz: 45a785402359135646b38000ad9502ebbe44389d38f91c92eed73b315d2397524a3522b9d8fa9f2a6a42626bad9ae30971ed90f6301af38d0f843b197beaca58
7
- data.tar.gz: 571cbb5c6a46acabedbb572832a46c3cfe565fdc0d2ceeb78168d7e58843b88659f764fcfa810bb6f418d3b054b21b3d9f5aa9556b50d3a9430e2fac704e3e2e
6
+ metadata.gz: 527280005269de7dadc0e7a4c8169c8f9da4f922e596e04e2765f8e69b0d5873c96e1f2c5f4d902ff7ca5fd4fa85fa9758c693f6fa1eb0cc5a905c5b41365337
7
+ data.tar.gz: d4cf50e2110aa86c9c908068e9fc339e4124523edeef301fcd49b42c0776cc41d3e5100681b2f12f6e480ebef4a2abb743036ac430f7abe86882a27fe38f1586
data/README.md CHANGED
@@ -2,41 +2,48 @@
2
2
 
3
3
  # Language Identifier
4
4
 
5
- The language identifier takes raw text and tries to figure out what language it was written in. The output can either be a plain-text i18n language code or a basic KAF document containing the language and raw input text.
5
+ The language identifier takes raw text and tries to figure out what language it
6
+ was written in. The output can either be a plain-text i18n language code or a
7
+ basic KAF document containing the language and raw input text.
6
8
 
7
- The output of the language identifier can then be used to drive further text analysis of for example sentiments and or entities.
9
+ The output of the language identifier can then be used to drive further text
10
+ analysis of for example sentiments and or entities.
8
11
 
9
- ### Confused by some terminology?
12
+ ## Confused by some terminology?
10
13
 
11
- This software is part of a larger collection of natural language processing tools known as "the OpeNER project". You can find more information about the project at [the OpeNER portal](http://opener-project.github.io). There you can also find references to terms like KAF (an XML standard to represent linguistic annotations in texts), component, cores, scenario's and pipelines.
14
+ This software is part of a larger collection of natural language processing
15
+ tools known as "the OpeNER project". You can find more information about the
16
+ project at [the OpeNER portal](http://opener-project.github.io). There you can
17
+ also find references to terms like KAF (an XML standard to represent linguistic
18
+ annotations in texts), component, cores, scenario's and pipelines.
12
19
 
13
- Quick Use Example
14
- -----------------
20
+ ## Quick Use Example
15
21
 
16
22
  Install the Gem:
17
23
 
18
24
  gem install opener-language-identifier
19
25
 
20
- Make sure you run ```jruby``` since the language-identifier uses Java.
26
+ Make sure you run `jruby` since the language-identifier uses Java.
21
27
 
22
28
  ### Command line interface
23
29
 
24
- You should now be able to call the language indentifier as a regular shell command: by its name. Once installed the gem normally sits in your path so you can call it directly from anywhere.
30
+ You should now be able to call the language indentifier as a regular shell
31
+ command: by its name. Once installed the gem normally sits in your path so you
32
+ can call it directly from anywhere.
25
33
 
26
- This aplication reads a text from standard input in order to identify the language.
34
+ This aplication reads a text from standard input in order to identify the
35
+ language.
27
36
 
28
37
  echo "This is an English text." | language-identifier
29
38
 
30
39
  This will output:
31
40
 
32
- ~~~~
33
- <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
34
- <KAF xml:lang="en" version="2.1">
35
- <raw>This is an English text.</raw>
36
- </KAF>
37
- ~~~~
41
+ <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
42
+ <KAF xml:lang="en" version="2.1">
43
+ <raw>This is an English text.</raw>
44
+ </KAF>
38
45
 
39
- If you just want the language code returned add the ```--no-kaf``` option like this
46
+ If you just want the language code returned add the `--no-kaf` option like this
40
47
 
41
48
  echo "This is an English text." | language-identifier --no-kaf
42
49
 
@@ -50,7 +57,8 @@ You can launch a language identification webservice by executing:
50
57
 
51
58
  $ language-identifier-server
52
59
 
53
- This will launch a mini webserver with the webservice. It defaults to port 9292, so you can access it at <http://localhost:9292/>.
60
+ This will launch a mini webserver with the webservice. It defaults to port
61
+ 9292, so you can access it at <http://localhost:9292/>.
54
62
 
55
63
  To launch it on a different port provide the `-p [port-number]` option like
56
64
  this:
@@ -61,61 +69,58 @@ It then launches at <http://localhost:1234/>
61
69
 
62
70
  Documentation on the Webservice is provided by surfing to the urls provided
63
71
  above. For more information on how to launch a webservice run the command with
64
- the ```-h``` option.
72
+ the `-h` option.
65
73
 
66
74
  ### Daemon
67
75
 
68
- Last but not least the language identifier comes shipped with a daemon that
69
- can read jobs (and write) jobs to and from Amazon SQS queues. For more
70
- information type:
76
+ Last but not least the language identifier comes shipped with a daemon that can
77
+ read jobs (and write) jobs to and from Amazon SQS queues. For more information
78
+ type:
71
79
 
72
80
  $ language-identifier-daemon -h
73
81
 
74
- Description of dependencies
75
- ---------------------------
82
+ ## Description of dependencies
76
83
 
77
84
  This component runs best if you run it in an environment suited for OpeNER
78
- components. You can find an installation guide and helper tools in the [OpeNER installer](https://github.com/opener-project/opener-installer) and [an installation guide on the OpenerWebsite](http://opener-project.github.io/getting-started/how-to/local-installation.html)
85
+ components. You can find an installation guide and helper tools in the
86
+ [OpeNER installer](https://github.com/opener-project/opener-installer) and
87
+ [an installation guide on the OpenerWebsite](http://opener-project.github.io/getting-started/how-to/local-installation.html).
79
88
 
80
89
  At least you need the following system setup:
81
90
 
82
91
  ### Dependencies for normal use:
83
92
 
84
- * Python 2.6 - PIP, possibly VirtualEnv
85
- * Jruby
86
- * Java 1.7 or newer (There are problems with encoding in older versions).
93
+ * JRuby 1.7 or newer
94
+ * Java 1.7 or newer (there are problems with encodings in older versions).
87
95
 
88
96
  ### Dependencies if you want to modify the component:
89
97
 
90
98
  * Maven (for building the Gem)
91
99
 
92
- Language Extension
93
- ------------------
100
+ ## Language Extension
94
101
 
95
- The internal library that actually performs the language identification already supports a lot of languages.
96
- For more information about how to extends it for more languages or functionalities, please, visit the website of the tool at <a href="https://code.google.com/p/language-detection/">https://code.google.com/p/language-detection/</a>
102
+ The internal library that actually performs the language identification already
103
+ supports a lot of languages. For more information about how to extends it for
104
+ more languages or functionalities, please, visit the website of the tool at
105
+ <https://code.google.com/p/language-detection/>.
106
+
107
+ ## The Core
97
108
 
98
- The Core
99
- --------
100
-
101
109
  The component is a fat wrapper around the actual language technology core.
102
110
  Written in Java. Checkout the core/src directory of the package to get to the
103
111
  actual working component.
104
112
 
105
- Where to go from here
106
- ---------------------
113
+ ## Where to go from here
107
114
 
108
115
  * [Check the project website](http://opener-project.github.io)
109
116
  * [Checkout the webservice](http://opener.olery.com/language-identifier)
110
117
 
111
- Report problem/Get help
112
- -----------------------
118
+ ## Report problem/Get help
113
119
 
114
120
  If you encounter problems, please email support@opener-project.eu or leave an
115
- issue in the [issue tracker](https://github.com/opener-project/language-identifier/issues).
121
+ issue in the [issue tracker](https://github.com/opener-project/language-identifier/issues).
116
122
 
117
- Contributing
118
- ------------
123
+ ## Contributing
119
124
 
120
125
  1. Fork it <http://github.com/opener-project/language-identifier/fork>
121
126
  2. Create your feature branch (`git checkout -b my-new-feature`)
data/lib/opener/language_identifier.rb CHANGED
@@ -9,6 +9,7 @@ import 'org.vicomtech.opennlp.LanguageDetection.CybozuDetector'
9
9
  require_relative 'language_identifier/version'
10
10
  require_relative 'language_identifier/kaf_builder'
11
11
  require_relative 'language_identifier/cli'
12
+ require_relative 'language_identifier/error_layer'
12
13
  require_relative 'language_identifier/detector.rb'
13
14
 
14
15
  module Opener
@@ -57,14 +58,19 @@ module Opener
57
58
  # @return [Array]
58
59
  #
59
60
  def run(input)
60
- if options[:probs]
61
- output = @detector.probabilities(input)
62
- else
63
- output = @detector.detect(input)
64
- output = build_kaf(input, output) if @options[:kaf]
65
- end
61
+ begin
62
+ if options[:probs]
63
+ output = @detector.probabilities(input)
64
+ else
65
+ output = @detector.detect(input)
66
+ output = build_kaf(input, output) if @options[:kaf]
67
+ end
66
68
 
67
- return output
69
+ return output
70
+
71
+ rescue Exception => error
72
+ return ErrorLayer.new(input, error.message, self.class).add
73
+ end
68
74
  end
69
75
 
70
76
  alias identify run
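
Note on the hunk above: `run` now returns an error document instead of letting the exception propagate. The following is a minimal sketch of that control flow only, not the gem's code. `FakeDetector` and `safe_run` are hypothetical stand-ins, and a plain string takes the place of the real `ErrorLayer` output. The sketch rescues `StandardError`, whereas the diff rescues `Exception`, which also traps `Interrupt` and `SystemExit`.

```ruby
# Hypothetical stand-in for the real detector; raises on blank input.
class FakeDetector
  def detect(input)
    raise ArgumentError, 'no text given' if input.strip.empty?

    'en'
  end
end

# Sketch of the wrapped control flow: return the detector output on
# success, or a description of the failure instead of raising.
def safe_run(detector, input)
  detector.detect(input)
rescue StandardError => error
  "error: #{error.message}"
end

puts safe_run(FakeDetector.new, 'This is an English text.')
puts safe_run(FakeDetector.new, '')
```

Callers of a method written this way have to inspect the return value to tell success from failure, since no exception reaches them.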
data/lib/opener/language_identifier/error_layer.rb ADDED
@@ -0,0 +1,91 @@
1
+ require 'nokogiri'
2
+
3
+ module Opener
4
+ class LanguageIdentifier
5
+ ##
6
+ # Add Error Layer to KAF file instead of throwing an error.
7
+ #
8
+ class ErrorLayer
9
+ attr_accessor :input, :document, :error, :klass
10
+
11
+ def initialize(input, error, klass)
12
+ @input = input.to_s
13
+ # Make sure there is always a document, even if it is empty.
14
+ @document = Nokogiri::XML(input) rescue Nokogiri::XML(nil)
15
+ @error = error
16
+ @klass = klass
17
+ end
18
+
19
+ def add
20
+ if is_xml?
21
+ unless has_errors_layer?
22
+ add_errors_layer
23
+ end
24
+ else
25
+ add_root
26
+ add_text
27
+ add_errors_layer
28
+ end
29
+ add_error
30
+
31
+ xml = !!document.encoding ? document.to_xml : document.to_xml(:encoding => "UTF-8")
32
+
33
+ return xml
34
+ end
35
+
36
+ ##
37
+ # Check if the document is a valid XML file.
38
+ #
39
+ def is_xml?
40
+ !!document.root
41
+ end
42
+
43
+ ##
44
+ # Add root element to the XML file.
45
+ #
46
+ def add_root
47
+ root = Nokogiri::XML::Node.new "KAF", document
48
+ document.add_child(root)
49
+ end
50
+
51
+ ##
52
+ # Check if the document already has an errors layer.
53
+ #
54
+ def has_errors_layer?
55
+ !!document.at('errors')
56
+ end
57
+
58
+ ##
59
+ # Add errors element to the XML file.
60
+ #
61
+ def add_errors_layer
62
+ node = Nokogiri::XML::Node.new "errors", document
63
+ document.root.add_child(node)
64
+ end
65
+
66
+ ##
67
+ # Add the text file incase it is not a valid XML document. More
68
+ # info for debugging.
69
+ #
70
+ def add_text
71
+ node = Nokogiri::XML::Node.new "raw", document
72
+ node.inner_html = input
73
+ document.root.add_child(node)
74
+
75
+ end
76
+
77
+ ##
78
+ # Add the actual error to the errors layer.
79
+ #
80
+ def add_error
81
+ node = document.at('errors')
82
+ error_node = Nokogiri::XML::Node.new "error", node
83
+ error_node['class'] = klass.to_s
84
+ error_node['version'] = klass::VERSION
85
+ error_node.inner_html = error
86
+ node.add_child(error_node)
87
+ end
88
+
89
+ end # ErrorLayer
90
+ end # LanguageIdentifier
91
+ end # Opener
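
Note on the new file above: for input that is not XML, `ErrorLayer#add` builds a `KAF` root, a `raw` node with the input, an `errors` layer, and one `error` node carrying the class name and version. The sketch below illustrates roughly that document shape using plain strings from the standard library, so it runs without Nokogiri. `xml_escape` and `error_document` are hypothetical helpers, not part of the gem. Unlike the diff, which assigns text through `inner_html=` (which, as far as I know, treats the string as markup), the sketch escapes the text explicitly.

```ruby
# Maps the XML special characters to their entity forms.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Hypothetical helper: makes arbitrary text safe inside a node or attribute.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Hypothetical helper: mirrors the non-XML branch of ErrorLayer#add
# (root, raw text, errors layer, one error) with string interpolation.
def error_document(input, message, klass_name, version)
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <KAF>
      <raw>#{xml_escape(input)}</raw>
      <errors>
        <error class="#{xml_escape(klass_name)}" version="#{xml_escape(version)}">#{xml_escape(message)}</error>
      </errors>
    </KAF>
  XML
end

puts error_document('plain <text>', 'detection failed', 'Opener::LanguageIdentifier', '3.0.4')
```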
data/lib/opener/language_identifier/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  module Opener
2
2
  class LanguageIdentifier
3
- VERSION = "3.0.3"
3
+ VERSION = "3.0.4"
4
4
  end
5
5
  end
@@ -12,7 +12,6 @@ Gem::Specification.new do |gem|
12
12
 
13
13
  gem.files = Dir.glob([
14
14
  'core/target/LanguageDetection-*.jar',
15
- 'core/target/classes/**/*.*',
16
15
  'core/target/classes/**/*',
17
16
  'exec/**/*',
18
17
  'lib/**/*',
@@ -28,11 +27,12 @@ Gem::Specification.new do |gem|
28
27
  gem.add_dependency 'sinatra', '~>1.4.2'
29
28
  gem.add_dependency 'httpclient'
30
29
  gem.add_dependency 'uuidtools'
31
- gem.add_dependency 'opener-build-tools'
32
30
  gem.add_dependency 'opener-webservice'
33
31
  gem.add_dependency 'opener-daemons'
32
+ gem.add_dependency 'nokogiri'
34
33
 
35
- gem.add_development_dependency 'rspec'
34
+ gem.add_development_dependency 'rspec', '~> 3.0'
36
35
  gem.add_development_dependency 'cucumber'
37
36
  gem.add_development_dependency 'rake'
37
+ gem.add_development_dependency 'cliver'
38
38
  end
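
Note on the hunk above: the `rspec` development dependency is now pinned with the pessimistic operator `~> 3.0`. As a small illustration of what that constraint admits, the sketch below checks a few versions with RubyGems' own `Gem::Requirement`. The helper name `allowed?` and the sample version numbers are chosen for illustration only.

```ruby
require 'rubygems'

# Hypothetical helper: true when `version` satisfies the requirement string.
def allowed?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

puts allowed?('~> 3.0', '3.0.0')  # true:  lowest version admitted
puts allowed?('~> 3.0', '3.9.1')  # true:  any 3.x release
puts allowed?('~> 3.0', '4.0.0')  # false: next major version is excluded
puts allowed?('~> 3.0', '2.99.0') # false: below the lower bound
```

With two segments, `~> 3.0` means `>= 3.0` and `< 4.0`, which guards against a future major release of `rspec` changing its API.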
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: opener-language-identifier
3
3
  version: !ruby/object:Gem::Version
4
- version: 3.0.3
4
+ version: 3.0.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - development@olery.com
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2014-05-24 00:00:00.000000000 Z
11
+ date: 2014-06-12 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: builder
@@ -81,7 +81,7 @@ dependencies:
81
81
  prerelease: false
82
82
  type: :runtime
83
83
  - !ruby/object:Gem::Dependency
84
- name: opener-build-tools
84
+ name: opener-webservice
85
85
  version_requirements: !ruby/object:Gem::Requirement
86
86
  requirements:
87
87
  - - '>='
@@ -95,7 +95,7 @@ dependencies:
95
95
  prerelease: false
96
96
  type: :runtime
97
97
  - !ruby/object:Gem::Dependency
98
- name: opener-webservice
98
+ name: opener-daemons
99
99
  version_requirements: !ruby/object:Gem::Requirement
100
100
  requirements:
101
101
  - - '>='
@@ -109,7 +109,7 @@ dependencies:
109
109
  prerelease: false
110
110
  type: :runtime
111
111
  - !ruby/object:Gem::Dependency
112
- name: opener-daemons
112
+ name: nokogiri
113
113
  version_requirements: !ruby/object:Gem::Requirement
114
114
  requirements:
115
115
  - - '>='
@@ -124,6 +124,20 @@ dependencies:
124
124
  type: :runtime
125
125
  - !ruby/object:Gem::Dependency
126
126
  name: rspec
127
+ version_requirements: !ruby/object:Gem::Requirement
128
+ requirements:
129
+ - - ~>
130
+ - !ruby/object:Gem::Version
131
+ version: '3.0'
132
+ requirement: !ruby/object:Gem::Requirement
133
+ requirements:
134
+ - - ~>
135
+ - !ruby/object:Gem::Version
136
+ version: '3.0'
137
+ prerelease: false
138
+ type: :development
139
+ - !ruby/object:Gem::Dependency
140
+ name: cucumber
127
141
  version_requirements: !ruby/object:Gem::Requirement
128
142
  requirements:
129
143
  - - '>='
@@ -137,7 +151,7 @@ dependencies:
137
151
  prerelease: false
138
152
  type: :development
139
153
  - !ruby/object:Gem::Dependency
140
- name: cucumber
154
+ name: rake
141
155
  version_requirements: !ruby/object:Gem::Requirement
142
156
  requirements:
143
157
  - - '>='
@@ -151,7 +165,7 @@ dependencies:
151
165
  prerelease: false
152
166
  type: :development
153
167
  - !ruby/object:Gem::Dependency
154
- name: rake
168
+ name: cliver
155
169
  version_requirements: !ruby/object:Gem::Requirement
156
170
  requirements:
157
171
  - - '>='
@@ -253,6 +267,7 @@ files:
253
267
  - lib/opener/language_identifier.rb
254
268
  - lib/opener/language_identifier/cli.rb
255
269
  - lib/opener/language_identifier/detector.rb
270
+ - lib/opener/language_identifier/error_layer.rb
256
271
  - lib/opener/language_identifier/kaf_builder.rb
257
272
  - lib/opener/language_identifier/public/markdown.css
258
273
  - lib/opener/language_identifier/server.rb