opener-tree-tagger 3.3.0 → 4.0.0
- checksums.yaml +4 -4
- data/LICENSE.txt +13 -0
- data/README.md +65 -31
- data/bin/tree-tagger-daemon +5 -2
- data/bin/tree-tagger-server +6 -4
- data/exec/tree-tagger.rb +2 -2
- data/lib/opener/tree_tagger.rb +0 -4
- data/lib/opener/tree_tagger/server.rb +4 -5
- data/lib/opener/tree_tagger/version.rb +1 -1
- data/opener-tree-tagger.gemspec +8 -7
- metadata +21 -67
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: be445f3932e633ffbe7c11fd31393801c72cc025
|
4
|
+
data.tar.gz: f3900dd4dd481cac768268bed936db806dfa710c
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: a3abb1d49a7e01e084b6c4b451405586f6e83ce67ff1c086d5965bb6b906b1687e748ca7a3312a26e2d431c10b48a716ea892a72b882dcdde1a70e0dcf8d4f4f
|
7
|
+
data.tar.gz: bae2c53f3f670d810b53e35cf126791a09f6106b0841c3b24d90165abb91a0c4234be6d4606dff2a63a721bf3129a665355aa2e307cd38934f698bd2e2cc3095
|
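The SHA512 values recorded above can be compared against an unpacked copy of the gem. The following is a minimal sketch, not part of the gem itself: the helper name `checksum_matches?` is hypothetical, and it assumes `data.tar.gz` has already been extracted from the `.gem` archive. Only `Digest::SHA512.file` is standard-library API.

```ruby
require 'digest'

# Hypothetical helper (not part of opener-tree-tagger): returns true when the
# SHA512 hex digest of the file at `path` equals `expected_hex`.
# The comparison ignores the letter case of the expected value.
def checksum_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex.downcase
end
```

To use it, read the expected value from the `SHA512` / `data.tar.gz` entry of `checksums.yaml` and pass it together with the path to the extracted `data.tar.gz`.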
data/LICENSE.txt
ADDED
@@ -0,0 +1,13 @@
|
|
1
|
+
Copyright 2014 OpeNER Project Consortium
|
2
|
+
|
3
|
+
Licensed under the Apache License, Version 2.0 (the "License");
|
4
|
+
you may not use this file except in compliance with the License.
|
5
|
+
You may obtain a copy of the License at
|
6
|
+
|
7
|
+
http://www.apache.org/licenses/LICENSE-2.0
|
8
|
+
|
9
|
+
Unless required by applicable law or agreed to in writing, software
|
10
|
+
distributed under the License is distributed on an "AS IS" BASIS,
|
11
|
+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
12
|
+
See the License for the specific language governing permissions and
|
13
|
+
limitations under the License.
|
data/README.md
CHANGED
@@ -3,13 +3,28 @@
|
|
3
3
|
Introduction
|
4
4
|
------------
|
5
5
|
|
6
|
-
This module implements a wrapper to process text with the PoS tagger TreeTagger.
|
7
|
-
|
8
|
-
|
6
|
+
This module implements a wrapper to process text with the PoS tagger TreeTagger.
|
7
|
+
TreeTagger is a tool that assigns the lemmas and part-of-speech information to
|
8
|
+
an input text. This module takes KAF as input, with the token layer created (for
|
9
|
+
instance by one of our tokenizer modules) and outputs KAF with a new term layer.
|
10
|
+
It is important to note that the token layer in the input is not modified in the
|
11
|
+
output, so the program takes care of performing the correct matching between the
|
12
|
+
term and the token layer.
|
13
|
+
|
14
|
+
The language of the input KAF text has to be specified through the attribute
|
15
|
+
xml:lang in the main KAF element. This module works for text in all the
|
16
|
+
languages covered by the OpeNER project (English, Dutch,German, Italian, Spanish
|
17
|
+
and French). It can be easily extended to other languages by downloading the
|
18
|
+
specific TreeTagger models for that language and providing a mapping from the
|
19
|
+
tagset used by these models to the tagset defined in KAF.
|
9
20
|
|
10
21
|
### Confused by some terminology?
|
11
22
|
|
12
|
-
This software is part of a larger collection of natural language processing
|
23
|
+
This software is part of a larger collection of natural language processing
|
24
|
+
tools known as "the OpeNER project". You can find more information about the
|
25
|
+
project at [the OpeNER portal](http://opener-project.github.io). There you can
|
26
|
+
also find references to terms like KAF (an XML standard to represent linguistic
|
27
|
+
annotations in texts), component, cores, scenario's and pipelines.
|
13
28
|
|
14
29
|
Quick Use Example
|
15
30
|
-----------------
|
@@ -18,25 +33,32 @@ Installing the tree-tagger can be done by executing:
|
|
18
33
|
|
19
34
|
gem install opener-tree-tagger
|
20
35
|
|
21
|
-
Also make sure you have tree-tagger and the proper language files installed AND
|
36
|
+
Also make sure you have tree-tagger and the proper language files installed AND
|
37
|
+
that you set the path to the tree-tagger in the `TREE_TAGGER_PATH` environment
|
38
|
+
variable.
|
22
39
|
|
23
|
-
Besides that, make sure you install lxml. You can probably achieve this by
|
40
|
+
Besides that, make sure you install lxml. You can probably achieve this by
|
41
|
+
typing
|
24
42
|
|
25
43
|
pip install lxml
|
26
44
|
|
27
|
-
If that doesn't work, please check the
|
45
|
+
If that doesn't work, please check the
|
46
|
+
[installation guide on the OpeNER portal](http://opener-project.github.io/getting-started/how-to/local-installation.html).
|
28
47
|
|
29
|
-
Please
|
48
|
+
Please keep in mind that all components in OpeNER take KAF as an input and
|
49
|
+
output KAF by default.
|
30
50
|
|
31
51
|
### Command line interface
|
32
52
|
|
33
|
-
You should now be able to call the tree-tagger as a regular shell command: by
|
53
|
+
You should now be able to call the tree-tagger as a regular shell command: by
|
54
|
+
its name. Once installed the gem normalyl sits in your path so you can call it
|
55
|
+
directly from anywhere.
|
34
56
|
|
35
|
-
This aplication reads a text from standard input in order to identify the
|
57
|
+
This aplication reads a text from standard input in order to identify the
|
58
|
+
language:
|
36
59
|
|
37
60
|
cat some_kind_of_kaf_file.kaf | tree-tagger
|
38
61
|
|
39
|
-
|
40
62
|
This will output KAF xml.
|
41
63
|
|
42
64
|
### Webservices
|
@@ -45,58 +67,70 @@ You can launch a language identification webservice by executing:
|
|
45
67
|
|
46
68
|
tree-tagger-server
|
47
69
|
|
48
|
-
This will launch a
|
70
|
+
This will launch a webserver with the webservice. It defaults to port 9292,
|
71
|
+
so you can access it at <http://localhost:9292>.
|
49
72
|
|
50
|
-
To launch it on a different port provide the `-p [port-number]` option like
|
73
|
+
To launch it on a different port provide the `-p [port-number]` option like
|
74
|
+
this:
|
51
75
|
|
52
76
|
tree-tagger-server -p 1234
|
53
77
|
|
54
78
|
It then launches at <http://localhost:1234>
|
55
79
|
|
56
|
-
Documentation on the Webservice is provided by surfing to the urls provided
|
57
|
-
|
80
|
+
Documentation on the Webservice is provided by surfing to the urls provided
|
81
|
+
above. For more information on how to launch a webservice run the command with
|
82
|
+
the `--help` option.
|
58
83
|
|
59
84
|
### Daemon
|
60
85
|
|
61
|
-
Last but not least the tree-tagger comes shipped with a daemon that can read
|
86
|
+
Last but not least the tree-tagger comes shipped with a daemon that can read
|
87
|
+
jobs (and write) jobs to and from Amazon SQS queues. For more information type:
|
62
88
|
|
63
89
|
tree-tagger-daemon -h
|
64
90
|
|
65
|
-
|
66
91
|
Description of dependencies
|
67
92
|
---------------------------
|
68
93
|
|
69
|
-
This component runs best if you run it in an environment suited for OpeNER
|
94
|
+
This component runs best if you run it in an environment suited for OpeNER
|
95
|
+
components. You can find an installation guide and helper tools in the
|
96
|
+
[OpeNER installer](https://github.com/opener-project/opener-installer) and an
|
97
|
+
[installation guide on the Opener Website](http://opener-project.github.io/getting-started/how-to/local-installation.html).
|
70
98
|
|
71
99
|
At least you need the following system setup:
|
72
100
|
|
73
101
|
### Depenencies for normal use:
|
74
102
|
|
75
|
-
* Ruby (Tested on MRI and JRuby) 1.9.3
|
103
|
+
* Ruby (Tested on MRI and JRuby) 1.9.3
|
76
104
|
* Python 2.6
|
77
105
|
* LXML installed
|
78
|
-
* This module has a dependency on the following external module: TreeTagger
|
79
|
-
|
106
|
+
* This module has a dependency on the following external module: TreeTagger
|
107
|
+
(<http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/>) More
|
108
|
+
information is further down in this document.
|
109
|
+
* Tree tagger installed and it's path know in `TREE_TAGGER_PATH` environment
|
80
110
|
variable.
|
81
111
|
|
82
|
-
If TreeTagger is not installed in your machine you can download it from
|
112
|
+
If TreeTagger is not installed in your machine you can download it from
|
113
|
+
<http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/> and follow the
|
114
|
+
installation instructions. To indicate to our scripts where TreeTagger is
|
115
|
+
located, you have to set an environment variable with the location:
|
83
116
|
|
84
|
-
|
85
|
-
export TREE_TAGGER_PATH=/usr/local/TreeTagger/
|
86
|
-
```
|
117
|
+
export TREE_TAGGER_PATH=/usr/local/TreeTagger/
|
87
118
|
|
88
119
|
It is advised you add the path to the tree tagger in your bash or zsh profile by
|
89
|
-
adding it to
|
120
|
+
adding it to `~/.bash_profile` or `~/.zshrc`.
|
90
121
|
|
91
122
|
Language Extension
|
92
123
|
------------------
|
93
124
|
|
94
|
-
The tree-tagger depends on the availability of Tree Tagger models. Check out the
|
125
|
+
The tree-tagger depends on the availability of Tree Tagger models. Check out the
|
126
|
+
tree tagger website for more languages. Also you'll have to update the Python
|
127
|
+
files in the core directory.
|
95
128
|
|
96
129
|
The Core
|
97
130
|
--------
|
98
131
|
|
99
|
-
The component is a
|
132
|
+
The component is a wrapper around the actual language technology core. You
|
133
|
+
can find the core technolies in the core directory of this repository.
|
100
134
|
|
101
135
|
Where to go from here
|
102
136
|
---------------------
|
@@ -107,9 +141,9 @@ Where to go from here
|
|
107
141
|
Report problem/Get help
|
108
142
|
-----------------------
|
109
143
|
|
110
|
-
If you encounter problems, please email support@opener-project.eu or leave an
|
111
|
-
|
112
|
-
|
144
|
+
If you encounter problems, please email <support@opener-project.eu> or leave an
|
145
|
+
issue in the
|
146
|
+
[issue tracker](https://github.com/opener-project/tree-tagger/issues).
|
113
147
|
|
114
148
|
Contributing
|
115
149
|
------------
|
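The README changes above require `TREE_TAGGER_PATH` to point at the TreeTagger installation. A small pre-flight check along these lines could catch a missing or wrong value before the tagger is run. This is a sketch under assumptions: the helper name `tree_tagger_path` is hypothetical, and this is not how the gem itself resolves the variable.

```ruby
# Hypothetical helper (not part of opener-tree-tagger): returns the directory
# named by TREE_TAGGER_PATH in `env`, or raises ArgumentError when the
# variable is unset, empty, or does not name an existing directory.
def tree_tagger_path(env = ENV)
  path = env['TREE_TAGGER_PATH']

  raise ArgumentError, 'TREE_TAGGER_PATH is not set' if path.nil? || path.empty?
  raise ArgumentError, "#{path} is not a directory" unless File.directory?(path)

  path
end
```

Passing the environment as an argument keeps the check easy to exercise with a plain hash instead of the real `ENV`.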
data/bin/tree-tagger-daemon
CHANGED
@@ -2,6 +2,9 @@
|
|
2
2
|
|
3
3
|
require 'opener/daemons'
|
4
4
|
|
5
|
-
|
5
|
+
controller = Opener::Daemons::Controller.new(
|
6
|
+
:name => 'opener-tree-tagger',
|
7
|
+
:exec_path => File.expand_path('../../exec/tree-tagger.rb', __FILE__)
|
8
|
+
)
|
6
9
|
|
7
|
-
|
10
|
+
controller.run
|
data/bin/tree-tagger-server
CHANGED
@@ -1,8 +1,10 @@
|
|
1
1
|
#!/usr/bin/env ruby
|
2
2
|
|
3
|
-
require '
|
3
|
+
require 'opener/webservice'
|
4
4
|
|
5
|
-
|
5
|
+
parser = Opener::Webservice::OptionParser.new(
|
6
|
+
'opener-tree-tagger',
|
7
|
+
File.expand_path('../../config.ru', __FILE__)
|
8
|
+
)
|
6
9
|
|
7
|
-
|
8
|
-
cli.run
|
10
|
+
parser.run
|
data/exec/tree-tagger.rb
CHANGED
@@ -1,9 +1,9 @@
|
|
1
1
|
#!/usr/bin/env ruby
|
2
2
|
|
3
3
|
require 'opener/daemons'
|
4
|
+
|
4
5
|
require_relative '../lib/opener/tree_tagger'
|
5
6
|
|
6
|
-
|
7
|
-
daemon = Opener::Daemons::Daemon.new(Opener::TreeTagger, options)
|
7
|
+
daemon = Opener::Daemons::Daemon.new(Opener::TreeTagger)
|
8
8
|
|
9
9
|
daemon.start
|
data/lib/opener/tree_tagger.rb
CHANGED
@@ -1,7 +1,6 @@
|
|
1
1
|
require 'open3'
|
2
2
|
require 'optparse'
|
3
3
|
require 'nokogiri'
|
4
|
-
require 'opener/core'
|
5
4
|
|
6
5
|
require_relative 'tree_tagger/version'
|
7
6
|
require_relative 'tree_tagger/cli'
|
@@ -36,9 +35,6 @@ module Opener
|
|
36
35
|
raise stderr unless process.success?
|
37
36
|
|
38
37
|
return stdout
|
39
|
-
|
40
|
-
rescue Exception => error
|
41
|
-
return Opener::Core::ErrorLayer.new(input, error.message, self.class).add
|
42
38
|
end
|
43
39
|
|
44
40
|
def capture(input)
|
data/lib/opener/tree_tagger/server.rb
CHANGED
@@ -1,5 +1,3 @@
|
|
1
|
-
require 'sinatra/base'
|
2
|
-
require 'httpclient'
|
3
1
|
require 'opener/webservice'
|
4
2
|
|
5
3
|
module Opener
|
@@ -7,10 +5,11 @@ module Opener
|
|
7
5
|
##
|
8
6
|
# Polarity tagger server powered by Sinatra.
|
9
7
|
#
|
10
|
-
class Server < Webservice
|
8
|
+
class Server < Webservice::Server
|
11
9
|
set :views, File.expand_path('../views', __FILE__)
|
12
|
-
|
13
|
-
|
10
|
+
|
11
|
+
self.text_processor = TreeTagger
|
12
|
+
self.accepted_params = [:input]
|
14
13
|
end # Server
|
15
14
|
end # PolarityTagger
|
16
15
|
end # Opener
|
data/opener-tree-tagger.gemspec
CHANGED
@@ -5,11 +5,13 @@ Gem::Specification.new do |gem|
|
|
5
5
|
gem.version = Opener::TreeTagger::VERSION
|
6
6
|
gem.authors = ["rubenIzquierdo", "sparkboxx"]
|
7
7
|
gem.email = ["ruben.izquierdobevia@vu.nl", "wilco@olery.com"]
|
8
|
-
gem.description = %q{Ruby wrapped KAF based Tree Tagger for 6 languages
|
8
|
+
gem.description = %q{Ruby wrapped KAF based Tree Tagger for 6 languages}
|
9
9
|
gem.summary = gem.description
|
10
10
|
gem.homepage = "http://opener-project.github.com/"
|
11
11
|
gem.extensions = ['ext/hack/Rakefile']
|
12
12
|
|
13
|
+
gem.license = 'Apache 2.0'
|
14
|
+
|
13
15
|
gem.files = Dir.glob([
|
14
16
|
'core/*',
|
15
17
|
'exec/*',
|
@@ -19,18 +21,17 @@ Gem::Specification.new do |gem|
|
|
19
21
|
'*.gemspec',
|
20
22
|
'*_requirements.txt',
|
21
23
|
'README.md',
|
24
|
+
'LICENSE.txt',
|
22
25
|
'task/*'
|
23
26
|
]).select { |file| File.file?(file) }
|
24
27
|
|
25
28
|
gem.executables = Dir.glob('bin/*').map { |file| File.basename(file) }
|
26
29
|
|
27
|
-
gem.add_dependency 'opener-daemons'
|
30
|
+
gem.add_dependency 'opener-daemons', '~> 2.1'
|
31
|
+
gem.add_dependency 'opener-webservice', '~> 2.1'
|
32
|
+
gem.add_dependency 'opener-core', '~> 2.0'
|
33
|
+
|
28
34
|
gem.add_dependency 'rake'
|
29
|
-
gem.add_dependency 'sinatra'
|
30
|
-
gem.add_dependency 'httpclient'
|
31
|
-
gem.add_dependency 'puma'
|
32
|
-
gem.add_dependency 'opener-webservice'
|
33
|
-
gem.add_dependency 'opener-core', ['>= 1.0.2', '~> 1.0']
|
34
35
|
gem.add_dependency 'nokogiri'
|
35
36
|
gem.add_dependency 'cliver'
|
36
37
|
|
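The gemspec now pins `opener-daemons`, `opener-webservice` and `opener-core` with the pessimistic operator `~>`. As an illustration of what `'~> 2.1'` accepts, here is a short sketch using RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The helper name `allowed?` is hypothetical and the version numbers are examples only, not actual releases of those gems.

```ruby
require 'rubygems'

# Hypothetical helper: true when the version string satisfies the constraint.
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# '~> 2.1' is equivalent to '>= 2.1' combined with '< 3.0':
# later 2.x releases are accepted, anything older than 2.1 or in the
# next major series is rejected.
allowed?('~> 2.1', '2.1.0') # accepted
allowed?('~> 2.1', '2.9.3') # accepted
allowed?('~> 2.1', '2.0.9') # rejected, older than 2.1
allowed?('~> 2.1', '3.0.0') # rejected, next major version
```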
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: opener-tree-tagger
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version:
|
4
|
+
version: 4.0.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- rubenIzquierdo
|
@@ -9,80 +9,52 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2014-
|
12
|
+
date: 2014-11-24 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: opener-daemons
|
16
16
|
requirement: !ruby/object:Gem::Requirement
|
17
17
|
requirements:
|
18
|
-
- - "
|
19
|
-
- !ruby/object:Gem::Version
|
20
|
-
version: '0'
|
21
|
-
type: :runtime
|
22
|
-
prerelease: false
|
23
|
-
version_requirements: !ruby/object:Gem::Requirement
|
24
|
-
requirements:
|
25
|
-
- - ">="
|
26
|
-
- !ruby/object:Gem::Version
|
27
|
-
version: '0'
|
28
|
-
- !ruby/object:Gem::Dependency
|
29
|
-
name: rake
|
30
|
-
requirement: !ruby/object:Gem::Requirement
|
31
|
-
requirements:
|
32
|
-
- - ">="
|
33
|
-
- !ruby/object:Gem::Version
|
34
|
-
version: '0'
|
35
|
-
type: :runtime
|
36
|
-
prerelease: false
|
37
|
-
version_requirements: !ruby/object:Gem::Requirement
|
38
|
-
requirements:
|
39
|
-
- - ">="
|
40
|
-
- !ruby/object:Gem::Version
|
41
|
-
version: '0'
|
42
|
-
- !ruby/object:Gem::Dependency
|
43
|
-
name: sinatra
|
44
|
-
requirement: !ruby/object:Gem::Requirement
|
45
|
-
requirements:
|
46
|
-
- - ">="
|
18
|
+
- - "~>"
|
47
19
|
- !ruby/object:Gem::Version
|
48
|
-
version: '
|
20
|
+
version: '2.1'
|
49
21
|
type: :runtime
|
50
22
|
prerelease: false
|
51
23
|
version_requirements: !ruby/object:Gem::Requirement
|
52
24
|
requirements:
|
53
|
-
- - "
|
25
|
+
- - "~>"
|
54
26
|
- !ruby/object:Gem::Version
|
55
|
-
version: '
|
27
|
+
version: '2.1'
|
56
28
|
- !ruby/object:Gem::Dependency
|
57
|
-
name:
|
29
|
+
name: opener-webservice
|
58
30
|
requirement: !ruby/object:Gem::Requirement
|
59
31
|
requirements:
|
60
|
-
- - "
|
32
|
+
- - "~>"
|
61
33
|
- !ruby/object:Gem::Version
|
62
|
-
version: '
|
34
|
+
version: '2.1'
|
63
35
|
type: :runtime
|
64
36
|
prerelease: false
|
65
37
|
version_requirements: !ruby/object:Gem::Requirement
|
66
38
|
requirements:
|
67
|
-
- - "
|
39
|
+
- - "~>"
|
68
40
|
- !ruby/object:Gem::Version
|
69
|
-
version: '
|
41
|
+
version: '2.1'
|
70
42
|
- !ruby/object:Gem::Dependency
|
71
|
-
name:
|
43
|
+
name: opener-core
|
72
44
|
requirement: !ruby/object:Gem::Requirement
|
73
45
|
requirements:
|
74
|
-
- - "
|
46
|
+
- - "~>"
|
75
47
|
- !ruby/object:Gem::Version
|
76
|
-
version: '0'
|
48
|
+
version: '2.0'
|
77
49
|
type: :runtime
|
78
50
|
prerelease: false
|
79
51
|
version_requirements: !ruby/object:Gem::Requirement
|
80
52
|
requirements:
|
81
|
-
- - "
|
53
|
+
- - "~>"
|
82
54
|
- !ruby/object:Gem::Version
|
83
|
-
version: '0'
|
55
|
+
version: '2.0'
|
84
56
|
- !ruby/object:Gem::Dependency
|
85
|
-
name:
|
57
|
+
name: rake
|
86
58
|
requirement: !ruby/object:Gem::Requirement
|
87
59
|
requirements:
|
88
60
|
- - ">="
|
@@ -95,26 +67,6 @@ dependencies:
|
|
95
67
|
- - ">="
|
96
68
|
- !ruby/object:Gem::Version
|
97
69
|
version: '0'
|
98
|
-
- !ruby/object:Gem::Dependency
|
99
|
-
name: opener-core
|
100
|
-
requirement: !ruby/object:Gem::Requirement
|
101
|
-
requirements:
|
102
|
-
- - ">="
|
103
|
-
- !ruby/object:Gem::Version
|
104
|
-
version: 1.0.2
|
105
|
-
- - "~>"
|
106
|
-
- !ruby/object:Gem::Version
|
107
|
-
version: '1.0'
|
108
|
-
type: :runtime
|
109
|
-
prerelease: false
|
110
|
-
version_requirements: !ruby/object:Gem::Requirement
|
111
|
-
requirements:
|
112
|
-
- - ">="
|
113
|
-
- !ruby/object:Gem::Version
|
114
|
-
version: 1.0.2
|
115
|
-
- - "~>"
|
116
|
-
- !ruby/object:Gem::Version
|
117
|
-
version: '1.0'
|
118
70
|
- !ruby/object:Gem::Dependency
|
119
71
|
name: nokogiri
|
120
72
|
requirement: !ruby/object:Gem::Requirement
|
@@ -171,7 +123,7 @@ dependencies:
|
|
171
123
|
- - ">="
|
172
124
|
- !ruby/object:Gem::Version
|
173
125
|
version: '0'
|
174
|
-
description:
|
126
|
+
description: Ruby wrapped KAF based Tree Tagger for 6 languages
|
175
127
|
email:
|
176
128
|
- ruben.izquierdobevia@vu.nl
|
177
129
|
- wilco@olery.com
|
@@ -183,6 +135,7 @@ extensions:
|
|
183
135
|
- ext/hack/Rakefile
|
184
136
|
extra_rdoc_files: []
|
185
137
|
files:
|
138
|
+
- LICENSE.txt
|
186
139
|
- README.md
|
187
140
|
- bin/tree-tagger
|
188
141
|
- bin/tree-tagger-daemon
|
@@ -214,7 +167,8 @@ files:
|
|
214
167
|
- task/requirements.rake
|
215
168
|
- task/test.rake
|
216
169
|
homepage: http://opener-project.github.com/
|
217
|
-
licenses:
|
170
|
+
licenses:
|
171
|
+
- Apache 2.0
|
218
172
|
metadata: {}
|
219
173
|
post_install_message:
|
220
174
|
rdoc_options: []
|