opener-tokenizer 1.0.1 → 1.0.2
- checksums.yaml +4 -4
- data/README.md +11 -15
- data/lib/opener/tokenizer/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 68417ce5d0cd433b5d46037849fe28f2d9672352
+  data.tar.gz: 481f30d8f1a16a929665b6895a7fe6d4689035b9
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f6ade2200023fe2a04cb89f490abdcd87254df45b53124a6965336e3a11309c01b01593f497c783ead7a213d87d07c0adc315e6236944bbdc632c5186f2149b0
+  data.tar.gz: 72f1d3c07c0fece79fb507cc7ad73d6cb0adab54982f1acceae57ddbabeaf6704d3a70975df184c414ef8ce00823486849f84f458a0064e0827e54edc5a24a64
```
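Checksums like these let users verify a downloaded `.gem` archive before installing it. A minimal sketch of that check in plain Ruby stdlib (the file path and expected digest here are placeholders, not values from this release):

```ruby
require "digest"

# Compare a local gem archive against a SHA512 value published in
# checksums.yaml. Returns true when the computed digest matches.
def checksum_matches?(path, expected_sha512)
  Digest::SHA512.file(path).hexdigest == expected_sha512
end
```

The same pattern works for the SHA1 entries by swapping in `Digest::SHA1`.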
data/README.md
CHANGED
```diff
@@ -7,7 +7,7 @@ The tokenizer tokenizes a text into sentences and words.
 
 This software is part of a larger collection of natural language processing
 tools known as "the OpeNER project". You can find more information about the
-project at
+project at [the OpeNER portal](http://opener-project.github.io). There you can
 also find references to terms like KAF (an XML standard to represent linguistic
 annotations in texts), component, cores, scenario's and pipelines.
 
```
```diff
@@ -25,7 +25,7 @@ output KAF by default.
 ### Command line interface
 
 You should now be able to call the tokenizer as a regular shell
-command: by its name. Once installed the gem
+command: by its name. Once installed the gem normally sits in your path so you can call it directly from anywhere.
 
 Tokenizing some text:
 
```
```diff
@@ -112,14 +112,11 @@ Description of dependencies
 ---------------------------
 
 This component runs best if you run it in an environment suited for OpeNER
-components. You can find an installation guide and helper tools in the (
-installer)[https://github.com/opener-project/opener-installer] and (an
-installation guide on the Opener
-Website)[http://opener-project.github.io/getting-started/how-to/local-installation.html]
+components. You can find an installation guide and helper tools in the [OpeNER installer](https://github.com/opener-project/opener-installer) and [an installation guide on the Opener Website](http://opener-project.github.io/getting-started/how-to/local-installation.html)
 
 At least you need the following system setup:
 
-###
+### Dependencies for normal use:
 
 * Perl 5
 * MRI 1.9.3
```
```diff
@@ -132,27 +129,26 @@ At least you need the following system setup:
 Language Extension
 ------------------
 
-
+The tokenizer module is a wrapping around a Perl script, which performs the actual tokenization based on rules (when to break a character sequence). The tokenizer already supports a lot of languages. Have a look to the core script to figure out how to extend to new languages.
 
 The Core
 --------
 
-The component is a fat wrapper around the actual language technology core. You
-can find the core technolies in the following repositories:
+The component is a fat wrapper around the actual language technology core. The core is a rule based tokenizer implemented in Perl. You can find the core technologies in the following repositories:
 
-*
+* [tokenizer-base](http://github.com/opener-project/tokenizer-base)
 
 Where to go from here
 ---------------------
 
-* Check
-*
+* [Check the project website](http://opener-project.github.io)
+* [Checkout the webservice](http://opener.olery.com/tokenizer)
 
 Report problem/Get help
 -----------------------
 
-If you encounter problems, please email support@opener-project.eu or leave an
-issue in the
+If you encounter problems, please email <support@opener-project.eu> or leave an
+issue in the [issue tracker](https://github.com/opener-project/tokenizer/issues).
 
 
 Contributing
```
metadata
CHANGED
```diff
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: opener-tokenizer
 version: !ruby/object:Gem::Version
-  version: 1.0.
+  version: 1.0.2
 platform: ruby
 authors:
 - development@olery.com
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-05-
+date: 2014-05-23 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: opener-tokenizer-base
```