opener-tokenizer-base 1.0.0

Files changed (44)
  1. checksums.yaml +7 -0
  2. data/README.md +148 -0
  3. data/bin/tokenizer-base +5 -0
  4. data/bin/tokenizer-de +5 -0
  5. data/bin/tokenizer-en +5 -0
  6. data/bin/tokenizer-es +5 -0
  7. data/bin/tokenizer-fr +5 -0
  8. data/bin/tokenizer-it +5 -0
  9. data/bin/tokenizer-nl +5 -0
  10. data/core/lib/Data/OptList.pm +256 -0
  11. data/core/lib/Params/Util.pm +866 -0
  12. data/core/lib/Sub/Exporter.pm +1101 -0
  13. data/core/lib/Sub/Exporter/Cookbook.pod +309 -0
  14. data/core/lib/Sub/Exporter/Tutorial.pod +280 -0
  15. data/core/lib/Sub/Exporter/Util.pm +354 -0
  16. data/core/lib/Sub/Install.pm +329 -0
  17. data/core/lib/Time/Stamp.pm +808 -0
  18. data/core/load-prefixes.pl +43 -0
  19. data/core/nonbreaking_prefixes/abbreviation_list.kaf +0 -0
  20. data/core/nonbreaking_prefixes/abbreviation_list.txt +444 -0
  21. data/core/nonbreaking_prefixes/nonbreaking_prefix.ca +533 -0
  22. data/core/nonbreaking_prefixes/nonbreaking_prefix.de +781 -0
  23. data/core/nonbreaking_prefixes/nonbreaking_prefix.el +448 -0
  24. data/core/nonbreaking_prefixes/nonbreaking_prefix.en +564 -0
  25. data/core/nonbreaking_prefixes/nonbreaking_prefix.es +758 -0
  26. data/core/nonbreaking_prefixes/nonbreaking_prefix.fr +1027 -0
  27. data/core/nonbreaking_prefixes/nonbreaking_prefix.is +697 -0
  28. data/core/nonbreaking_prefixes/nonbreaking_prefix.it +641 -0
  29. data/core/nonbreaking_prefixes/nonbreaking_prefix.nl +739 -0
  30. data/core/nonbreaking_prefixes/nonbreaking_prefix.pl +729 -0
  31. data/core/nonbreaking_prefixes/nonbreaking_prefix.pt +656 -0
  32. data/core/nonbreaking_prefixes/nonbreaking_prefix.ro +484 -0
  33. data/core/nonbreaking_prefixes/nonbreaking_prefix.ru +705 -0
  34. data/core/nonbreaking_prefixes/nonbreaking_prefix.sk +920 -0
  35. data/core/nonbreaking_prefixes/nonbreaking_prefix.sl +524 -0
  36. data/core/nonbreaking_prefixes/nonbreaking_prefix.sv +492 -0
  37. data/core/split-sentences.pl +114 -0
  38. data/core/text-fixer.pl +169 -0
  39. data/core/tokenizer-cli.pl +363 -0
  40. data/core/tokenizer.pl +145 -0
  41. data/lib/opener/tokenizers/base.rb +84 -0
  42. data/lib/opener/tokenizers/base/version.rb +8 -0
  43. data/opener-tokenizer-base.gemspec +25 -0
  44. metadata +134 -0
@@ -0,0 +1,309 @@
1
+
2
+ # ABSTRACT: useful, demonstrative, or stupid Sub::Exporter tricks
3
+ # PODNAME: Sub::Exporter::Cookbook
4
+
5
+
6
+
7
+ __END__
8
+ =pod
9
+
10
+ =head1 NAME
11
+
12
+ Sub::Exporter::Cookbook - useful, demonstrative, or stupid Sub::Exporter tricks
13
+
14
+ =head1 VERSION
15
+
16
+ version 0.984
17
+
18
+ =head1 OVERVIEW
19
+
20
+ Sub::Exporter is a fairly simple tool, and can be used to achieve some very
21
+ simple goals. Its basic behaviors and their basic application (that is,
22
+ "traditional" exporting of routines) are described in
23
+ L<Sub::Exporter::Tutorial> and L<Sub::Exporter>. This document presents
24
+ applications that may not be immediately obvious, or that can demonstrate how
25
+ certain features can be put to use (for good or evil).
26
+
27
+ =head1 THE RECIPES
28
+
29
+ =head2 Exporting Methods as Routines
30
+
31
+ With Exporter.pm, exporting methods is a non-starter. Sub::Exporter makes it
32
+ simple. By using the C<curry_method> utility provided in
33
+ L<Sub::Exporter::Util>, a method can be exported with the invocant built in.
34
+
35
+ package Object::Strenuous;
36
+
37
+ use Sub::Exporter::Util 'curry_method';
38
+ use Sub::Exporter -setup => {
39
+ exports => [ objection => curry_method('new') ],
40
+ };
41
+
42
+ With this configuration, the importing code may contain:
43
+
44
+ my $obj = objection("irrelevant");
45
+
46
+ ...and this will be equivalent to:
47
+
48
+ my $obj = Object::Strenuous->new("irrelevant");
49
+
50
+ The built-in invocant is determined by the invocant for the C<import> method.
51
+ That means that if we were to subclass Object::Strenuous as follows:
52
+
53
+ package Object::Strenuous::Repeated;
54
+ @ISA = 'Object::Strenuous';
55
+
56
+ ...then importing C<objection> from the subclass would build in that subclass.
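The subclass behaviour can be checked with a small single-file sketch. It assumes Sub::Exporter is installed; the C<new> constructor and the two consumer packages (C<Consumer::Base>, C<Consumer::Sub>) are hypothetical, and the imports are done at run time with an explicit C<into> so that everything fits in one file.

```perl
use strict;
use warnings;

{
    package Object::Strenuous;
    use Sub::Exporter::Util 'curry_method';
    use Sub::Exporter -setup => {
        exports => [ objection => curry_method('new') ],
    };

    # Assumed constructor; the recipe does not show one.
    sub new {
        my ($class, $reason) = @_;
        return bless { reason => $reason }, $class;
    }
}

{
    package Object::Strenuous::Repeated;
    our @ISA = ('Object::Strenuous');
}

# Same export, two different invocants for import.
Object::Strenuous->import({ into => 'Consumer::Base' }, 'objection');
Object::Strenuous::Repeated->import({ into => 'Consumer::Sub' }, 'objection');

# Each curried routine constructs an object of the class it was imported from.
print ref(Consumer::Base::objection('irrelevant')), "\n";
print ref(Consumer::Sub::objection('irrelevant')), "\n";
```

The first line printed names the base class and the second names the subclass, because C<curry_method> closes over whatever invocant C<import> was called on.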
57
+
58
+ Finally, since the invocant can be an object, you can write something like
59
+ this:
60
+
61
+ package Cypher;
62
+ use Sub::Exporter::Util 'curry_method';
63
+ use Sub::Exporter -setup => {
64
+ exports => [ encypher => curry_method ],
65
+ };
66
+
67
+ with the expectation that C<import> will be called on an instantiated Cypher
68
+ object:
69
+
70
+ BEGIN {
71
+ my $cypher = Cypher->new( ... );
72
+ $cypher->import('encypher');
73
+ }
74
+
75
+ Now there is a globally-available C<encypher> routine which calls the encypher
76
+ method on an otherwise unavailable Cypher object.
77
+
78
+ =head2 Exporting Methods as Methods
79
+
80
+ While exporting modules usually export subroutines to be called as subroutines,
81
+ it's easy to use Sub::Exporter to export subroutines meant to be called as
82
+ methods on the importing package or its objects.
83
+
84
+ Here's a trivial (and naive) example:
85
+
86
+ package Mixin::DumpObj;
87
+
88
+ use Data::Dumper;
89
+
90
+ use Sub::Exporter -setup => {
91
+ exports => [ qw(dump) ]
92
+ };
93
+
94
+ sub dump {
95
+ my ($self) = @_;
96
+ return Dumper($self);
97
+ }
98
+
99
+ When writing your own object class, you can then import C<dump> to be used as a
100
+ method, called like so:
101
+
102
+ $object->dump;
103
+
104
+ By assuming that the importing class will provide a certain interface, a
105
+ method-exporting module can be used as a simple plugin:
106
+
107
+ package Number::Plugin::Upto;
108
+ use Sub::Exporter -setup => {
109
+ into => 'Number',
110
+ exports => [ qw(upto) ],
111
+ groups => [ default => [ qw(upto) ] ],
112
+ };
113
+
114
+ sub upto {
115
+ my ($self) = @_;
116
+ return 1 .. abs($self->as_integer);
117
+ }
118
+
119
+ The C<into> line in the configuration says that this plugin will export, by
120
+ default, into the Number package, not into the C<use>-ing package. It can be
121
+ exported anyway, though, and will work as long as the destination provides an
122
+ C<as_integer> method like the one it expects. To import it to a different
123
+ destination, one can just write:
124
+
125
+ use Number::Plugin::Upto { into => 'Quantity' };
126
+
127
+ =head2 Mixing-in Complex External Behavior
128
+
129
+ When exporting methods to be used as methods (see above), one very powerful
130
+ option is to export methods that are generated routines that maintain an
131
+ enclosed reference to the exporting module. This allows a user to import a
132
+ single method which is implemented in terms of a complete, well-structured
133
+ package.
134
+
135
+ Here is a very small example:
136
+
137
+ package Data::Analyzer;
138
+
139
+ use Sub::Exporter -setup => {
140
+ exports => [ analyze => \'_generate_analyzer' ],
141
+ };
142
+
143
+ sub _generate_analyzer {
144
+ my ($mixin, $name, $arg, $col) = @_;
145
+
146
+ return sub {
147
+ my ($self) = @_;
148
+
149
+ my $values = [ $self->values ];
150
+
151
+ my $analyzer = $mixin->new($values);
152
+ $analyzer->perform_analysis;
153
+ $analyzer->aggregate_results;
154
+
155
+ return $analyzer->summary;
156
+ };
157
+ }
158
+
159
+ If imported by any package providing a C<values> method, this plugin will
160
+ provide a single C<analyze> method that acts as a simple interface to a more
161
+ complex set of behaviors.
162
+
163
+ Even more importantly, because the C<$mixin> value will be the invocant on
164
+ which the C<import> was actually called, one can subclass C<Data::Analyzer> and
165
+ replace only individual pieces of the complex behavior, making it easy to write
166
+ complex, subclassable toolkits with simple single points of entry for external
167
+ interfaces.
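As a rough illustration of that subclassing point, the sketch below fills in the analyzer with made-up internals. Everything beyond C<_generate_analyzer> is an assumption for the example: the bodies of C<new>, C<perform_analysis>, C<aggregate_results> and C<summary>, the C<Data::Analyzer::Mean> subclass, and the importing C<Reading> class. It assumes Sub::Exporter is installed.

```perl
use strict;
use warnings;

{
    package Data::Analyzer;
    use Sub::Exporter -setup => {
        exports => [ analyze => \'_generate_analyzer' ],
    };

    sub _generate_analyzer {
        my ($mixin, $name, $arg, $col) = @_;
        return sub {
            my ($self) = @_;
            my $analyzer = $mixin->new([ $self->values ]);
            $analyzer->perform_analysis;
            $analyzer->aggregate_results;
            return $analyzer->summary;
        };
    }

    # Assumed internals: sum the values and report the total.
    sub new { my ($class, $values) = @_; return bless { values => $values }, $class }
    sub perform_analysis {
        my ($self) = @_;
        $self->{sum} = 0;
        $self->{sum} += $_ for @{ $self->{values} };
        return;
    }
    sub aggregate_results { my ($self) = @_; $self->{result} = $self->{sum}; return }
    sub summary           { my ($self) = @_; return "result: $self->{result}" }
}

{
    package Data::Analyzer::Mean;
    our @ISA = ('Data::Analyzer');

    # Replace a single step: report the mean instead of the total.
    sub aggregate_results {
        my ($self) = @_;
        my $n = @{ $self->{values} };
        $self->{result} = $n ? $self->{sum} / $n : 0;
        return;
    }
}

{
    package Reading;    # hypothetical importing class with a values method
    sub new    { my ($class, @v) = @_; return bless { v => [@v] }, $class }
    sub values { return @{ $_[0]{v} } }
}

# $mixin is the invocant of import, so each import binds a different class.
Data::Analyzer->import({ into => 'Reading' }, analyze => { -as => 'total' });
Data::Analyzer::Mean->import({ into => 'Reading' }, analyze => { -as => 'mean' });

my $reading = Reading->new(2, 4, 6);
print $reading->total, "\n";    # prints "result: 12"
print $reading->mean,  "\n";    # prints "result: 4"
```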
168
+
169
+ =head2 Exporting Constants
170
+
171
+ While Sub::Exporter isn't in the constant-exporting business, it's easy to
172
+ export constants by using one of its sister modules, Package::Generator.
173
+
174
+ package Important::Constants;
175
+
176
+ use Package::Generator;
+ use Sub::Exporter -setup => {
177
+ collectors => [ constants => \'_set_constants' ],
178
+ };
179
+
180
+ sub _set_constants {
181
+ my ($class, $value, $data) = @_;
182
+
183
+ Package::Generator->assign_symbols(
184
+ $data->{into},
185
+ [
186
+ MEANING_OF_LIFE => \42,
187
+ ONE_TRUE_BASE => \13,
188
+ FACTORS => [ 6, 9 ],
189
+ ],
190
+ );
191
+
192
+ return 1;
193
+ }
194
+
195
+ Then, someone can write:
196
+
197
+ use Important::Constants 'constants';
198
+
199
+ print "The factors @FACTORS produce $MEANING_OF_LIFE in $ONE_TRUE_BASE.";
200
+
201
+ (The constants must be exported via a collector, because they are effectively
202
+ altering the importing class in a way other than installing subroutines.)
203
+
204
+ =head2 Altering the Importer's @ISA
205
+
206
+ It's trivial to make a collector that changes the inheritance of an importing
207
+ package:
208
+
209
+ use Sub::Exporter -setup => {
210
+ collectors => { -base => \'_make_base' },
211
+ };
212
+
213
+ sub _make_base {
214
+ my ($class, $value, $data) = @_;
215
+
216
+ my $target = $data->{into};
217
+ push @{"$target\::ISA"}, $class;
218
+ }
219
+
220
+ Then, the user of your class can write:
221
+
222
+ use Some::Class -base;
223
+
224
+ and become a subclass. This can be quite useful in building, for example, a
225
+ module that helps build plugins. We may want a few utilities imported, but we
226
+ also want to inherit behavior from some base plugin class:
227
+
228
+ package Framework::Util;
229
+
230
+ use Sub::Exporter -setup => {
231
+ exports => [ qw(log global_config) ],
232
+ groups => [ _plugin => [ qw(log global_config) ] ],
233
+ collectors => { '-plugin' => \'_become_plugin' },
234
+ };
235
+
236
+ sub _become_plugin {
237
+ my ($class, $value, $data) = @_;
238
+
239
+ my $target = $data->{into};
240
+ push @{"$target\::ISA"}, $class->plugin_base_class;
241
+
242
+ push @{ $data->{import_args} }, '-_plugin';
243
+ }
244
+
245
+ Now, you can write a plugin like this:
246
+
247
+ package Framework::Plugin::AirFreshener;
248
+ use Framework::Util -plugin;
249
+
250
+ =head2 Eating Exporter.pm's Brain
251
+
252
+ You probably shouldn't actually do this in production. It's offered more as a
253
+ demonstration than a suggestion.
254
+
255
+ sub exporter_upgrade {
256
+ my ($pkg) = @_;
257
+ my $new_pkg = "$pkg\::UsingSubExporter";
258
+
259
+ return $new_pkg if $new_pkg->isa($pkg);
260
+
261
+ Sub::Exporter::setup_exporter({
262
+ as => 'import',
263
+ into => $new_pkg,
264
+ exports => [ @{"$pkg\::EXPORT_OK"} ],
265
+ groups => {
266
+ %{"$pkg\::EXPORT_TAGS"},
267
+ default => [ @{"$pkg\::EXPORT"} ],
268
+ },
269
+ });
270
+
271
+ @{"$new_pkg\::ISA"} = $pkg;
272
+ return $new_pkg;
273
+ }
274
+
275
+ This routine, given the name of an existing package configured to use
276
+ Exporter.pm, returns the name of a new package with a Sub::Exporter-powered
277
+ C<import> routine. This lets you write:
278
+
279
+ BEGIN {
280
+ require Toolkit;
281
+ exporter_upgrade('Toolkit')->import(exported_sub => { -as => 'foo' })
282
+ }
283
+
284
+ If you're feeling particularly naughty, this routine could have been declared
285
+ in the UNIVERSAL package, meaning you could write:
286
+
287
+ BEGIN {
288
+ require Toolkit;
289
+ Toolkit->exporter_upgrade->import(exported_sub => { -as => 'foo' })
290
+ }
291
+
292
+ The new package will have all the same exporter configuration as the original,
293
+ but will support export and group renaming, including exporting into scalar
294
+ references. Further, since Sub::Exporter uses C<can> to find the routine being
295
+ exported, the new package may be subclassed and some of its exports replaced.
296
+
297
+ =head1 AUTHOR
298
+
299
+ Ricardo Signes <rjbs@cpan.org>
300
+
301
+ =head1 COPYRIGHT AND LICENSE
302
+
303
+ This software is copyright (c) 2007 by Ricardo Signes.
304
+
305
+ This is free software; you can redistribute it and/or modify it under
306
+ the same terms as the Perl 5 programming language system itself.
307
+
308
+ =cut
309
+
@@ -0,0 +1,280 @@
1
+
2
+ # PODNAME: Sub::Exporter::Tutorial
3
+ # ABSTRACT: a friendly guide to exporting with Sub::Exporter
4
+
5
+
6
+ __END__
7
+ =pod
8
+
9
+ =head1 NAME
10
+
11
+ Sub::Exporter::Tutorial - a friendly guide to exporting with Sub::Exporter
12
+
13
+ =head1 VERSION
14
+
15
+ version 0.984
16
+
17
+ =head1 DESCRIPTION
18
+
19
+ =head2 What's an Exporter?
20
+
21
+ When you C<use> a module, first it is required, then its C<import> method is
22
+ called. The Perl documentation tells us that the following two lines are
23
+ equivalent:
24
+
25
+ use Module LIST;
26
+
27
+ BEGIN { require Module; Module->import(LIST); }
28
+
29
+ The import method is the module's I<exporter>.
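To make the idea concrete, here is a minimal hand-written exporter. It is only a sketch of what an C<import> method does, not what Sub::Exporter generates. The C<Greeter> package and its C<hello> routine are hypothetical, and because the package lives in the same file the C<require> step is left out.

```perl
use strict;
use warnings;

{
    package Greeter;    # hypothetical module, defined inline for the sketch

    sub hello { return "hello, $_[0]" }

    # A hand-rolled exporter: copy each requested routine into the caller.
    sub import {
        my ($class, @names) = @_;
        my $target = caller;
        for my $name (@names) {
            my $code = $class->can($name)
                or die "$class does not provide $name\n";
            no strict 'refs';
            *{"${target}::$name"} = $code;
        }
        return;
    }
}

# What "use Greeter qw(hello);" would do, minus the require.
BEGIN { Greeter->import('hello') }

print hello('world'), "\n";    # prints "hello, world"
```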
30
+
31
+ =head2 The Basics of Sub::Exporter
32
+
33
+ Sub::Exporter builds a custom exporter which can then be installed into your
34
+ module. It builds this method based on configuration passed to its
35
+ C<setup_exporter> method.
36
+
37
+ A very basic use case might look like this:
38
+
39
+ package Addition;
40
+ use Sub::Exporter;
41
+ Sub::Exporter::setup_exporter({ exports => [ qw(plus) ]});
42
+
43
+ sub plus { my ($x, $y) = @_; return $x + $y; }
44
+
45
+ This would mean that when someone used your Addition module, they could have
46
+ its C<plus> routine imported into their package:
47
+
48
+ use Addition qw(plus);
49
+
50
+ my $z = plus(2, 2); # this works, because now plus is in the main package
51
+
52
+ That syntax to set up the exporter, above, is a little verbose, so for the
53
+ simple case of just naming some exports, you can write this:
54
+
55
+ use Sub::Exporter -setup => { exports => [ qw(plus) ] };
56
+
57
+ ...which is the same as the original example -- except that now the exporter is
58
+ built and installed at compile time. Well, that and you typed less.
59
+
60
+ =head2 Using Export Groups
61
+
62
+ You can specify whole groups of things that should be exportable together.
63
+ These are called groups. L<Exporter> calls these tags. To specify groups, you
64
+ just pass a C<groups> key in your exporter configuration:
65
+
66
+ package Food;
67
+ use Sub::Exporter -setup => {
68
+ exports => [ qw(apple banana beef fluff lox rabbit) ],
69
+ groups => {
70
+ fauna => [ qw(beef lox rabbit) ],
71
+ flora => [ qw(apple banana) ],
72
+ }
73
+ };
74
+
75
+ Now, to import all that delicious foreign meat, your consumer needs only to
76
+ write:
77
+
78
+ use Food qw(:fauna);
79
+ use Food qw(-fauna);
80
+
81
+ Either one of the above is acceptable. A colon is more traditional, but
82
+ barewords with a leading colon can't be enquoted by a fat arrow. We'll see why
83
+ that matters later on.
84
+
85
+ Groups can contain other groups. If you include a group name (with the leading
86
+ dash or colon) in a group definition, it will be expanded recursively when the
87
+ exporter is called. The exporter will B<not> recurse into the same group twice
88
+ while expanding groups.
89
+
90
+ There are two special groups: C<all> and C<default>. The C<all> group is
91
+ defined by default, and contains all exportable subs. You can redefine it,
92
+ if you want to export only a subset when all exports are requested. The
93
+ C<default> group is the set of routines to export when nothing specific is
94
+ requested. By default, there is no C<default> group.
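A short sketch may help show how nested groups and the two special groups behave. It assumes Sub::Exporter is installed; the C<edible> group, the trivial routine bodies, and the consumer packages C<Picky>, C<Lazy> and C<Greedy> are invented for the example.

```perl
use strict;
use warnings;

{
    package Food;
    use Sub::Exporter -setup => {
        exports => [ qw(apple banana beef fluff lox rabbit) ],
        groups  => {
            fauna   => [ qw(beef lox rabbit) ],
            flora   => [ qw(apple banana) ],
            edible  => [ qw(-fauna -flora) ],    # a group made of groups
            default => [ qw(-flora) ],           # used when nothing is requested
        },
    };

    # Trivial bodies so that every export exists.
    sub apple  { 'apple' }
    sub banana { 'banana' }
    sub beef   { 'beef' }
    sub fluff  { 'fluff' }
    sub lox    { 'lox' }
    sub rabbit { 'rabbit' }
}

{ package Picky;  BEGIN { Food->import('-edible') } }
{ package Lazy;   BEGIN { Food->import } }
{ package Greedy; BEGIN { Food->import('-all') } }

# Report which of the six routines each consumer ended up with.
for my $pkg (qw(Picky Lazy Greedy)) {
    my @got = grep { $pkg->can($_) } qw(apple banana beef fluff lox rabbit);
    print "$pkg: @got\n";
}
```

C<Picky> receives everything except C<fluff>, C<Lazy> receives only the C<flora> routines through C<default>, and C<Greedy> receives all six through the implicit C<all> group.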
95
+
96
+ =head2 Renaming Your Imports
97
+
98
+ Sometimes you want to import something, but you don't like the name as which
99
+ it's imported. Sub::Exporter can rename your imports for you. If you wanted
100
+ to import C<lox> from the Food package, but you don't like the name, you could
101
+ write this:
102
+
103
+ use Food lox => { -as => 'salmon' };
104
+
105
+ Now you'd get the C<lox> routine, but it would be called salmon in your
106
+ package. You can also rename entire groups by using the C<-prefix> option:
107
+
108
+ use Food -fauna => { -prefix => 'cute_little_' };
109
+
110
+ Now you can call your C<cute_little_rabbit> routine. (You can also call
111
+ C<cute_little_beef>, but that hardly seems as enticing.)
112
+
113
+ When you define groups, you can include renaming.
114
+
115
+ use Sub::Exporter -setup => {
116
+ exports => [ qw(apple banana beef fluff lox rabbit) ],
117
+ groups => {
118
+ fauna => [ qw(beef lox), rabbit => { -as => 'coney' } ],
119
+ }
120
+ };
121
+
122
+ A prefix on a group like that does the right thing. This is when it's useful
123
+ to use a dash instead of a colon to indicate a group: you can put a fat arrow
124
+ between the group and its arguments, then.
125
+
126
+ use Food -fauna => { -prefix => 'lovely_' };
127
+
128
+ eat( lovely_coney ); # this works
129
+
130
+ Prefixes also apply recursively. That means that this code works:
131
+
132
+ use Sub::Exporter -setup => {
133
+ exports => [ qw(apple banana beef fluff lox rabbit) ],
134
+ groups => {
135
+ fauna => [ qw(beef lox), rabbit => { -as => 'coney' } ],
136
+ allowed => [ -fauna => { -prefix => 'willing_' }, 'banana' ],
137
+ }
138
+ };
139
+
140
+ ...
141
+
142
+ use Food -allowed => { -prefix => 'any_' };
143
+
144
+ $dinner = any_willing_coney; # yum!
145
+
146
+ Groups can also be passed a C<-suffix> argument.
147
+
148
+ Finally, if the C<-as> argument to an exported routine is a reference to a
149
+ scalar, a reference to the routine will be placed in that scalar.
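Neither of the last two features has an example above, so here is a small sketch of both, assuming Sub::Exporter is installed. The routine bodies and the C<_dinner> suffix are made up; C<BEGIN> stands in for C<use> because the C<Food> package is defined in the same file.

```perl
use strict;
use warnings;

{
    package Food;
    use Sub::Exporter -setup => {
        exports => [ qw(beef lox rabbit) ],
        groups  => { fauna => [ qw(beef lox rabbit) ] },
    };
    sub beef   { 'beef' }
    sub lox    { 'lox' }
    sub rabbit { 'rabbit' }
}

my $salmon;    # will hold a code reference instead of a named routine
BEGIN {
    Food->import(
        -fauna => { -suffix => '_dinner' },    # beef_dinner, lox_dinner, ...
        lox    => { -as => \$salmon },         # no name installed at all
    );
}

print rabbit_dinner(), "\n";    # prints "rabbit"
print $salmon->(), "\n";        # prints "lox"
```

Importing into a scalar is handy when the routine should stay private to one lexical scope instead of becoming a named sub in the package.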
150
+
151
+ =head2 Building Subroutines to Order
152
+
153
+ Sometimes, you want to export things that you don't have on hand. You might
154
+ want to offer customized routines built to the specification of your consumer;
155
+ that's just good business! With Sub::Exporter, this is easy.
156
+
157
+ To offer subroutines to order, you need to provide a generator when you set up
158
+ your exporter. A generator is just a routine that returns a new routine.
159
+ L<perlref> is talking about these when it discusses closures and function
160
+ templates. The canonical example of a generator builds a unique incrementor;
161
+ here's how you'd do that with Sub::Exporter:
162
+
163
+ package Package::Counter;
164
+ use Sub::Exporter -setup => {
165
+ exports => [ counter => sub { my $i = 0; sub { $i++ } } ],
166
+ groups => { default => [ qw(counter) ] },
167
+ };
168
+
169
+ Now anyone can use your Package::Counter module and he'll receive a C<counter>
170
+ in his package. It will count up by one, and will never interfere with anyone
171
+ else's counter.
172
+
173
+ This isn't very useful, though, unless the consumer can explain what he wants.
174
+ This is done, in part, by supplying arguments when importing. The following
175
+ example shows how a generator can take and use arguments:
176
+
177
+ package Package::Counter;
178
+
179
+ sub _build_counter {
180
+ my ($class, $name, $arg) = @_;
181
+ $arg ||= {};
182
+ my $i = $arg->{start} || 0;
183
+ return sub { $i++ };
184
+ }
185
+
186
+ use Sub::Exporter -setup => {
187
+ exports => [ counter => \'_build_counter' ],
188
+ groups => { default => [ qw(counter) ] },
189
+ };
190
+
191
+ Now, the consumer can (if he wants) specify a starting value for his counter:
192
+
193
+ use Package::Counter counter => { start => 10 };
194
+
195
+ Arguments to a group are passed along to the generators of routines in that
196
+ group, but Sub::Exporter arguments -- anything beginning with a dash -- are
197
+ never passed in. When groups are nested, the arguments are merged as the
198
+ groups are expanded.
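The following sketch shows group arguments reaching a generator, assuming Sub::Exporter is installed. The C<@seen_args> variable is an addition for the example that records which argument names each generator call received; the C<Plain> and C<Tuned> packages are hypothetical consumers.

```perl
use strict;
use warnings;

{
    package Package::Counter;
    use Sub::Exporter -setup => {
        exports => [ counter => \'_build_counter' ],
        groups  => { default => [ qw(counter) ] },
    };

    our @seen_args;    # filled at import time, so deliberately not initialized

    sub _build_counter {
        my ($class, $name, $arg) = @_;
        $arg ||= {};
        push @seen_args, join ',', sort keys %$arg;
        my $i = $arg->{start} || 0;
        return sub { $i++ };
    }
}

{ package Plain; BEGIN { Package::Counter->import } }
{
    package Tuned;
    BEGIN { Package::Counter->import(-default => { start => 10, -prefix => 'my_' }) }
}

print Plain::counter(), "\n";       # prints 0
print Tuned::my_counter(), "\n";    # prints 10
print "generator saw: [$_]\n" for @Package::Counter::seen_args;
```

The second generator call sees only C<start>: the C<-prefix> argument is consumed by the exporter and shows up in the installed name instead.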
199
+
200
+ Notice, too, that in the example above, we gave a reference to a method I<name>
201
+ rather than a method I<implementation>. By giving the name rather than the
202
+ subroutine, we make it possible for subclasses of our "Package::Counter" module
203
+ to replace the C<_build_counter> method.
204
+
205
+ When a generator is called, it is passed four parameters:
206
+
207
+ =over
208
+
209
+ =item * the invocant on which the exporter was called
210
+
211
+ =item * the name of the export being generated (not the name it's being installed as)
212
+
213
+ =item * the arguments supplied for the routine
214
+
215
+ =item * the collection of generic arguments
216
+
217
+ =back
218
+
219
+ The fourth item is the last major feature that hasn't been covered.
220
+
221
+ =head2 Argument Collectors
222
+
223
+ Sometimes you will want to accept arguments once that can then be available to
224
+ any subroutine that you're going to export. To do this, you specify
225
+ collectors, like this:
226
+
227
+ package Menu::Airline;
228
+ use Sub::Exporter -setup => {
229
+ exports => ... ,
230
+ groups => ... ,
231
+ collectors => [ qw(allergies ethics) ],
232
+ };
233
+
234
+ Collectors look like normal exports in the import call, but they don't do
235
+ anything but collect data which can later be passed to generators. If the
236
+ module was used like this:
237
+
238
+ use Menu::Airline allergies => [ qw(peanuts) ], ethics => [ qw(vegan) ];
239
+
240
+ ...the consumer would get a salad. Also, all the generators would be passed,
241
+ as their fourth argument, something like this:
242
+
243
+ { allergies => [ qw(peanuts) ], ethics => [ qw(vegan) ] }
244
+
245
+ Collectors may have arguments in their definition, as well. These must be code
246
+ refs that perform validation of the collected values. They are passed the
247
+ collection value and may return true or false. If they return false, the
248
+ exporter will throw an exception.
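A minimal sketch of collectors with validators follows, assuming Sub::Exporter is installed. The C<meal> export, its C<_build_meal> generator, and the menu logic are hypothetical; only the C<allergies> and C<ethics> collector names come from the text above.

```perl
use strict;
use warnings;

{
    package Menu::Airline;
    use Sub::Exporter -setup => {
        exports    => [ meal => \'_build_meal' ],
        collectors => {
            # Each validator receives the collected value; false means reject.
            allergies => sub { ref($_[0]) eq 'ARRAY' },
            ethics    => sub { ref($_[0]) eq 'ARRAY' },
        },
    };

    # Hypothetical generator: picks a meal from the collected data.
    sub _build_meal {
        my ($class, $name, $arg, $col) = @_;
        my $vegan = grep { $_ eq 'vegan' }   @{ $col->{ethics}    || [] };
        my $nuts  = grep { $_ eq 'peanuts' } @{ $col->{allergies} || [] };
        return sub { $vegan ? 'salad' : $nuts ? 'chicken' : 'satay' };
    }
}

BEGIN {
    Menu::Airline->import(
        'meal',
        allergies => [ qw(peanuts) ],
        ethics    => [ qw(vegan) ],
    );
}
print meal(), "\n";    # prints "salad"

# A hash reference fails the ARRAY check, so the exporter dies.
my $ok = eval {
    Menu::Airline->import({ into => 'Elsewhere' }, 'meal', allergies => { peanuts => 1 });
    1;
};
print $ok ? "accepted\n" : "rejected\n";    # prints "rejected"
```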
249
+
250
+ =head2 Generating Many Routines in One Scope
251
+
252
+ Sometimes it's useful to have multiple routines generated in one scope. This
253
+ way they can share lexical data which is otherwise unavailable. To do this,
254
+ you can supply a generator for a group which returns a hashref of names and
255
+ code references. This generator is passed all the usual data, and the group
256
+ may receive the usual C<-prefix> or C<-suffix> arguments.
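Here is one way such a group generator might look, assuming Sub::Exporter is installed. The C<Package::Tally> module, its C<tally> group, and the three routine names are invented for the sketch.

```perl
use strict;
use warnings;

{
    package Package::Tally;    # hypothetical module
    use Sub::Exporter -setup => {
        groups => { tally => \'_build_tally' },
    };

    # Group generator: returns a hashref of names and code references.
    sub _build_tally {
        my ($class, $group, $arg, $col) = @_;
        my $count = $arg->{start} || 0;    # shared by all three closures
        my %routines = (
            bump  => sub { return ++$count },
            clear => sub { $count = 0; return },
            total => sub { return $count },
        );
        return \%routines;
    }
}

BEGIN { Package::Tally->import(-tally => { start => 3, -prefix => 'visits_' }) }

visits_bump() for 1 .. 2;
print visits_total(), "\n";    # prints 5
visits_clear();
print visits_total(), "\n";    # prints 0
```

Because all three routines are built in one call, they share C<$count> without exposing it as a package variable.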
257
+
258
+ =head1 SEE ALSO
259
+
260
+ =over 4
261
+
262
+ =item *
263
+
264
+ L<Sub::Exporter> for complete documentation and references to other exporters
265
+
266
+ =back
267
+
268
+ =head1 AUTHOR
269
+
270
+ Ricardo Signes <rjbs@cpan.org>
271
+
272
+ =head1 COPYRIGHT AND LICENSE
273
+
274
+ This software is copyright (c) 2007 by Ricardo Signes.
275
+
276
+ This is free software; you can redistribute it and/or modify it under
277
+ the same terms as the Perl 5 programming language system itself.
278
+
279
+ =cut
280
+