piglet 0.2.3 → 0.2.4
- data/README.rdoc +52 -7
- data/TODO +4 -0
- data/lib/piglet/interpreter.rb +38 -11
- data/lib/piglet/relation/relation.rb +7 -9
- data/lib/piglet/relation/stream.rb +34 -0
- data/lib/piglet/udf/define.rb +68 -0
- data/lib/piglet/udf/register.rb +13 -0
- data/lib/piglet.rb +6 -1
- data/spec/piglet_spec.rb +140 -0
- metadata +6 -2
data/README.rdoc
CHANGED
@@ -213,6 +213,8 @@ Binary conditionals, a.k.a. the ternary operator are supported through +test+ (u
|
|
213
213
|
|
214
214
|
The first argument to +test+ is the test expression, the second is the if-true expression and the third is the if-false expression.
|
215
215
|
|
216
|
+
<code>FOREACH { … } GENERATE</code>, a.k.a. nested foreach or foreach-with-inner-bag is currently not supported. Nor is specifying the schema of the resulting relation, or +FLATTEN+ operations.
|
217
|
+
|
216
218
|
== +split+
|
217
219
|
|
218
220
|
The syntax of +split+ shouldn't be surprising if you've read this far, but there are perhaps some details that aren't obvious. To split a relation into a number of parts you call +split+ on the relation and pass a block in which you specify the expressions describing each shard. Just as with +filter+ and +foreach+, the block receives the relation as an argument. +split+ returns an array containing the relation shards, and you can use parallel assignment to make it look really nice:
|
@@ -238,6 +240,37 @@ I'm not absolutely sure that it is legal to join or cogroup on more than one fie
|
|
238
240
|
|
239
241
|
a.cogroup(b => [:y, :inner], a => [:z, :outer]) # => COGROUP b BY y INNER, a BY z OUTER
|
240
242
|
|
243
|
+
== +stream+, +define+ & +register+
|
244
|
+
|
245
|
+
The +STREAM+ operator is supported through <code>stream</code>. You can either stream through a command, or through a command reference.
|
246
|
+
|
247
|
+
To stream through a command, use this syntax:
|
248
|
+
|
249
|
+
a.stream(:command => 'cut -f 3') # => STREAM a THROUGH `cut -f 3`
|
250
|
+
|
251
|
+
To define a command and then stream a relation through that command, use this syntax:
|
252
|
+
|
253
|
+
define(:reverse, :command => 'reverse.rb') # => DEFINE reverse `reverse.rb`
|
254
|
+
a.stream(:reverse) # => STREAM a THROUGH reverse
|
255
|
+
|
256
|
+
You can also use +define+ to define function references:
|
257
|
+
|
258
|
+
define(:hello, :function => 'com.example.Hello') # => DEFINE hello com.example.Hello
|
259
|
+
|
260
|
+
If you need to register a JAR you can use +register+:
|
261
|
+
|
262
|
+
register('path/to/lib.jar') # => REGISTER path/to/lib.jar
|
263
|
+
|
264
|
+
Streaming multiple relations is supported; just pass the additional relations as an array:
|
265
|
+
|
266
|
+
a.stream([b, c, d], :reverse) # => STREAM a, b, c, d THROUGH reverse
|
267
|
+
|
268
|
+
And finally, this is how you specify the schema of the resulting relation:
|
269
|
+
|
270
|
+
a.stream(:reverse, :schema => [:x, :y]) # => STREAM a THROUGH reverse AS (x, y)
|
271
|
+
|
272
|
+
The schema syntax is the same as for +load+, and you can read more about it under “Types & schemas” below.
|
273
|
+
|
241
274
|
== <code>:parallel</code>
|
242
275
|
|
243
276
|
For some operators in Pig Latin you can specify the <code>PARALLEL</code> keyword to tell Pig how many reducers
|
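As a rough illustration of the +stream+ call forms described in the README section above, the following standalone Ruby sketch renders the same kind of STREAM statements from plain strings. It does not use Piglet: stream_statement and its parameters are hypothetical names made up for this example, and relation aliases are passed as strings rather than relation objects. Schema handling is left out.

```ruby
# Hypothetical helper, not part of Piglet: renders a STREAM statement from
# relation aliases (plain strings here) plus either a command reference
# (a symbol) or a :command option.
def stream_statement(source, *args)
  options = args.last.is_a?(Hash) ? args.pop : {}
  # Arrays and strings are treated as extra relation aliases,
  # a symbol as the reference to a previously defined command.
  extra_sources = args.select { |arg| arg.is_a?(Array) || arg.is_a?(String) }
  command_reference = (args - extra_sources).first
  aliases = ([source] + extra_sources).flatten.join(', ')

  statement = "STREAM #{aliases} THROUGH"
  if command_reference
    statement += " #{command_reference}"
  else
    statement += " `#{options[:command]}`"
  end
  statement
end

puts stream_statement('a', :command => 'cut -f 3')
puts stream_statement('a', :reverse)
puts stream_statement('a', ['b', 'c', 'd'], :reverse)
```

Every relation in the array ends up in the statement, so the last call lists four aliases before THROUGH.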
@@ -351,6 +384,7 @@ The following Pig operators are supported:
|
|
351
384
|
|
352
385
|
* +COGROUP+
|
353
386
|
* +CROSS+
|
387
|
+
* +DEFINE+
|
354
388
|
* +DESCRIBE+
|
355
389
|
* +DISTINCT+
|
356
390
|
* +DUMP+
|
@@ -363,32 +397,39 @@ The following Pig operators are supported:
|
|
363
397
|
* +LIMIT+
|
364
398
|
* +LOAD+
|
365
399
|
* +ORDER+
|
400
|
+
* +REGISTER+
|
366
401
|
* +SAMPLE+
|
367
402
|
* +SPLIT+
|
368
403
|
* +STORE+
|
404
|
+
* +STREAM+
|
369
405
|
* +UNION+
|
370
|
-
|
406
|
+
|
371
407
|
The following are currently not supported (but will be soon):
|
372
408
|
|
373
|
-
*
|
374
|
-
*
|
375
|
-
*
|
376
|
-
* +REGISTER+
|
409
|
+
* <code>FOREACH { … } GENERATE</code>
|
410
|
+
* <code>%DECLARE</code>
|
411
|
+
* <code>%DEFAULT</code>
|
377
412
|
|
378
413
|
The file commands (+cd+, +cat+, etc.) will probably not be supported for the foreseeable future.
|
379
414
|
|
380
|
-
|
415
|
+
UDFs are sort of supported. You can define a UDF with +define+ and you can use it in some situations by using a string, but some situations are trickier (like renaming the result of a UDF call in a <code>FOREACH … GENERATE</code> body).
|
416
|
+
|
417
|
+
All the aggregate functions except two are supported:
|
381
418
|
|
382
419
|
* +AVG+
|
383
420
|
* +CONCAT+
|
384
421
|
* +COUNT+
|
385
|
-
* +DIFF+
|
386
422
|
* +IsEmpty+
|
387
423
|
* +MAX+
|
388
424
|
* +MIN+
|
389
425
|
* +SIZE+
|
390
426
|
* +SUM+
|
391
427
|
* +TOKENIZE+
|
428
|
+
|
429
|
+
These are not supported yet:
|
430
|
+
|
431
|
+
* +DIFF+
|
432
|
+
* +FLATTEN+
|
392
433
|
|
393
434
|
Piglet supports most arithmetic and logic operators (see below) on fields -- but check the output and make sure that it's doing what you expect, because sometimes it's tricky to see where Piglet hijacks the operators and when it's Ruby that is running the show. I'm doing the best I can, but there are many things that can't be done, at least not in Ruby 1.8.
|
394
435
|
|
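The operator caveat above comes down to which operators Ruby lets a library redefine. The sketch below uses a simplified, hypothetical stand-in for a field object (SketchField is not Piglet's implementation). Arithmetic and comparison operators are ordinary methods, so they can build Pig Latin expression strings, whereas &&, ||, and, or and not are evaluated by Ruby itself and cannot be intercepted. Note also that ! and != only became redefinable methods in Ruby 1.9, which is presumably part of what the Ruby 1.8 remark refers to.

```ruby
# Hypothetical stand-in for a field expression; NOT Piglet's actual class.
class SketchField
  def initialize(expression)
    @expression = expression
  end

  # Arithmetic and comparison operators are plain methods in Ruby, so they
  # can return a new expression instead of computing a value.
  [:+, :-, :*, :/, :>, :<].each do |operator|
    define_method(operator) do |other|
      SketchField.new("#{self} #{operator} #{other}")
    end
  end

  def ==(other)
    SketchField.new("#{self} == #{other}")
  end

  def to_s
    @expression
  end
end

a = SketchField.new('a')
b = SketchField.new('b')

puts a + b        # builds the expression "a + b"
puts((a * 2) > b) # builds the expression "a * 2 > b"

# && is evaluated by Ruby itself: both operands are truthy objects, so the
# result is simply the right-hand operand and no AND expression is built.
puts((a == 1) && (b == 2))
```

The last line prints only the right-hand comparison, which is the kind of silent surprise the paragraph above warns about.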
@@ -442,6 +483,10 @@ When you call +group+, +filter+ or any of the other methods that can be applied
|
|
442
483
|
|
443
484
|
As a side effect of using +store+ and the other output operators as the trigger for creating the needed relational operations, any relations that are not ancestors of relations that are output will not be included in the Pig Latin output. On the other hand, they would be no-ops when run by Pig anyway.
|
444
485
|
|
486
|
+
=== The output is not what I expected!
|
487
|
+
|
488
|
+
Please contact me and give me the Piglet code and what you think the output should be. I'll try to either fix your Piglet code, or fix Piglet to do what you expect it to do.
|
489
|
+
|
445
490
|
== Copyright
|
446
491
|
|
447
492
|
© 2009-2010 Theo Hultberg / Iconara. See LICENSE for details.
|
data/TODO
ADDED
data/lib/piglet/interpreter.rb
CHANGED
@@ -4,7 +4,7 @@ require 'set'
|
|
4
4
|
module Piglet
|
5
5
|
class Interpreter
|
6
6
|
def initialize(&block)
|
7
|
-
@
|
7
|
+
@top_level_statements = [ ]
|
8
8
|
|
9
9
|
interpret(&block) if block_given?
|
10
10
|
end
|
@@ -20,18 +20,18 @@ module Piglet
|
|
20
20
|
def to_pig_latin(&block)
|
21
21
|
interpret(&block) if block_given?
|
22
22
|
|
23
|
-
return '' if @
|
23
|
+
return '' if @top_level_statements.empty?
|
24
24
|
|
25
25
|
handled_relations = Set.new
|
26
26
|
statements = [ ]
|
27
27
|
|
28
|
-
@
|
29
|
-
|
30
|
-
assignments(
|
28
|
+
@top_level_statements.each do |top_level_statement|
|
29
|
+
if top_level_statement.respond_to?(:relation) && ! top_level_statement.relation.nil?
|
30
|
+
assignments(top_level_statement.relation, handled_relations).each do |assignment|
|
31
31
|
statements << assignment
|
32
32
|
end
|
33
33
|
end
|
34
|
-
statements <<
|
34
|
+
statements << top_level_statement
|
35
35
|
end
|
36
36
|
|
37
37
|
statements.flatten.map { |s| s.to_s }.join(";\n") + ";\n"
|
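The loop in to_pig_latin above can be tried out in isolation. The sketch below is a simplified stand-in, assuming that an assignments helper walks a relation's ancestors and emits each one once; the real helper is not shown in this diff, and SketchRelation and SketchStore are hypothetical classes, not Piglet's. It also shows why a relation that no output statement depends on never reaches the generated script.

```ruby
require 'set'

# Hypothetical stand-ins; Piglet's real relation and statement classes differ.
SketchRelation = Struct.new(:name, :sources, :body) do
  def to_s
    "#{name} = #{body}"
  end
end

SketchStore = Struct.new(:relation, :path) do
  def to_s
    "STORE #{relation.name} INTO '#{path}'"
  end
end

# A guess at what an assignments helper might do: walk the ancestors
# depth-first and emit each relation once, sources before dependants.
def assignments(relation, handled)
  return [] if handled.include?(relation.name)
  handled << relation.name
  relation.sources.map { |source| assignments(source, handled) } + [relation]
end

def to_pig_latin(top_level_statements)
  return '' if top_level_statements.empty?
  handled = Set.new
  statements = []
  top_level_statements.each do |statement|
    if statement.respond_to?(:relation) && !statement.relation.nil?
      statements.concat(assignments(statement.relation, handled))
    end
    statements << statement
  end
  statements.flatten.map { |s| s.to_s }.join(";\n") + ";\n"
end

a = SketchRelation.new('a', [], "LOAD 'in'")
b = SketchRelation.new('b', [a], 'DISTINCT a')
unused = SketchRelation.new('c', [a], 'LIMIT a 10') # never stored, never emitted

puts to_pig_latin([SketchStore.new(b, 'out'), SketchStore.new(a, 'copy')])
```

Statements without a relation, such as REGISTER or DEFINE, would skip the ancestor walk and be appended as they are.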
@@ -62,28 +62,28 @@ module Piglet
|
|
62
62
|
# store(x, 'some/path', :using => 'Xyz') # => STORE x INTO 'some/path' USING Xyz
|
63
63
|
# store(x, 'some/path', :using => :pig_storage) # => STORE x INTO 'some/path' USING PigStorage
|
64
64
|
def store(relation, path, options={})
|
65
|
-
@
|
65
|
+
@top_level_statements << Inout::Store.new(relation, path, options)
|
66
66
|
end
|
67
67
|
|
68
68
|
# DUMP
|
69
69
|
#
|
70
70
|
# dump(x) # => DUMP x
|
71
71
|
def dump(relation)
|
72
|
-
@
|
72
|
+
@top_level_statements << Inout::Dump.new(relation)
|
73
73
|
end
|
74
74
|
|
75
75
|
# ILLUSTRATE
|
76
76
|
#
|
77
77
|
# illustrate(x) # => ILLUSTRATE x
|
78
78
|
def illustrate(relation)
|
79
|
-
@
|
79
|
+
@top_level_statements << Inout::Illustrate.new(relation)
|
80
80
|
end
|
81
81
|
|
82
82
|
# DESCRIBE
|
83
83
|
#
|
84
84
|
# describe(x) # => DESCRIBE x
|
85
85
|
def describe(relation)
|
86
|
-
@
|
86
|
+
@top_level_statements << Inout::Describe.new(relation)
|
87
87
|
end
|
88
88
|
|
89
89
|
# EXPLAIN
|
@@ -91,7 +91,34 @@ module Piglet
|
|
91
91
|
# explain # => EXPLAIN
|
92
92
|
# explain(x) # => EXPLAIN(x)
|
93
93
|
def explain(relation=nil)
|
94
|
-
@
|
94
|
+
@top_level_statements << Inout::Explain.new(relation)
|
95
|
+
end
|
96
|
+
|
97
|
+
# REGISTER
|
98
|
+
#
|
99
|
+
# register 'path/to/lib.jar' # => REGISTER path/to/lib.jar
|
100
|
+
def register(path)
|
101
|
+
@top_level_statements << Udf::Register.new(path)
|
102
|
+
end
|
103
|
+
|
104
|
+
# DEFINE
|
105
|
+
#
|
106
|
+
# define('test', :function => 'com.example.Test') # => DEFINE test com.example.Test
|
107
|
+
# define('test', :command => 'test.rb') # => DEFINE test `test.rb`
|
108
|
+
# define('test', :command => 'test.rb', :input => :stdin) # => DEFINE test `test.rb` INPUT(stdin)
|
109
|
+
# define('test', :command => 'test.rb', :input => 'path/x') # => DEFINE test `test.rb` INPUT('path/x')
|
110
|
+
# define('test', :command => 'test.rb', :output => :stdout) # => DEFINE test `test.rb` OUTPUT(stdout)
|
111
|
+
# define('test', :command => 'test.rb', :ship => 'a/b/c') # => DEFINE test `test.rb` SHIP('a/b/c')
|
112
|
+
# define('test', :command => 'test.rb', :cache => ['x', 'y']) # => DEFINE test `test.rb` CACHE('x', 'y')
|
113
|
+
#
|
114
|
+
# The <code>:input</code> and <code>:output</code> options can take pretty
|
115
|
+
# complicated definitions in addition to the examples above:
|
116
|
+
#
|
117
|
+
# :input => {:from => :stdin, :using => :pig_storage} # => INPUT(stdin USING PigStorage)
|
118
|
+
# :output => {:to => :stdout, :using => 'MySerializer'} # => OUTPUT(stdout USING MySerializer)
|
119
|
+
# :output => [{:to => :stdout, :using => 'MySerializer'}, 'some/path'] # => OUTPUT(stdout USING MySerializer, 'some/path')
|
120
|
+
def define(ali4s, options=nil)
|
121
|
+
@top_level_statements << Udf::Define.new(ali4s, options)
|
95
122
|
end
|
96
123
|
|
97
124
|
# Support for binary conditions, a.k.a. the ternary operator.
|
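The contents of lib/piglet/udf/register.rb (13 added lines) are not reproduced on this page. Judging only from the interpreter's register method above and the output expected by the REGISTER spec, a minimal class with the same observable behaviour could look like the sketch below. The body is an assumption, not the released source, and it lives in a Sketch module to make that explicit.

```ruby
module Sketch
  module Udf
    # Assumed shape: stores the JAR path and renders a REGISTER statement.
    class Register
      def initialize(path)
        @path = path
      end

      def to_s
        "REGISTER #{@path}"
      end
    end
  end
end

puts Sketch::Udf::Register.new('path/to/lib.jar')
```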
data/lib/piglet/relation/relation.rb
CHANGED
@@ -120,15 +120,13 @@ module Piglet
|
|
120
120
|
|
121
121
|
# STREAM
|
122
122
|
#
|
123
|
-
# x.stream(
|
124
|
-
# x.stream(
|
125
|
-
# x.stream(
|
126
|
-
#
|
127
|
-
|
128
|
-
|
129
|
-
|
130
|
-
def stream(relations, command, options={})
|
131
|
-
raise NotSupportedError
|
123
|
+
# x.stream(:command => 'cut -f 3') # => STREAM x THROUGH `cut -f 3`
|
124
|
+
# x.stream(:cmd) # => STREAM x THROUGH cmd
|
125
|
+
# x.stream(y, :command => 'cut -f 3') # => STREAM x, y THROUGH `cut -f 3`
|
126
|
+
# x.stream(:cmd, :schema => [%w(a int)]) # => STREAM x THROUGH cmd AS (a:int)
|
127
|
+
def stream(*args)
|
128
|
+
fields, options = split_at_options(args)
|
129
|
+
Stream.new(self, fields, options)
|
132
130
|
end
|
133
131
|
|
134
132
|
# UNION
|
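The new stream method relies on split_at_options, which is not part of this hunk. A plausible reading, sketched below under that assumption, is that it separates a trailing options hash from the positional arguments. Returning nil when no hash is given is a choice made for this sketch; it would be compatible with a receiver that defaults missing options to an empty hash.

```ruby
# Assumed behaviour of a split_at_options-style helper (the real one is
# not shown in this diff): a trailing Hash is treated as the options,
# everything before it as positional arguments.
def split_at_options(args)
  if args.last.is_a?(Hash)
    [args[0..-2], args.last]
  else
    [args, nil]
  end
end

fields, options = split_at_options([:cmd, {:schema => [%w(a int)]}])
p fields
p options

fields, options = split_at_options([:cmd])
p fields
p options
```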
data/lib/piglet/relation/stream.rb
CHANGED
@@ -2,6 +2,40 @@ module Piglet
|
|
2
2
|
module Relation
|
3
3
|
class Stream # :nodoc:
|
4
4
|
include Relation
|
5
|
+
|
6
|
+
def initialize(source, args, options=nil)
|
7
|
+
options ||= {}
|
8
|
+
@sources = [source]
|
9
|
+
args.each do |arg|
|
10
|
+
@sources << arg if arg.is_a?(Relation) || arg.is_a?(Array)
|
11
|
+
end
|
12
|
+
@command_reference = (args - @sources).first
|
13
|
+
@sources = @sources.flatten
|
14
|
+
@command = options[:command]
|
15
|
+
@schema = options[:schema]
|
16
|
+
end
|
17
|
+
|
18
|
+
def schema
|
19
|
+
if @schema
|
20
|
+
Piglet::Schema::Tuple.parse(@schema)
|
21
|
+
else
|
22
|
+
nil
|
23
|
+
end
|
24
|
+
end
|
25
|
+
|
26
|
+
def to_s
|
27
|
+
source_str = @sources.map { |s| s.alias }.join(', ')
|
28
|
+
str = "STREAM #{source_str} THROUGH"
|
29
|
+
if @command_reference
|
30
|
+
str << " #{@command_reference}"
|
31
|
+
else
|
32
|
+
str << " `#{@command}`"
|
33
|
+
end
|
34
|
+
if @schema
|
35
|
+
str << " AS #{schema}"
|
36
|
+
end
|
37
|
+
str
|
38
|
+
end
|
5
39
|
end
|
6
40
|
end
|
7
41
|
end
|
data/lib/piglet/udf/define.rb
ADDED
@@ -0,0 +1,68 @@
|
|
1
|
+
module Piglet
|
2
|
+
module Udf
|
3
|
+
class Define
|
4
|
+
include Piglet::Inout::StorageTypes
|
5
|
+
|
6
|
+
def initialize(ali4s, options=nil)
|
7
|
+
options ||= {}
|
8
|
+
@alias = ali4s
|
9
|
+
@command = options[:command]
|
10
|
+
@function = options[:function]
|
11
|
+
@input = options[:input]
|
12
|
+
@output = options[:output]
|
13
|
+
@ship = options[:ship]
|
14
|
+
@cache = options[:cache]
|
15
|
+
end
|
16
|
+
|
17
|
+
def to_s
|
18
|
+
if @command
|
19
|
+
str = "DEFINE #{@alias} `#{@command}`"
|
20
|
+
str << io_to_s(:input, @input) if @input
|
21
|
+
str << io_to_s(:output, @output) if @output
|
22
|
+
str << paths_to_s(:ship, @ship) if @ship
|
23
|
+
str << paths_to_s(:cache, @cache) if @cache
|
24
|
+
str
|
25
|
+
else
|
26
|
+
"DEFINE #{@alias} #{@function}"
|
27
|
+
end
|
28
|
+
end
|
29
|
+
|
30
|
+
private
|
31
|
+
|
32
|
+
def paths_to_s(kind, paths)
|
33
|
+
unless Enumerable === paths
|
34
|
+
paths = [paths]
|
35
|
+
end
|
36
|
+
path_str = paths.map { |p| "'#{p}'" }.join(', ')
|
37
|
+
" #{kind.to_s.upcase}(#{path_str})"
|
38
|
+
end
|
39
|
+
|
40
|
+
def io_to_s(method, description)
|
41
|
+
case description
|
42
|
+
when Symbol, String
|
43
|
+
if method == :input
|
44
|
+
io_to_s(method, [{:from => description}])
|
45
|
+
else
|
46
|
+
io_to_s(method, [{:to => description}])
|
47
|
+
end
|
48
|
+
when Hash
|
49
|
+
io_to_s(method, [description])
|
50
|
+
when Enumerable
|
51
|
+
str = " #{method.to_s.upcase}("
|
52
|
+
description_strs = description.map do |desc|
|
53
|
+
stream = (method == :input ? desc[:from] : desc[:to])
|
54
|
+
stream = "'#{stream}'" unless Symbol === stream
|
55
|
+
if desc[:using]
|
56
|
+
"#{stream} USING #{resolve_load_store_function(desc[:using])}"
|
57
|
+
else
|
58
|
+
stream.to_s
|
59
|
+
end
|
60
|
+
end
|
61
|
+
str << description_strs.join(', ')
|
62
|
+
str << ')'
|
63
|
+
str
|
64
|
+
end
|
65
|
+
end
|
66
|
+
end
|
67
|
+
end
|
68
|
+
end
|
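io_to_s above calls resolve_load_store_function, which comes from Piglet::Inout::StorageTypes and is not part of this diff. The expected spec output suggests that symbols such as :pig_storage and :bin_storage become PigStorage and BinStorage, while strings pass through unchanged. A sketch of a resolver with that behaviour follows; the camel-casing rule and the method name are guesses made for illustration.

```ruby
# Assumption: symbols are camel-cased, anything else is used verbatim.
# This is a hypothetical stand-in, not the method Piglet ships.
def sketch_resolve_load_store_function(name)
  case name
  when Symbol
    name.to_s.split('_').map { |part| part.capitalize }.join
  else
    name.to_s
  end
end

puts sketch_resolve_load_store_function(:pig_storage)
puts sketch_resolve_load_store_function(:bin_storage)
puts sketch_resolve_load_store_function("HelloWorld('test')")
```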
data/lib/piglet.rb
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
# :main: README.rdoc
|
2
2
|
module Piglet # :nodoc:
|
3
|
-
VERSION = '0.2.
|
3
|
+
VERSION = '0.2.4'
|
4
4
|
|
5
5
|
class PigletError < StandardError; end
|
6
6
|
class NotSupportedError < PigletError; end
|
@@ -53,4 +53,9 @@ module Piglet # :nodoc:
|
|
53
53
|
|
54
54
|
class SchemaError < PigletError; end
|
55
55
|
end
|
56
|
+
|
57
|
+
module Udf
|
58
|
+
autoload :Define, 'piglet/udf/define'
|
59
|
+
autoload :Register, 'piglet/udf/register'
|
60
|
+
end
|
56
61
|
end
|
data/spec/piglet_spec.rb
CHANGED
@@ -359,6 +359,134 @@ describe Piglet do
|
|
359
359
|
@interpreter.to_pig_latin.should match(/\w+ BY y OUTER/)
|
360
360
|
end
|
361
361
|
end
|
362
|
+
|
363
|
+
describe 'STREAM' do
|
364
|
+
it 'outputs a STREAM statement with a command reference' do
|
365
|
+
output = @interpreter.to_pig_latin do
|
366
|
+
a = load('in')
|
367
|
+
b = a.stream(:swoosch)
|
368
|
+
store(b, 'out')
|
369
|
+
end
|
370
|
+
output.should match(/STREAM \w+ THROUGH swoosch/)
|
371
|
+
end
|
372
|
+
|
373
|
+
it 'outputs a STREAM statement with a command' do
|
374
|
+
output = @interpreter.to_pig_latin do
|
375
|
+
a = load('in')
|
376
|
+
b = a.stream(:command => 'swoosch')
|
377
|
+
store(b, 'out')
|
378
|
+
end
|
379
|
+
output.should match(/STREAM \w+ THROUGH `swoosch`/)
|
380
|
+
end
|
381
|
+
|
382
|
+
it 'outputs a STREAM statement with a schema' do
|
383
|
+
output = @interpreter.to_pig_latin do
|
384
|
+
a = load('in')
|
385
|
+
b = a.stream(:command => 'swoosch', :schema => [:a, :b])
|
386
|
+
store(b, 'out')
|
387
|
+
end
|
388
|
+
output.should match(/STREAM \w+ THROUGH `swoosch` AS \(a:bytearray, b:bytearray\)/)
|
389
|
+
end
|
390
|
+
|
391
|
+
it 'outputs a STREAM statement with many relations' do
|
392
|
+
output = @interpreter.to_pig_latin do
|
393
|
+
x = load('in1')
|
394
|
+
y = load('in2')
|
395
|
+
z = load('in3')
|
396
|
+
w = x.stream([y, z], :plink)
|
397
|
+
store(w, 'out')
|
398
|
+
end
|
399
|
+
output.should match(/STREAM \w+, \w+, \w+ THROUGH plink/)
|
400
|
+
end
|
401
|
+
end
|
402
|
+
end
|
403
|
+
|
404
|
+
context 'UDF statements:' do
|
405
|
+
describe 'DEFINE' do
|
406
|
+
it 'outputs a DEFINE with the correct alias and function name' do
|
407
|
+
output = @interpreter.to_pig_latin { define('plunk', :function => 'com.example.Plunk') }
|
408
|
+
output.should include('DEFINE plunk com.example.Plunk')
|
409
|
+
end
|
410
|
+
|
411
|
+
it 'outputs a DEFINE with the correct alias and command string' do
|
412
|
+
output = @interpreter.to_pig_latin { define('plunk', :command => 'plunk.rb') }
|
413
|
+
output.should include('DEFINE plunk `plunk.rb`')
|
414
|
+
end
|
415
|
+
|
416
|
+
it 'outputs a DEFINE with an INPUT definition' do
|
417
|
+
output = @interpreter.to_pig_latin do
|
418
|
+
define('plunk', :command => 'plunk.rb', :input => :stdin)
|
419
|
+
end
|
420
|
+
output.should include('DEFINE plunk `plunk.rb` INPUT(stdin)')
|
421
|
+
end
|
422
|
+
|
423
|
+
it 'outputs a DEFINE with an OUTPUT definition' do
|
424
|
+
output = @interpreter.to_pig_latin do
|
425
|
+
define('plunk', :command => 'plunk.rb', :output => :stdout)
|
426
|
+
end
|
427
|
+
output.should include('DEFINE plunk `plunk.rb` OUTPUT(stdout)')
|
428
|
+
end
|
429
|
+
|
430
|
+
it 'outputs a DEFINE with a SHIP definition with one path' do
|
431
|
+
output = @interpreter.to_pig_latin do
|
432
|
+
define('plunk', :command => 'plunk.rb', :ship => 'path/to/somewhere')
|
433
|
+
end
|
434
|
+
output.should include('DEFINE plunk `plunk.rb` SHIP(\'path/to/somewhere\')')
|
435
|
+
end
|
436
|
+
|
437
|
+
it 'outputs a DEFINE with a SHIP definition with many paths' do
|
438
|
+
output = @interpreter.to_pig_latin do
|
439
|
+
define('plunk', :command => 'plunk.rb', :ship => ['path/to/somewhere', 'and/to/somewhere/else'])
|
440
|
+
end
|
441
|
+
output.should include('DEFINE plunk `plunk.rb` SHIP(\'path/to/somewhere\', \'and/to/somewhere/else\')')
|
442
|
+
end
|
443
|
+
|
444
|
+
it 'outputs a DEFINE with a CACHE definition with one path description' do
|
445
|
+
output = @interpreter.to_pig_latin do
|
446
|
+
define('plunk', :command => 'plunk.rb', :cache => '/input/data.gz#data.gz')
|
447
|
+
end
|
448
|
+
output.should include('DEFINE plunk `plunk.rb` CACHE(\'/input/data.gz#data.gz\')')
|
449
|
+
end
|
450
|
+
|
451
|
+
it 'outputs a DEFINE with a CACHE definition with many path descriptions' do
|
452
|
+
output = @interpreter.to_pig_latin do
|
453
|
+
define('plunk', :command => 'plunk.rb', :cache => ['/input/data.gz#data.gz', '/mydir/mydata.txt#mydata.txt'])
|
454
|
+
end
|
455
|
+
output.should include('DEFINE plunk `plunk.rb` CACHE(\'/input/data.gz#data.gz\', \'/mydir/mydata.txt#mydata.txt\')')
|
456
|
+
end
|
457
|
+
|
458
|
+
it 'outputs a DEFINE with a somewhat complex INPUT definition' do
|
459
|
+
output = @interpreter.to_pig_latin do
|
460
|
+
define('plunk', :command => 'plunk.rb', :input => {:from => 'some/path', :using => :pig_storage})
|
461
|
+
end
|
462
|
+
output.should include('DEFINE plunk `plunk.rb` INPUT(\'some/path\' USING PigStorage)')
|
463
|
+
end
|
464
|
+
|
465
|
+
it 'outputs a DEFINE with really complex options' do
|
466
|
+
output = @interpreter.to_pig_latin do
|
467
|
+
define('plunk', :command => 'plunk.rb',
|
468
|
+
:input => [
|
469
|
+
{:from => 'some/path', :using => :pig_storage},
|
470
|
+
{:from => :stdin, :using => 'HelloWorld(\'test\')'}
|
471
|
+
],
|
472
|
+
:output => [
|
473
|
+
{:to => 'some/other/path', :using => :bin_storage},
|
474
|
+
{:to => :stdout, :using => 'SomeOtherMechanism()'}
|
475
|
+
],
|
476
|
+
:ship => 'to/here',
|
477
|
+
:cache => ['first', 'second', 'third']
|
478
|
+
)
|
479
|
+
end
|
480
|
+
output.should include('DEFINE plunk `plunk.rb` INPUT(\'some/path\' USING PigStorage, stdin USING HelloWorld(\'test\')) OUTPUT(\'some/other/path\' USING BinStorage, stdout USING SomeOtherMechanism()) SHIP(\'to/here\') CACHE(\'first\', \'second\', \'third\')')
|
481
|
+
end
|
482
|
+
end
|
483
|
+
|
484
|
+
describe 'REGISTER' do
|
485
|
+
it 'outputs a REGISTER statement with the path to the specified JAR' do
|
486
|
+
output = @interpreter.to_pig_latin { register('path/to/lib.jar') }
|
487
|
+
output.should include('REGISTER path/to/lib.jar')
|
488
|
+
end
|
489
|
+
end
|
362
490
|
end
|
363
491
|
|
364
492
|
context 'aliasing & multiple statements' do
|
@@ -705,6 +833,18 @@ describe Piglet do
|
|
705
833
|
schema.field_type(0).should eql(:double)
|
706
834
|
end
|
707
835
|
|
836
|
+
it 'knows the schema of a relation streamed through a command (if there\'s a schema)' do
|
837
|
+
schema = catch(:schema) do
|
838
|
+
@interpreter.interpret do
|
839
|
+
relation1 = load('in1', :schema => [[:a, :float], [:b, :int]])
|
840
|
+
relation2 = relation1.stream(:command => 'command', :schema => [[:x, :chararray]])
|
841
|
+
throw :schema, relation2.schema
|
842
|
+
end
|
843
|
+
end
|
844
|
+
schema.field_names.should eql([:x])
|
845
|
+
schema.field_type(:x).should eql(:chararray)
|
846
|
+
end
|
847
|
+
|
708
848
|
end
|
709
849
|
|
710
850
|
end
|
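The STREAM and schema specs above imply a simple rendering rule for the :schema option: a bare field name is given the type bytearray, while a name and type pair keeps its declared type. The sketch below reproduces only that rule. schema_to_s is a hypothetical helper written for this example and is not Piglet::Schema::Tuple.

```ruby
# Hypothetical renderer, not Piglet::Schema::Tuple: a bare name gets the
# default type bytearray, a [name, type] pair keeps its declared type.
def schema_to_s(fields)
  rendered = fields.map do |field|
    name, type = Array(field)
    "#{name}:#{type || :bytearray}"
  end
  "(#{rendered.join(', ')})"
end

puts schema_to_s([:a, :b])
puts schema_to_s([%w(a int)])
puts schema_to_s([[:x, :chararray]])
```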
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: piglet
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.2.
|
4
|
+
version: 0.2.4
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Theo Hultberg
|
@@ -9,7 +9,7 @@ autorequire:
|
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
11
|
|
12
|
-
date: 2010-01-
|
12
|
+
date: 2010-01-14 00:00:00 +01:00
|
13
13
|
default_executable: piglet
|
14
14
|
dependencies:
|
15
15
|
- !ruby/object:Gem::Dependency
|
@@ -31,6 +31,7 @@ extensions: []
|
|
31
31
|
extra_rdoc_files:
|
32
32
|
- LICENSE
|
33
33
|
- README.rdoc
|
34
|
+
- TODO
|
34
35
|
files:
|
35
36
|
- .document
|
36
37
|
- .gitignore
|
@@ -73,6 +74,8 @@ files:
|
|
73
74
|
- lib/piglet/relation/union.rb
|
74
75
|
- lib/piglet/schema/bag.rb
|
75
76
|
- lib/piglet/schema/tuple.rb
|
77
|
+
- lib/piglet/udf/define.rb
|
78
|
+
- lib/piglet/udf/register.rb
|
76
79
|
- spec/piglet/field/binary_conditional_spec.rb
|
77
80
|
- spec/piglet/field/field_spec.rb
|
78
81
|
- spec/piglet/field/infix_expression_spec.rb
|
@@ -89,6 +92,7 @@ files:
|
|
89
92
|
- tasks/gem.rake
|
90
93
|
- tasks/rdoc.rake
|
91
94
|
- tasks/spec.rake
|
95
|
+
- TODO
|
92
96
|
has_rdoc: true
|
93
97
|
homepage: http://github.com/iconara/piglet
|
94
98
|
licenses: []
|