piglet 0.2.4 → 0.2.5
- data/README.rdoc +26 -5
- data/Rakefile +1 -1
- data/TODO +0 -2
- data/lib/piglet/field/binary_conditional.rb +2 -0
- data/lib/piglet/field/call_expression.rb +2 -0
- data/lib/piglet/field/field.rb +2 -0
- data/lib/piglet/field/infix_expression.rb +2 -0
- data/lib/piglet/field/literal.rb +2 -0
- data/lib/piglet/field/prefix_expression.rb +2 -0
- data/lib/piglet/field/reference.rb +2 -0
- data/lib/piglet/field/rename.rb +2 -0
- data/lib/piglet/field/suffix_expression.rb +2 -0
- data/lib/piglet/field/udf_expression.rb +28 -0
- data/lib/piglet/inout/describe.rb +2 -0
- data/lib/piglet/inout/dump.rb +2 -0
- data/lib/piglet/inout/explain.rb +2 -0
- data/lib/piglet/inout/illustrate.rb +2 -0
- data/lib/piglet/inout/load.rb +2 -0
- data/lib/piglet/inout/output.rb +2 -0
- data/lib/piglet/inout/storage_types.rb +2 -0
- data/lib/piglet/inout/store.rb +2 -0
- data/lib/piglet/interpreter.rb +30 -0
- data/lib/piglet/param/declare.rb +11 -0
- data/lib/piglet/param/default.rb +11 -0
- data/lib/piglet/param/parameter_statement.rb +25 -0
- data/lib/piglet/relation/cogroup.rb +6 -1
- data/lib/piglet/relation/cross.rb +2 -0
- data/lib/piglet/relation/distinct.rb +2 -0
- data/lib/piglet/relation/filter.rb +2 -0
- data/lib/piglet/relation/foreach.rb +2 -0
- data/lib/piglet/relation/group.rb +2 -0
- data/lib/piglet/relation/join.rb +2 -0
- data/lib/piglet/relation/limit.rb +2 -0
- data/lib/piglet/relation/order.rb +2 -0
- data/lib/piglet/relation/relation.rb +2 -0
- data/lib/piglet/relation/sample.rb +2 -0
- data/lib/piglet/relation/split.rb +2 -0
- data/lib/piglet/relation/stream.rb +2 -0
- data/lib/piglet/relation/union.rb +2 -0
- data/lib/piglet/schema/bag.rb +2 -0
- data/lib/piglet/schema/tuple.rb +2 -0
- data/lib/piglet/udf/define.rb +2 -0
- data/lib/piglet/udf/register.rb +2 -0
- data/lib/piglet.rb +10 -1
- data/spec/piglet/field/binary_conditional_spec.rb +2 -0
- data/spec/piglet/field/field_spec.rb +2 -0
- data/spec/piglet/field/infix_expression_spec.rb +2 -0
- data/spec/piglet/field/literal_spec.rb +2 -0
- data/spec/piglet/field/reference_spec.rb +2 -0
- data/spec/piglet/interpreter_spec.rb +2 -0
- data/spec/piglet/relation/relation_spec.rb +2 -0
- data/spec/piglet/relation/split_spec.rb +2 -0
- data/spec/piglet/relation/union_spec.rb +2 -0
- data/spec/piglet/schema/tuple_spec.rb +2 -0
- data/spec/piglet_spec.rb +43 -0
- data/spec/spec_helper.rb +2 -0
- data/tasks/gem.rake +2 -0
- data/tasks/rdoc.rake +2 -0
- data/tasks/spec.rake +2 -0
- metadata +6 -2
data/README.rdoc
CHANGED
@@ -14,6 +14,14 @@ The code above will be translated to the following Pig Latin:
 
 Piglet aims to look like Pig Latin while allowing for things like loops and control of flow that are missing from Pig. I started working on Piglet out of frustration that my Pig scripts started to be very repetitive. Pig lacks control of flow and mechanisms to apply the same set of operations on multiple relations. Piglet is my way of adding those features.
 
+== Installation
+
+If you have Gemcutter.org as a source, just do
+
+    gem install piglet
+
+there are no dependencies.
+
 == Usage
 
 It can be used either as a command line tool for translating a file of Piglet code into Pig Latin, or you can use it inline in a Ruby script:
@@ -257,6 +265,12 @@ You can also use +define+ to define function references:
 
     define(:hello, :function => 'com.example.Hello') # => DEFINE hello com.example.Hello
 
+When you define a UDF it becomes available as a method in the interpreter scope. This means that you can refer to it by name in, for example, a <code>FOREACH … GENERATE</code> statement:
+
+    define :awesome, :function => 'my.awesome.Function' # => DEFINE awesome my.awesome.Function
+    …
+    b = a.foreach { |r| [awesome(r[0]).as(:something_special)] } # => b = FOREACH a GENERATE awesome($0) AS something_special
+
 If you need to register a JAR you can use +register+:
 
     register('path/to/lib.jar') # => REGISTER path/to/lib.jar
@@ -277,6 +291,17 @@ For some operators in Pig Latin you can specify the <code>PARALLEL</code> keywor
 
 For the +cogroup+, +cross+, +distinct+, +group+, +join+ and +order+ you can pass <code>:parallel => <em>n</em></code> as the last parameter to specify the amount of parallelism, e.g. <code>a.group(:x, :y, :z, :parallel => 5)</code>.
 
+== <code>%declare</code> & <code>%default</code>
+
+The <code>%declare</code> and <code>%default</code> preprocessor macros are available as +declare+ and +default+. Each take two parameters, a name and a value:
+
+    declare(:foo, 'bar') # => %declare foo 'bar'
+    default('hello', :world) # => %default hello 'world'
+
+If you want to quote the value with backticks, pass <code>:backticks => true</code> as the third parameter:
+
+    default 'CMD', 'uniq', :backticks => true
+
 === Putting it all together
 
 Let's look at a more complex example:
@@ -404,16 +429,12 @@ The following Pig operators are supported:
 * +STREAM+
 * +UNION+
 
-The following
+The following is currently not supported (but will be soon):
 
 * <code>FOREACH { … } GENERATE</code>
-* <code>%DECLARE</code>
-* <code>%DEFAULT</code>
 
 The file commands (+cd+, +cat+, etc.) will probably not be supported for the forseeable future.
 
-UDF:s are sort of supported. You can declare a UDF with +declare+ and you can use it in some situations by using a string, but some situations are trickier (like renaming the result of a UDF call in a <code>FOREACH … GENERATE</code> body).
-
 All the aggregate functions except two are supported:
 
 * +AVG+
data/Rakefile
CHANGED
data/TODO
CHANGED
data/lib/piglet/field/field.rb
CHANGED
data/lib/piglet/field/literal.rb
CHANGED
data/lib/piglet/field/rename.rb
CHANGED
data/lib/piglet/field/udf_expression.rb
ADDED
@@ -0,0 +1,28 @@
+module Piglet
+  module Field
+    class UdfExpression
+      include Field
+
+      def initialize(ali4s, *args)
+        @alias, @args = ali4s, args
+      end
+
+      def to_s
+        "#{@alias}(#{args_to_s(@args)})"
+      end
+
+    private
+
+      def args_to_s(arg)
+        case arg
+        when String
+          "'#{escape(arg)}'"
+        when Enumerable
+          arg.map { |a| args_to_s(a) }.join(', ')
+        else
+          arg
+        end
+      end
+    end
+  end
+end
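The recursive argument rendering in the hunk above can be tried in isolation. The following is a minimal sketch, not Piglet's actual class: `UdfCallSketch`, `render` and the local `escape` helper are hypothetical names, and the escaping rule (backslash-escape single quotes and backslashes) is assumed to match the one used for parameter statements elsewhere in this diff.

```ruby
# Hypothetical stand-alone sketch; not Piglet's actual class.
class UdfCallSketch
  def initialize(name, *args)
    @name, @args = name, args
  end

  def to_s
    "#{@name}(#{render(@args)})"
  end

  private

  # Strings are single-quoted and escaped, arrays are rendered element by
  # element, and anything else is passed through unchanged.
  def render(arg)
    case arg
    when String
      "'#{escape(arg)}'"
    when Array
      arg.map { |a| render(a) }.join(', ')
    else
      arg
    end
  end

  # Assumed escaping rule: put a backslash before single quotes and backslashes.
  def escape(str)
    str.gsub(/('|\\)/) { "\\#{$1}" }
  end
end

puts UdfCallSketch.new(:my_udf, 'foo', 3, "hello 'world'").to_s
```

The `String` branch has to come before the collection branch in the original because strings were `Enumerable` in Ruby 1.8; the sketch checks for `Array` instead, which avoids the ordering dependency.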
data/lib/piglet/inout/dump.rb
CHANGED
data/lib/piglet/inout/explain.rb
CHANGED
data/lib/piglet/inout/load.rb
CHANGED
data/lib/piglet/inout/output.rb
CHANGED
data/lib/piglet/inout/store.rb
CHANGED
data/lib/piglet/interpreter.rb
CHANGED
@@ -1,3 +1,5 @@
+# encoding: utf-8
+
 require 'set'
 
 
@@ -119,6 +121,34 @@ module Piglet
     # :output => [{:to => :stdout, :using => 'MySerializer'}, 'some/path'] # => OUTPUT(stdout USING MySerializer, 'some/path')
     def define(ali4s, options=nil)
       @top_level_statements << Udf::Define.new(ali4s, options)
+      unless respond_to?(ali4s)
+        def metaclass
+          class << self
+            return self
+          end
+        end
+        metaclass.send(:define_method, ali4s) do |*args|
+          Field::UdfExpression.new(ali4s, *args)
+        end
+      end
+    end
+
+    # %declare
+    #
+    # declare(:my_var, 'value') # => %declare my_var 'value'
+    # declare('quote', "He said 'hello!'") # => %declare quote 'He said \'hello!\''
+    # declare('cmd', 'uniq', :backticks => true) # => %declare cmd `uniq`
+    def declare(name, value, options=nil)
+      @top_level_statements << Param::Declare.new(name, value, options)
+    end
+
+    # %default
+    #
+    # default(:my_var, 'value') # => %default my_var 'value'
+    # default('quote', "He said 'hello!'") # => %default quote 'He said \'hello!\''
+    # default('cmd', 'uniq', :backticks => true) # => %default cmd `uniq`
+    def default(name, value, options=nil)
+      @top_level_statements << Param::Default.new(name, value, options)
     end
 
     # Support for binary conditions, a.k.a. the ternary operator.
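The `define` change above adds a method to a single interpreter instance at runtime, so a defined UDF can be called by name. A minimal sketch of that idea follows, assuming Ruby 1.9 or later, where `define_singleton_method` is available in place of the hand-rolled `metaclass` helper. `MiniInterpreter` and its string output are hypothetical and only illustrate the mechanism; the real method returns a `Field::UdfExpression`.

```ruby
# Hypothetical miniature interpreter; names are illustrative only.
class MiniInterpreter
  attr_reader :statements

  def initialize
    @statements = []
  end

  # Records a DEFINE statement and, unless a method of that name already
  # exists, adds a same-named method to this one instance only.
  def define(name, function)
    @statements << "DEFINE #{name} #{function}"
    unless respond_to?(name)
      define_singleton_method(name) do |*args|
        "#{name}(#{args.join(', ')})"
      end
    end
  end
end

interp = MiniInterpreter.new
interp.define(:awesome, 'my.awesome.Function')
puts interp.awesome('$0')

# Other instances are unaffected because the method lives on the singleton class.
puts MiniInterpreter.new.respond_to?(:awesome)
```

The `respond_to?` guard means a UDF alias that collides with an existing method is recorded as a statement but does not overwrite that method.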
data/lib/piglet/param/parameter_statement.rb
ADDED
@@ -0,0 +1,25 @@
+module Piglet
+  module Param
+    module ParameterStatement
+      def to_s
+        if @backticks
+          v = "`#{@value}`"
+        else
+          case @value
+          when String, Symbol
+            v = "'#{escape(@value)}'"
+          else
+            v = @value
+          end
+        end
+        "%#{@kind} #{@name} #{v}"
+      end
+
+    private
+
+      def escape(str)
+        str.to_s.gsub(/('|\\)/) { |m| "\\#{$1}" }
+      end
+    end
+  end
+end
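The quoting rules in `ParameterStatement#to_s` can be exercised without the gem. This is a minimal sketch under the assumption that the branching above is the whole story: `render_param` is a hypothetical free function, and the positional `backticks` flag replaces the `:backticks => true` option hash.

```ruby
# Hypothetical stand-alone function mirroring the branching in the hunk above.
def render_param(kind, name, value, backticks = false)
  rendered =
    if backticks
      # Backtick-quoted values are passed through without escaping.
      "`#{value}`"
    elsif value.is_a?(String) || value.is_a?(Symbol)
      # Strings and symbols are single-quoted, with quotes and backslashes escaped.
      escaped = value.to_s.gsub(/('|\\)/) { "\\#{$1}" }
      "'#{escaped}'"
    else
      # Numbers and other values are emitted unquoted.
      value
    end
  "%#{kind} #{name} #{rendered}"
end

puts render_param(:declare, :my_var, "my 'value'")
puts render_param(:default, :my_var, 1)
puts render_param(:default, :cmd, 'uniq', true)
```

These three calls correspond to the escaped-quote, numeric and backtick cases covered by the specs added in this release.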
data/lib/piglet/relation/cogroup.rb
CHANGED
@@ -1,3 +1,5 @@
+# encoding: utf-8
+
 module Piglet
   module Relation
     class Cogroup # :nodoc:
@@ -31,7 +33,10 @@ module Piglet
             inout = fields.last.to_s.upcase
             fields = fields[0..-2]
           end
-          if fields.is_a?(Enumerable) && fields.size
+          if fields.is_a?(Enumerable) && fields.size == 1
+            fields = fields.first
+          end
+          if fields.is_a?(Enumerable)
             str = "#{s.alias} BY (#{fields.join(', ')})"
           else
             str = "#{s.alias} BY #{fields}"
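The effect of the Cogroup change is that a one-element field list is unwrapped before the `BY` clause is built, so it is rendered without parentheses. A minimal sketch of that logic, with `by_clause`, `relation_alias` and `fields` as hypothetical stand-ins for the values used in the hunk above:

```ruby
# Hypothetical helper; not part of Piglet.
def by_clause(relation_alias, fields)
  # Unwrap a single-element list so it is treated like a bare field.
  fields = fields.first if fields.is_a?(Array) && fields.size == 1
  if fields.is_a?(Array)
    "#{relation_alias} BY (#{fields.join(', ')})"
  else
    "#{relation_alias} BY #{fields}"
  end
end

puts by_clause('a', [:x])
puts by_clause('a', [:x, :y])
puts by_clause('a', :x)
```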
data/lib/piglet/relation/join.rb
CHANGED
data/lib/piglet/schema/bag.rb
CHANGED
data/lib/piglet/schema/tuple.rb
CHANGED
data/lib/piglet/udf/define.rb
CHANGED
data/lib/piglet/udf/register.rb
CHANGED
data/lib/piglet.rb
CHANGED
@@ -1,6 +1,8 @@
+# encoding: utf-8
+
 # :main: README.rdoc
 module Piglet # :nodoc:
-  VERSION = '0.2.
+  VERSION = '0.2.5'
 
   class PigletError < StandardError; end
   class NotSupportedError < PigletError; end
@@ -45,6 +47,7 @@ module Piglet # :nodoc:
     autoload :Reference, 'piglet/field/reference'
     autoload :Rename, 'piglet/field/rename'
     autoload :SuffixExpression, 'piglet/field/suffix_expression'
+    autoload :UdfExpression, 'piglet/field/udf_expression'
   end
 
   module Schema
@@ -58,4 +61,10 @@ module Piglet # :nodoc:
     autoload :Define, 'piglet/udf/define'
     autoload :Register, 'piglet/udf/register'
   end
+
+  module Param
+    autoload :Declare, 'piglet/param/declare'
+    autoload :Default, 'piglet/param/default'
+    autoload :ParameterStatement, 'piglet/param/parameter_statement'
+  end
 end
data/spec/piglet_spec.rb
CHANGED
@@ -1,3 +1,5 @@
+# encoding: utf-8
+
 require File.expand_path(File.dirname(__FILE__) + '/spec_helper')
 
 
@@ -104,6 +106,37 @@ describe Piglet do
       end
     end
   end
+
+  context 'parameter declarations:' do
+    %w(declare default).each do |op|
+      describe "%#{op}" do
+        it "outputs a %#{op} statement" do
+          @interpreter.interpret { self.send(op, :my_var, 'my_value') }
+          @interpreter.to_pig_latin.should match(/%#{op} my_var 'my_value'/)
+        end
+
+        it "outputs a %#{op} statement with single quotes escaped" do
+          @interpreter.interpret { self.send(op, :my_var, "my 'value'") }
+          @interpreter.to_pig_latin.should match(/%#{op} my_var 'my \\'value\\''/)
+        end
+
+        it "outputs a %#{op} statement with an numeric value unquoted" do
+          @interpreter.interpret { self.send(op, :my_var, 1) }
+          @interpreter.to_pig_latin.should match(/%#{op} my_var 1/)
+        end
+
+        it "outputs a %#{op} statement with an symbol value quoted" do
+          @interpreter.interpret { self.send(op, :my_var, :x) }
+          @interpreter.to_pig_latin.should match(/%#{op} my_var 'x'/)
+        end
+
+        it "outputs a %#{op} statement with the value quoted in backticks, if the option :backticks => true is passed" do
+          @interpreter.interpret { self.send(op, :my_var, 'cut -f 4', :backticks => true) }
+          @interpreter.to_pig_latin.should match(/%#{op} my_var `cut -f 4`/)
+        end
+      end
+    end
+  end
 
   context 'relation operators:' do
     describe 'GROUP' do
@@ -479,6 +512,16 @@ describe Piglet do
         end
         output.should include('DEFINE plunk `plunk.rb` INPUT(\'some/path\' USING PigStorage, stdin USING HelloWorld(\'test\')) OUTPUT(\'some/other/path\' USING BinStorage, stdout USING SomeOtherMechanism()) SHIP(\'to/here\') CACHE(\'first\', \'second\', \'third\')')
       end
+
+      it 'makes the defined UDF available as a method in the interpreter scope, so that it can be used in a FOREACH and it\'s result renamed using AS' do
+        output = @interpreter.to_pig_latin do
+          define('my_udf', :function => 'com.example.My')
+          a = load('in')
+          b = a.foreach { |r| [my_udf('foo', 3, 'hello \'world\'', r[0]).as(:bar)]}
+          store(b, 'out')
+        end
+        output.should match(/FOREACH \w+ GENERATE my_udf\('foo', 3, 'hello \\'world\\'', \$0\) AS bar/)
+      end
     end
 
     describe 'REGISTER' do
data/spec/spec_helper.rb
CHANGED
data/tasks/gem.rake
CHANGED
data/tasks/rdoc.rake
CHANGED
data/tasks/spec.rake
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: piglet
 version: !ruby/object:Gem::Version
-  version: 0.2.
+  version: 0.2.5
 platform: ruby
 authors:
 - Theo Hultberg
@@ -9,7 +9,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-01-
+date: 2010-01-16 00:00:00 +01:00
 default_executable: piglet
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -49,6 +49,7 @@ files:
 - lib/piglet/field/reference.rb
 - lib/piglet/field/rename.rb
 - lib/piglet/field/suffix_expression.rb
+- lib/piglet/field/udf_expression.rb
 - lib/piglet/inout/describe.rb
 - lib/piglet/inout/dump.rb
 - lib/piglet/inout/explain.rb
@@ -58,6 +59,9 @@ files:
 - lib/piglet/inout/storage_types.rb
 - lib/piglet/inout/store.rb
 - lib/piglet/interpreter.rb
+- lib/piglet/param/declare.rb
+- lib/piglet/param/default.rb
+- lib/piglet/param/parameter_statement.rb
 - lib/piglet/relation/cogroup.rb
 - lib/piglet/relation/cross.rb
 - lib/piglet/relation/distinct.rb