elasticity 2.1.1 → 2.2
- data/HISTORY.md +4 -0
- data/README.md +28 -19
- data/lib/elasticity/streaming_step.rb +33 -0
- data/lib/elasticity/version.rb +1 -1
- data/lib/elasticity.rb +1 -0
- data/spec/lib/elasticity/streaming_step_spec.rb +37 -0
- metadata +5 -2
data/HISTORY.md
CHANGED
@@ -1,3 +1,7 @@
+## 2.2 - July 23, 2012
+
++ Hadoop streaming jobs are now supported via ```Elasticity::StreamingStep```.
+
 ## 2.1.1 - July 22, 2012
 
 + ```JobFlow::from_jobflow_id``` factory method added so that you can operate on running job flows (add steps, shutdown, status, etc.) that you didn't start in the same Ruby instance.
data/README.md
CHANGED
@@ -9,13 +9,13 @@ Elasticity provides two ways to access EMR:
 
 # Installation
 
-```
+```ruby
 gem install elasticity
 ```
 
 or in your Gemfile
 
-```
+```ruby
 gem 'elasticity', '~> 2.0'
 ```
 
@@ -25,7 +25,7 @@ This will ensure that you protect yourself from API changes, which will only be
 
 When using the EMR UI, there are several sample jobs that Amazon supplies. The assets for these sample jobs are hosted on S3 and publicly available meaning you can run this code as-is (supplying your AWS credentials appropriately) and ```JobFlow#run``` will return the ID of the job flow.
 
-```
+```ruby
 require 'elasticity'
 
 # Create a job flow with your AWS credentials
@@ -63,13 +63,13 @@ Job flows are the center of the EMR universe. The general order of operations i
 
 Only your AWS credentials are needed.
 
-```
+```ruby
 jobflow = Elasticity::JobFlow.new('AWS access key', 'AWS secret key')
 ```
 
 If you want to access a job flow that's already running:
 
-```
+```ruby
 jobflow = Elasticity::JobFlow.from_jobflow_id('AWS access key', 'AWS secret key', 'jobflow ID')
 ```
 
@@ -81,7 +81,7 @@ Configuration job flow options, shown below with default values. Note that thes
 
 These options are sent up as part of job flow submission (i.e. ```JobFlow#run```), so be sure to configure these before running the job.
 
-```
+```ruby
 jobflow.action_on_failure = 'TERMINATE_JOB_FLOW'
 jobflow.ami_version = 'latest'
 jobflow.ec2_key_name = 'default'
@@ -103,7 +103,7 @@ Technically this is optional since Elasticity creates MASTER and CORE instance g
 
 If all you'd like to do is change the type or number of instances, ```JobFlow``` provides a few shortcuts to do just that.
 
-```
+```ruby
 jobflow.instance_count = 10
 jobflow.master_instance_type = 'm1.small'
 jobflow.slave_instance_type = 'c1.medium'
@@ -119,7 +119,7 @@ Elasticity supports all EMR instance group types and all configuration options.
 
 These instances will be available for the life of your EMR job, versus Spot instances which are transient depending on your bid price (see below).
 
-```
+```ruby
 ig = Elasticity::InstanceGroup.new
 ig.count = 10 # Provision 10 instances
 ig.type = 'c1.medium' # See the EMR docs for a list of supported types
@@ -133,7 +133,7 @@ jobflow.set_core_instance_group(ig)
 
 *When Amazon EC2 has unused capacity, it offers EC2 instances at a reduced cost, called the Spot Price. This price fluctuates based on availability and demand. You can purchase Spot Instances by placing a request that includes the highest bid price you are willing to pay for those instances. When the Spot Price is below your bid price, your Spot Instances are launched and you are billed the Spot Price. If the Spot Price rises above your bid price, Amazon EC2 terminates your Spot Instances.* - [EMR Developer Guide](http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_SpotInstances.html)
 
-```
+```ruby
 ig = Elasticity::InstanceGroup.new
 ig.count = 10 # Provision 10 instances
 ig.type = 'c1.medium' # See the EMR docs for a list of supported types
@@ -147,7 +147,7 @@ jobflow.set_core_instance_group(ig)
 
 Bootstrap actions are run as part of setting up the job flow, so be sure to configure these before running the job.
 
-```
+```ruby
 [
   Elasticity::HadoopBootstrapAction.new('-m', 'mapred.map.tasks=101'),
   Elasticity::HadoopBootstrapAction.new('-m', 'mapred.reduce.child.java.opts=-Xmx200m')
@@ -159,11 +159,11 @@ end
 
 ## 5 - Adding Steps
 
-Each type of step has
+Each type of step has ```#name``` and ```#action_on_failure``` fields that can be overridden. Apart from that, steps are configured differently - exhaustively described below.
 
 ### Adding a Pig Step
 
-```
+```ruby
 # Path to the Pig script
 pig_step = Elasticity::PigStep.new('s3n://mybucket/script.pig')
 
@@ -182,7 +182,7 @@ Given the importance of specifying a reasonable value for [the number of paralle
 
 For example, if you had 8 instances in total and your slaves were m1.xlarge, the value is 26 (as shown below).
 
-```
+```sh
 s3://elasticmapreduce/libs/pig/pig-script
 --run-pig-script
 --args
@@ -194,7 +194,7 @@ For example, if you had 8 instances in total and your slaves were m1.xlarge, the
 
 Use this as you would any other Pig variable.
 
-```
+```pig
 A = LOAD 'myfile' AS (t, u, v);
 B = GROUP A BY t PARALLEL $E_PARALLELS;
 ...
@@ -202,7 +202,7 @@ Use this as you would any other Pig variable.
 
 ### Adding a Hive Step
 
-```
+```ruby
 # Path to the Hive Script
 hive_step = Elasticity::HiveStep.new('s3n://mybucket/script.hql')
 
@@ -215,9 +215,18 @@ hive_step.variables = {
 jobflow.add_step(hive_step)
 ```
 
-### Adding a
+### Adding a Streaming Step
+
+```ruby
+# Input bucket, output bucket, mapper and reducer scripts
+streaming_step = Elasticity::StreamingStep.new('s3n://elasticmapreduce/samples/wordcount/input', 's3n://elasticityoutput/wordcount/output/2012-07-23', 's3n://elasticmapreduce/samples/wordcount/wordSplitter.py', 'aggregate')
 
+jobflow.add_step(streaming_step)
 ```
+
+### Adding a Custom Jar Step
+
+```ruby
 # Path to your jar
 jar_step = Elasticity::CustomJarStep.new('s3n://mybucket/my.jar')
 
@@ -231,7 +240,7 @@ jobflow.add_step(jar_step)
 
 Submit the job flow to Amazon, storing the ID of the running job flow.
 
-```
+```ruby
 jobflow_id = jobflow.run
 ```
 
@@ -243,13 +252,13 @@ Steps can be added to a running jobflow just by calling ```#add_step``` on the j
 
 By default, job flows are set to terminate when there are no more running steps. You can tell the job flow to stay alive when it has nothing left to do:
 
-```
+```ruby
 jobflow.keep_job_flow_alive_when_no_steps = true
 ```
 
 If that's the case, or if you'd just like to terminate a running jobflow before waiting for it to finish:
 
-```
+```ruby
 jobflow.shutdown
 ```
 
data/lib/elasticity/streaming_step.rb
ADDED
@@ -0,0 +1,33 @@
+module Elasticity
+
+  class StreamingStep
+
+    include JobFlowStep
+
+    attr_accessor :name
+    attr_accessor :action_on_failure
+    attr_accessor :input_bucket
+    attr_accessor :output_bucket
+    attr_accessor :mapper
+    attr_accessor :reducer
+
+    def initialize(input_bucket, output_bucket, mapper, reducer)
+      @name = 'Elasticity Streaming Step'
+      @action_on_failure = 'TERMINATE_JOB_FLOW'
+      @input_bucket = input_bucket
+      @output_bucket = output_bucket
+      @mapper = mapper
+      @reducer = reducer
+    end
+
+    def to_aws_step(job_flow)
+      step = Elasticity::CustomJarStep.new('/home/hadoop/contrib/streaming/hadoop-streaming.jar')
+      step.name = @name
+      step.action_on_failure = @action_on_failure
+      step.arguments = ['-input', @input_bucket, '-output', @output_bucket, '-mapper', @mapper, '-reducer', @reducer]
+      step.to_aws_step(job_flow)
+    end
+
+  end
+
+end
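The new class only arranges its four constructor arguments into flag/value pairs for the Hadoop streaming jar. A minimal standalone sketch of that arrangement follows; it does not load the gem, `streaming_args` is a hypothetical helper introduced here for illustration, and the `hadoop jar` command line is an assumption about roughly what the step amounts to on the cluster, not something the diff states.

```ruby
require 'shellwords'

# Hypothetical helper (not part of the gem): pairs each hadoop-streaming flag
# with its value, in the same order StreamingStep#to_aws_step uses.
def streaming_args(input_bucket, output_bucket, mapper, reducer)
  ['-input', input_bucket, '-output', output_bucket, '-mapper', mapper, '-reducer', reducer]
end

# Values taken from the README example in this diff.
args = streaming_args(
  's3n://elasticmapreduce/samples/wordcount/input',
  's3n://elasticityoutput/wordcount/output/2012-07-23',
  's3n://elasticmapreduce/samples/wordcount/wordSplitter.py',
  'aggregate'
)

# Assumption: the jar step is roughly equivalent to this command line.
streaming_jar = '/home/hadoop/contrib/streaming/hadoop-streaming.jar'
command = Shellwords.join(['hadoop', 'jar', streaming_jar] + args)
puts command
```

Because the reducer is passed through untouched, a built-in name such as `aggregate` and a script path are handled the same way.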
data/lib/elasticity/version.rb
CHANGED
data/lib/elasticity.rb
CHANGED
data/spec/lib/elasticity/streaming_step_spec.rb
ADDED
@@ -0,0 +1,37 @@
+describe Elasticity::StreamingStep do
+
+  subject do
+    Elasticity::StreamingStep.new('INPUT_BUCKET', 'OUTPUT_BUCKET', 'MAPPER', 'REDUCER')
+  end
+
+  it { should be_a Elasticity::JobFlowStep }
+
+  its(:name) { should == 'Elasticity Streaming Step' }
+  its(:action_on_failure) { should == 'TERMINATE_JOB_FLOW' }
+  its(:input_bucket) { should == 'INPUT_BUCKET' }
+  its(:output_bucket) { should == 'OUTPUT_BUCKET' }
+  its(:mapper) { should == 'MAPPER' }
+  its(:reducer) { should == 'REDUCER' }
+
+  describe '#to_aws_step' do
+
+    it 'should convert to aws step format' do
+      subject.to_aws_step(Elasticity::JobFlow.new('_', '_')).should == {
+        :name => 'Elasticity Streaming Step',
+        :action_on_failure => 'TERMINATE_JOB_FLOW',
+        :hadoop_jar_step => {
+          :jar => '/home/hadoop/contrib/streaming/hadoop-streaming.jar',
+          :args => %w(-input INPUT_BUCKET -output OUTPUT_BUCKET -mapper MAPPER -reducer REDUCER),
+        },
+      }
+    end
+
+  end
+
+  describe '.requires_installation?' do
+    it 'should not require installation' do
+      Elasticity::StreamingStep.requires_installation?.should be_false
+    end
+  end
+
+end
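The spec above only exercises the default `name` and `action_on_failure`. As a rough illustration of how overriding them would show up in the emitted step, here is a standalone stand-in. `StreamingStepSketch` is hypothetical: it copies the accessors and defaults from the diff but builds the hash directly, since `Elasticity::CustomJarStep` and `JobFlowStep` are not loaded here. `'CONTINUE'` is one of the EMR `ActionOnFailure` values.

```ruby
# Stand-in for illustration only; not the gem's class.
class StreamingStepSketch
  attr_accessor :name, :action_on_failure,
                :input_bucket, :output_bucket, :mapper, :reducer

  def initialize(input_bucket, output_bucket, mapper, reducer)
    @name = 'Elasticity Streaming Step'
    @action_on_failure = 'TERMINATE_JOB_FLOW'
    @input_bucket = input_bucket
    @output_bucket = output_bucket
    @mapper = mapper
    @reducer = reducer
  end

  # Builds the same hash shape the spec expects from #to_aws_step,
  # reading the fields at call time so overrides are picked up.
  def to_aws_step
    {
      :name => @name,
      :action_on_failure => @action_on_failure,
      :hadoop_jar_step => {
        :jar => '/home/hadoop/contrib/streaming/hadoop-streaming.jar',
        :args => ['-input', @input_bucket, '-output', @output_bucket,
                  '-mapper', @mapper, '-reducer', @reducer]
      }
    }
  end
end

step = StreamingStepSketch.new('INPUT_BUCKET', 'OUTPUT_BUCKET', 'MAPPER', 'REDUCER')
step.name = 'Word count'            # override the default step name
step.action_on_failure = 'CONTINUE' # keep the job flow running if this step fails
aws = step.to_aws_step
puts aws[:name]
```

The real class reads the same instance variables inside `to_aws_step`, so setting the accessors before the step is added to a job flow should have the same effect there.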
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: elasticity
 version: !ruby/object:Gem::Version
-  version: 2.
+  version: '2.2'
 prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-07-
+date: 2012-07-23 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rest-client
@@ -104,6 +104,7 @@ files:
 - lib/elasticity/job_flow_status_step.rb
 - lib/elasticity/job_flow_step.rb
 - lib/elasticity/pig_step.rb
+- lib/elasticity/streaming_step.rb
 - lib/elasticity/support/conditional_raise.rb
 - lib/elasticity/version.rb
 - spec/lib/elasticity/aws_request_spec.rb
@@ -118,6 +119,7 @@ files:
 - spec/lib/elasticity/job_flow_status_step_spec.rb
 - spec/lib/elasticity/job_flow_step_spec.rb
 - spec/lib/elasticity/pig_step_spec.rb
+- spec/lib/elasticity/streaming_step_spec.rb
 - spec/lib/elasticity/support/conditional_raise_spec.rb
 - spec/spec_helper.rb
 - spec/support/be_a_hash_including_matcher.rb
@@ -158,6 +160,7 @@ test_files:
 - spec/lib/elasticity/job_flow_status_step_spec.rb
 - spec/lib/elasticity/job_flow_step_spec.rb
 - spec/lib/elasticity/pig_step_spec.rb
+- spec/lib/elasticity/streaming_step_spec.rb
 - spec/lib/elasticity/support/conditional_raise_spec.rb
 - spec/spec_helper.rb
 - spec/support/be_a_hash_including_matcher.rb