elasticity 2.1.1 → 2.2

data/HISTORY.md CHANGED
@@ -1,3 +1,7 @@
+ ## 2.2 - July 23, 2012
+
+ + Hadoop streaming jobs are now supported via ```Elasticity::StreamingStep```.
+
  ## 2.1.1 - July 22, 2012
 
  + ```JobFlow::from_jobflow_id``` factory method added so that you can operate on running job flows (add steps, shutdown, status, etc.) that you didn't start in the same Ruby instance.
data/README.md CHANGED
@@ -9,13 +9,13 @@ Elasticity provides two ways to access EMR:
 
  # Installation
 
- ```
+ ```ruby
  gem install elasticity
  ```
 
  or in your Gemfile
 
- ```
+ ```ruby
  gem 'elasticity', '~> 2.0'
  ```
 
@@ -25,7 +25,7 @@ This will ensure that you protect yourself from API changes, which will only be
 
  When using the EMR UI, there are several sample jobs that Amazon supplies. The assets for these sample jobs are hosted on S3 and publicly available meaning you can run this code as-is (supplying your AWS credentials appropriately) and ```JobFlow#run``` will return the ID of the job flow.
 
- ```
+ ```ruby
  require 'elasticity'
 
  # Create a job flow with your AWS credentials
@@ -63,13 +63,13 @@ Job flows are the center of the EMR universe. The general order of operations i
 
  Only your AWS credentials are needed.
 
- ```
+ ```ruby
  jobflow = Elasticity::JobFlow.new('AWS access key', 'AWS secret key')
  ```
 
  If you want to access a job flow that's already running:
 
- ```
+ ```ruby
  jobflow = Elasticity::JobFlow.from_jobflow_id('AWS access key', 'AWS secret key', 'jobflow ID')
  ```
 
@@ -81,7 +81,7 @@ Configuration job flow options, shown below with default values. Note that thes
 
  These options are sent up as part of job flow submission (i.e. ```JobFlow#run```), so be sure to configure these before running the job.
 
- ```
+ ```ruby
  jobflow.action_on_failure = 'TERMINATE_JOB_FLOW'
  jobflow.ami_version = 'latest'
  jobflow.ec2_key_name = 'default'
@@ -103,7 +103,7 @@ Technically this is optional since Elasticity creates MASTER and CORE instance g
 
  If all you'd like to do is change the type or number of instances, ```JobFlow``` provides a few shortcuts to do just that.
 
- ```
+ ```ruby
  jobflow.instance_count = 10
  jobflow.master_instance_type = 'm1.small'
  jobflow.slave_instance_type = 'c1.medium'
@@ -119,7 +119,7 @@ Elasticity supports all EMR instance group types and all configuration options.
 
  These instances will be available for the life of your EMR job, versus Spot instances which are transient depending on your bid price (see below).
 
- ```
+ ```ruby
  ig = Elasticity::InstanceGroup.new
  ig.count = 10 # Provision 10 instances
  ig.type = 'c1.medium' # See the EMR docs for a list of supported types
@@ -133,7 +133,7 @@ jobflow.set_core_instance_group(ig)
 
  *When Amazon EC2 has unused capacity, it offers EC2 instances at a reduced cost, called the Spot Price. This price fluctuates based on availability and demand. You can purchase Spot Instances by placing a request that includes the highest bid price you are willing to pay for those instances. When the Spot Price is below your bid price, your Spot Instances are launched and you are billed the Spot Price. If the Spot Price rises above your bid price, Amazon EC2 terminates your Spot Instances.* - [EMR Developer Guide](http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_SpotInstances.html)
 
- ```
+ ```ruby
  ig = Elasticity::InstanceGroup.new
  ig.count = 10 # Provision 10 instances
  ig.type = 'c1.medium' # See the EMR docs for a list of supported types
@@ -147,7 +147,7 @@ jobflow.set_core_instance_group(ig)
 
  Bootstrap actions are run as part of setting up the job flow, so be sure to configure these before running the job.
 
- ```
+ ```ruby
  [
  Elasticity::HadoopBootstrapAction.new('-m', 'mapred.map.tasks=101'),
  Elasticity::HadoopBootstrapAction.new('-m', 'mapred.reduce.child.java.opts=-Xmx200m')
@@ -159,11 +159,11 @@ end
 
  ## 5 - Adding Steps
 
- Each type of step has a default name that can be overridden (the :name field). Apart from that, steps are configured differently - exhaustively described below.
+ Each type of step has ```#name``` and ```#action_on_failure``` fields that can be overridden. Apart from that, steps are configured differently - exhaustively described below.
 
  ### Adding a Pig Step
 
- ```
+ ```ruby
  # Path to the Pig script
  pig_step = Elasticity::PigStep.new('s3n://mybucket/script.pig')
 
@@ -182,7 +182,7 @@ Given the importance of specifying a reasonable value for [the number of paralle
 
  For example, if you had 8 instances in total and your slaves were m1.xlarge, the value is 26 (as shown below).
 
- ```
+ ```sh
  s3://elasticmapreduce/libs/pig/pig-script
  --run-pig-script
  --args
@@ -194,7 +194,7 @@ For example, if you had 8 instances in total and your slaves were m1.xlarge, the
 
  Use this as you would any other Pig variable.
 
- ```
+ ```pig
  A = LOAD 'myfile' AS (t, u, v);
  B = GROUP A BY t PARALLEL $E_PARALLELS;
  ...
@@ -202,7 +202,7 @@ Use this as you would any other Pig variable.
 
  ### Adding a Hive Step
 
- ```
+ ```ruby
  # Path to the Hive Script
  hive_step = Elasticity::HiveStep.new('s3n://mybucket/script.hql')
 
@@ -215,9 +215,18 @@ hive_step.variables = {
  jobflow.add_step(hive_step)
  ```
 
- ### Adding a Custom Jar Step
+ ### Adding a Streaming Step
+
+ ```ruby
+ # Input bucket, output bucket, mapper and reducer scripts
+ streaming_step = Elasticity::StreamingStep.new('s3n://elasticmapreduce/samples/wordcount/input', 's3n://elasticityoutput/wordcount/output/2012-07-23', 's3n://elasticmapreduce/samples/wordcount/wordSplitter.py', 'aggregate')
 
+ jobflow.add_step(streaming_step)
  ```
+
+ ### Adding a Custom Jar Step
+
+ ```ruby
  # Path to your jar
  jar_step = Elasticity::CustomJarStep.new('s3n://mybucket/my.jar')
 
@@ -231,7 +240,7 @@ jobflow.add_step(jar_step)
 
  Submit the job flow to Amazon, storing the ID of the running job flow.
 
- ```
+ ```ruby
  jobflow_id = jobflow.run
  ```
 
@@ -243,13 +252,13 @@ Steps can be added to a running jobflow just by calling ```#add_step``` on the j
 
  By default, job flows are set to terminate when there are no more running steps. You can tell the job flow to stay alive when it has nothing left to do:
 
- ```
+ ```ruby
  jobflow.keep_job_flow_alive_when_no_steps = true
  ```
 
  If that's the case, or if you'd just like to terminate a running jobflow before waiting for it to finish:
 
- ```
+ ```ruby
  jobflow.shutdown
  ```
 
@@ -0,0 +1,33 @@
+ module Elasticity
+
+   class StreamingStep
+
+     include JobFlowStep
+
+     attr_accessor :name
+     attr_accessor :action_on_failure
+     attr_accessor :input_bucket
+     attr_accessor :output_bucket
+     attr_accessor :mapper
+     attr_accessor :reducer
+
+     def initialize(input_bucket, output_bucket, mapper, reducer)
+       @name = 'Elasticity Streaming Step'
+       @action_on_failure = 'TERMINATE_JOB_FLOW'
+       @input_bucket = input_bucket
+       @output_bucket = output_bucket
+       @mapper = mapper
+       @reducer = reducer
+     end
+
+     def to_aws_step(job_flow)
+       step = Elasticity::CustomJarStep.new('/home/hadoop/contrib/streaming/hadoop-streaming.jar')
+       step.name = @name
+       step.action_on_failure = @action_on_failure
+       step.arguments = ['-input', @input_bucket, '-output', @output_bucket, '-mapper', @mapper, '-reducer', @reducer]
+       step.to_aws_step(job_flow)
+     end
+
+   end
+
+ end
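Note on the new class: `StreamingStep` delegates to `CustomJarStep`, pointing it at the Hadoop streaming jar and passing the four constructor values as `-input`, `-output`, `-mapper` and `-reducer` arguments. The sketch below reproduces only that argument assembly so it can be run without the gem or AWS credentials. `streaming_step_hash` and `STREAMING_JAR` are hypothetical names introduced for illustration and are not part of Elasticity; the hash layout is assumed from the expectation in the accompanying spec.

```ruby
# Hypothetical stand-in for StreamingStep#to_aws_step. It does not load the
# elasticity gem; it only mirrors how the streaming arguments are assembled.
STREAMING_JAR = '/home/hadoop/contrib/streaming/hadoop-streaming.jar'

def streaming_step_hash(input_bucket, output_bucket, mapper, reducer,
                        name: 'Elasticity Streaming Step',
                        action_on_failure: 'TERMINATE_JOB_FLOW')
  {
    :name => name,
    :action_on_failure => action_on_failure,
    :hadoop_jar_step => {
      :jar => STREAMING_JAR,
      # Flag/value pairs in the same order the diff above uses.
      :args => ['-input', input_bucket, '-output', output_bucket,
                '-mapper', mapper, '-reducer', reducer]
    }
  }
end

# Same sample values as the README example in this diff.
step = streaming_step_hash(
  's3n://elasticmapreduce/samples/wordcount/input',
  's3n://elasticityoutput/wordcount/output/2012-07-23',
  's3n://elasticmapreduce/samples/wordcount/wordSplitter.py',
  'aggregate'
)
puts step[:hadoop_jar_step][:args].join(' ')
```

With the real gem, the equivalent overrides would go through the `name` and `action_on_failure` accessors on the step before calling `jobflow.add_step`.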
data/lib/elasticity/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Elasticity
-   VERSION = '2.1.1'
+   VERSION = '2.2'
  end
data/lib/elasticity.rb CHANGED
@@ -21,6 +21,7 @@ require 'elasticity/job_flow_status_step'
  require 'elasticity/custom_jar_step'
  require 'elasticity/hive_step'
  require 'elasticity/pig_step'
+ require 'elasticity/streaming_step'
 
  module Elasticity
  end
@@ -0,0 +1,37 @@
+ describe Elasticity::StreamingStep do
+
+   subject do
+     Elasticity::StreamingStep.new('INPUT_BUCKET', 'OUTPUT_BUCKET', 'MAPPER', 'REDUCER')
+   end
+
+   it { should be_a Elasticity::JobFlowStep }
+
+   its(:name) { should == 'Elasticity Streaming Step' }
+   its(:action_on_failure) { should == 'TERMINATE_JOB_FLOW' }
+   its(:input_bucket) { should == 'INPUT_BUCKET' }
+   its(:output_bucket) { should == 'OUTPUT_BUCKET' }
+   its(:mapper) { should == 'MAPPER' }
+   its(:reducer) { should == 'REDUCER' }
+
+   describe '#to_aws_step' do
+
+     it 'should convert to aws step format' do
+       subject.to_aws_step(Elasticity::JobFlow.new('_', '_')).should == {
+         :name => 'Elasticity Streaming Step',
+         :action_on_failure => 'TERMINATE_JOB_FLOW',
+         :hadoop_jar_step => {
+           :jar => '/home/hadoop/contrib/streaming/hadoop-streaming.jar',
+           :args => %w(-input INPUT_BUCKET -output OUTPUT_BUCKET -mapper MAPPER -reducer REDUCER),
+         },
+       }
+     end
+
+   end
+
+   describe '.requires_installation?' do
+     it 'should not require installation' do
+       Elasticity::StreamingStep.requires_installation?.should be_false
+     end
+   end
+
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: elasticity
  version: !ruby/object:Gem::Version
- version: 2.1.1
+ version: '2.2'
  prerelease:
  platform: ruby
  authors:
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-07-22 00:00:00.000000000 Z
+ date: 2012-07-23 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rest-client
@@ -104,6 +104,7 @@ files:
  - lib/elasticity/job_flow_status_step.rb
  - lib/elasticity/job_flow_step.rb
  - lib/elasticity/pig_step.rb
+ - lib/elasticity/streaming_step.rb
  - lib/elasticity/support/conditional_raise.rb
  - lib/elasticity/version.rb
  - spec/lib/elasticity/aws_request_spec.rb
@@ -118,6 +119,7 @@ files:
  - spec/lib/elasticity/job_flow_status_step_spec.rb
  - spec/lib/elasticity/job_flow_step_spec.rb
  - spec/lib/elasticity/pig_step_spec.rb
+ - spec/lib/elasticity/streaming_step_spec.rb
  - spec/lib/elasticity/support/conditional_raise_spec.rb
  - spec/spec_helper.rb
  - spec/support/be_a_hash_including_matcher.rb
@@ -158,6 +160,7 @@ test_files:
  - spec/lib/elasticity/job_flow_status_step_spec.rb
  - spec/lib/elasticity/job_flow_step_spec.rb
  - spec/lib/elasticity/pig_step_spec.rb
+ - spec/lib/elasticity/streaming_step_spec.rb
  - spec/lib/elasticity/support/conditional_raise_spec.rb
  - spec/spec_helper.rb
  - spec/support/be_a_hash_including_matcher.rb