RubyGems - elasticity - Versions diffs - 2.1.1 → 2.2 - Mend

elasticity 2.1.1 → 2.2

Files changed (7)

data/HISTORY.md +4 -0
data/README.md +28 -19
data/lib/elasticity/streaming_step.rb +33 -0
data/lib/elasticity/version.rb +1 -1
data/lib/elasticity.rb +1 -0
data/spec/lib/elasticity/streaming_step_spec.rb +37 -0
metadata +5 -2

data/HISTORY.md CHANGED

@@ -1,3 +1,7 @@
+## 2.2 - July 23, 2012
++ Hadoop streaming jobs are now supported via ```Elasticity::StreamingStep```.
 ## 2.1.1 - July 22, 2012
 + ```JobFlow::from_jobflow_id``` factory method added so that you can operate on running job flows (add steps, shutdown, status, etc.) that you didn't start in the same Ruby instance.

data/README.md CHANGED

@@ -9,13 +9,13 @@ Elasticity provides two ways to access EMR:
 # Installation
-```
+```ruby
   gem install elasticity
 ```
 or in your Gemfile
-```
+```ruby
   gem 'elasticity', '~> 2.0'
 ```
@@ -25,7 +25,7 @@ This will ensure that you protect yourself from API changes, which will only be
 When using the EMR UI, there are several sample jobs that Amazon supplies.  The assets for these sample jobs are hosted on S3 and publicly available meaning you can run this code as-is (supplying your AWS credentials appropriately) and ```JobFlow#run``` will return the ID of the job flow.
-```
+```ruby
 require 'elasticity'
 # Create a job flow with your AWS credentials
@@ -63,13 +63,13 @@ Job flows are the center of the EMR universe.  The general order of operations i
 Only your AWS credentials are needed.
-```
+```ruby
 jobflow = Elasticity::JobFlow.new('AWS access key', 'AWS secret key')
 ```
 If you want to access a job flow that's already running:
-```
+```ruby
 jobflow = Elasticity::JobFlow.from_jobflow_id('AWS access key', 'AWS secret key', 'jobflow ID')
 ```
@@ -81,7 +81,7 @@ Configuration job flow options, shown below with default values.  Note that thes
 These options are sent up as part of job flow submission (i.e. ```JobFlow#run```), so be sure to configure these before running the job.
-```
+```ruby
 jobflow.action_on_failure                 = 'TERMINATE_JOB_FLOW'
 jobflow.ami_version                       = 'latest'
 jobflow.ec2_key_name                      = 'default'
@@ -103,7 +103,7 @@ Technically this is optional since Elasticity creates MASTER and CORE instance g
 If all you'd like to do is change the type or number of instances, ```JobFlow``` provides a few shortcuts to do just that.
-```
+```ruby
 jobflow.instance_count       = 10
 jobflow.master_instance_type = 'm1.small'
 jobflow.slave_instance_type  = 'c1.medium'
@@ -119,7 +119,7 @@ Elasticity supports all EMR instance group types and all configuration options.
 These instances will be available for the life of your EMR job, versus Spot instances which are transient depending on your bid price (see below).
-```
+```ruby
 ig = Elasticity::InstanceGroup.new
 ig.count = 10                       # Provision 10 instances
 ig.type  = 'c1.medium'              # See the EMR docs for a list of supported types
@@ -133,7 +133,7 @@ jobflow.set_core_instance_group(ig)
 *When Amazon EC2 has unused capacity, it offers EC2 instances at a reduced cost, called the Spot Price. This price fluctuates based on availability and demand. You can purchase Spot Instances by placing a request that includes the highest bid price you are willing to pay for those instances. When the Spot Price is below your bid price, your Spot Instances are launched and you are billed the Spot Price. If the Spot Price rises above your bid price, Amazon EC2 terminates your Spot Instances.* - [EMR Developer Guide](http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_SpotInstances.html)
-```
+```ruby
 ig = Elasticity::InstanceGroup.new
 ig.count = 10                       # Provision 10 instances
 ig.type  = 'c1.medium'              # See the EMR docs for a list of supported types
@@ -147,7 +147,7 @@ jobflow.set_core_instance_group(ig)
 Bootstrap actions are run as part of setting up the job flow, so be sure to configure these before running the job.
-```
+```ruby
 [
   Elasticity::HadoopBootstrapAction.new('-m', 'mapred.map.tasks=101'),
   Elasticity::HadoopBootstrapAction.new('-m', 'mapred.reduce.child.java.opts=-Xmx200m')
@@ -159,11 +159,11 @@ end
 ## 5 - Adding Steps
-Each type of step has a default name that can be overridden (the :name field).  Apart from that, steps are configured differently - exhaustively described below.
+Each type of step has ```#name``` and ```#action_on_failure``` fields that can be overridden.  Apart from that, steps are configured differently - exhaustively described below.
 ### Adding a Pig Step
-```
+```ruby
 # Path to the Pig script
 pig_step = Elasticity::PigStep.new('s3n://mybucket/script.pig')
@@ -182,7 +182,7 @@ Given the importance of specifying a reasonable value for [the number of paralle
 For example, if you had 8 instances in total and your slaves were m1.xlarge, the value is 26 (as shown below).
-```
+```sh
   s3://elasticmapreduce/libs/pig/pig-script
     --run-pig-script
       --args
@@ -194,7 +194,7 @@ For example, if you had 8 instances in total and your slaves were m1.xlarge, the
 Use this as you would any other Pig variable.
-```
+```pig
   A = LOAD 'myfile' AS (t, u, v);
   B = GROUP A BY t PARALLEL $E_PARALLELS;
   ...
@@ -202,7 +202,7 @@ Use this as you would any other Pig variable.
 ### Adding a Hive Step
-```
+```ruby
 # Path to the Hive Script
 hive_step = Elasticity::HiveStep.new('s3n://mybucket/script.hql')
@@ -215,9 +215,18 @@ hive_step.variables = {
 jobflow.add_step(hive_step)
 ```
-### Adding a Custom Jar Step
+### Adding a Streaming Step
+```ruby
+# Input bucket, output bucket, mapper and reducer scripts
+streaming_step = Elasticity::StreamingStep.new('s3n://elasticmapreduce/samples/wordcount/input', 's3n://elasticityoutput/wordcount/output/2012-07-23', 's3n://elasticmapreduce/samples/wordcount/wordSplitter.py', 'aggregate')
+jobflow.add_step(streaming_step)
 ```
+### Adding a Custom Jar Step
+```ruby
 # Path to your jar
 jar_step = Elasticity::CustomJarStep.new('s3n://mybucket/my.jar')
@@ -231,7 +240,7 @@ jobflow.add_step(jar_step)
 Submit the job flow to Amazon, storing the ID of the running job flow.
-```
+```ruby
 jobflow_id = jobflow.run
 ```
@@ -243,13 +252,13 @@ Steps can be added to a running jobflow just by calling ```#add_step``` on the j
 By default, job flows are set to terminate when there are no more running steps.  You can tell the job flow to stay alive when it has nothing left to do:
-```
+```ruby
 jobflow.keep_job_flow_alive_when_no_steps = true
 ```
 If that's the case, or if you'd just like to terminate a running jobflow before waiting for it to finish:
-```
+```ruby
 jobflow.shutdown
 ```

data/lib/elasticity/streaming_step.rb ADDED

@@ -0,0 +1,33 @@
+module Elasticity
+  class StreamingStep
+    include JobFlowStep
+    attr_accessor :name
+    attr_accessor :action_on_failure
+    attr_accessor :input_bucket
+    attr_accessor :output_bucket
+    attr_accessor :mapper
+    attr_accessor :reducer
+    def initialize(input_bucket, output_bucket, mapper, reducer)
+      @name = 'Elasticity Streaming Step'
+      @action_on_failure = 'TERMINATE_JOB_FLOW'
+      @input_bucket = input_bucket
+      @output_bucket = output_bucket
+      @mapper = mapper
+      @reducer = reducer
+    end
+    def to_aws_step(job_flow)
+      step = Elasticity::CustomJarStep.new('/home/hadoop/contrib/streaming/hadoop-streaming.jar')
+      step.name = @name
+      step.action_on_failure = @action_on_failure
+      step.arguments = ['-input', @input_bucket, '-output', @output_bucket, '-mapper', @mapper, '-reducer', @reducer]
+      step.to_aws_step(job_flow)
+    end
+  end
+end
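
The new `StreamingStep#to_aws_step` delegates to `CustomJarStep`, pointing it at the Hadoop streaming jar and passing the four constructor values as `-input`, `-output`, `-mapper` and `-reducer` arguments. As a rough illustration of the resulting step hash (the shape asserted in `streaming_step_spec.rb`), here is a standalone sketch that does not load the gem. `build_streaming_step_hash` and `STREAMING_JAR` are hypothetical names introduced for this example only; they are not part of Elasticity.

```ruby
# Standalone illustration; does not require the elasticity gem.
# STREAMING_JAR and build_streaming_step_hash are hypothetical names,
# used only to mirror what StreamingStep#to_aws_step is expected to produce.
STREAMING_JAR = '/home/hadoop/contrib/streaming/hadoop-streaming.jar'

def build_streaming_step_hash(input, output, mapper, reducer,
                              name = 'Elasticity Streaming Step',
                              action_on_failure = 'TERMINATE_JOB_FLOW')
  {
    :name => name,
    :action_on_failure => action_on_failure,
    :hadoop_jar_step => {
      :jar => STREAMING_JAR,
      # Same flag/value ordering as the arguments array in the diff above
      :args => ['-input', input, '-output', output,
                '-mapper', mapper, '-reducer', reducer]
    }
  }
end

step = build_streaming_step_hash('INPUT_BUCKET', 'OUTPUT_BUCKET', 'MAPPER', 'REDUCER')
puts step[:hadoop_jar_step][:args].join(' ')
```

With the real gem, the equivalent call would be `Elasticity::StreamingStep.new(...)` followed by `jobflow.add_step`, as in the README hunk.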

data/lib/elasticity/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Elasticity
-  VERSION = '2.1.1'
+  VERSION = '2.2'
 end
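
The README recommends pinning the gem with `'~> 2.0'` in a Gemfile. A small sketch using RubyGems' own `Gem::Version` and `Gem::Requirement` classes suggests that the bump from 2.1.1 to 2.2 stays inside that constraint. The variable names are illustrative only.

```ruby
require 'rubygems'

# The Gemfile constraint suggested in the README
requirement = Gem::Requirement.new('~> 2.0')

old_version = Gem::Version.new('2.1.1')
new_version = Gem::Version.new('2.2')

# 2.2 sorts after 2.1.1
puts new_version > old_version

# '~> 2.0' accepts any 2.x release at or above 2.0, so 2.2 qualifies
puts requirement.satisfied_by?(new_version)

# A hypothetical 3.0 release would fall outside the constraint
puts requirement.satisfied_by?(Gem::Version.new('3.0'))
```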

data/lib/elasticity.rb CHANGED

@@ -21,6 +21,7 @@ require 'elasticity/job_flow_status_step'
 require 'elasticity/custom_jar_step'
 require 'elasticity/hive_step'
 require 'elasticity/pig_step'
+require 'elasticity/streaming_step'
 module Elasticity
 end

data/spec/lib/elasticity/streaming_step_spec.rb ADDED

@@ -0,0 +1,37 @@
+describe Elasticity::StreamingStep do
+  subject do
+    Elasticity::StreamingStep.new('INPUT_BUCKET', 'OUTPUT_BUCKET', 'MAPPER', 'REDUCER')
+  end
+  it { should be_a Elasticity::JobFlowStep }
+  its(:name) { should == 'Elasticity Streaming Step' }
+  its(:action_on_failure) { should == 'TERMINATE_JOB_FLOW' }
+  its(:input_bucket) { should == 'INPUT_BUCKET' }
+  its(:output_bucket) { should == 'OUTPUT_BUCKET' }
+  its(:mapper) { should == 'MAPPER' }
+  its(:reducer) { should == 'REDUCER' }
+  describe '#to_aws_step' do
+    it 'should convert to aws step format' do
+      subject.to_aws_step(Elasticity::JobFlow.new('_', '_')).should == {
+        :name => 'Elasticity Streaming Step',
+        :action_on_failure => 'TERMINATE_JOB_FLOW',
+        :hadoop_jar_step => {
+          :jar => '/home/hadoop/contrib/streaming/hadoop-streaming.jar',
+          :args => %w(-input INPUT_BUCKET -output OUTPUT_BUCKET -mapper MAPPER -reducer REDUCER),
+        },
+      }
+    end
+  end
+  describe '.requires_installation?' do
+    it 'should not require installation' do
+      Elasticity::StreamingStep.requires_installation?.should be_false
+    end
+  end
+end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: elasticity
 version: !ruby/object:Gem::Version
-  version: 2.1.1
+  version: '2.2'
   prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-07-22 00:00:00.000000000 Z
+date: 2012-07-23 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rest-client
@@ -104,6 +104,7 @@ files:
 - lib/elasticity/job_flow_status_step.rb
 - lib/elasticity/job_flow_step.rb
 - lib/elasticity/pig_step.rb
+- lib/elasticity/streaming_step.rb
 - lib/elasticity/support/conditional_raise.rb
 - lib/elasticity/version.rb
 - spec/lib/elasticity/aws_request_spec.rb
@@ -118,6 +119,7 @@ files:
 - spec/lib/elasticity/job_flow_status_step_spec.rb
 - spec/lib/elasticity/job_flow_step_spec.rb
 - spec/lib/elasticity/pig_step_spec.rb
+- spec/lib/elasticity/streaming_step_spec.rb
 - spec/lib/elasticity/support/conditional_raise_spec.rb
 - spec/spec_helper.rb
 - spec/support/be_a_hash_including_matcher.rb
@@ -158,6 +160,7 @@ test_files:
 - spec/lib/elasticity/job_flow_status_step_spec.rb
 - spec/lib/elasticity/job_flow_step_spec.rb
 - spec/lib/elasticity/pig_step_spec.rb
+- spec/lib/elasticity/streaming_step_spec.rb
 - spec/lib/elasticity/support/conditional_raise_spec.rb
 - spec/spec_helper.rb
 - spec/support/be_a_hash_including_matcher.rb