RubyGems - elasticity - Versions diffs - 2.5.3 → 2.5.5 - Mend

elasticity 2.5.3 → 2.5.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

data/HISTORY.md +8 -0
data/README.md +11 -5
data/elasticity.gemspec +1 -1
data/lib/elasticity/job_flow_status.rb +26 -23
data/lib/elasticity/streaming_step.rb +5 -3
data/lib/elasticity/version.rb +1 -1
data/spec/lib/elasticity/job_flow_status_spec.rb +47 -39
data/spec/lib/elasticity/streaming_step_spec.rb +4 -3
metadata +10 -4

data/HISTORY.md CHANGED

@@ -1,3 +1,11 @@
+## 2.5.5 - February 3, 2013
++ Pull request from [Aaron Olson](https://github.com/airolson), adding ```StreamingStep#arguments```.
+## 2.5.4 - February 1, 2013
++ Pull request from [Aaron Olson](https://github.com/airolson), adding ```JobFlowStatus#normalized_instance_hours```.
 ## 2.5.3 - January 16, 2013
 + Added ```#visible_to_all_users``` to ```JobFlow```.  Thanks to [dstumm](https://github.com/dstumm) for the contribution!

data/README.md CHANGED

@@ -1,7 +1,11 @@
-(2012-11-30) Taking requests! I have a few ideas for what might be cool features though I'd rather work on what the community wants.  Go ahead and file an issue!
+[![Gem Version](https://badge.fury.io/rb/elasticity.png)](http://badge.fury.io/rb/elasticity)
+**(February 3, 2013)** Taking requests! I have a few ideas for what might be cool features though I'd rather work on what the community wants.  Go ahead and file an issue!
 Elasticity provides programmatic access to Amazon's Elastic Map Reduce service.  The aim is to conveniently abstract away the complex EMR REST API and make working with job flows more productive and more enjoyable.
+**Travis has been flaky, failing builds before they start.  "Trust me", it's green :)**
 [![Build Status](https://secure.travis-ci.org/rslifka/elasticity.png)](http://travis-ci.org/rslifka/elasticity) REE, 1.8.7, 1.9.2, 1.9.3
 Elasticity provides two ways to access EMR:
@@ -256,9 +260,12 @@ jobflow.add_step(hive_step)
 ### Adding a Streaming Step
 ```ruby
-# Input bucket, output bucket, mapper and reducer scripts
+# Input bucket, output bucket, mapper script,reducer script
 streaming_step = Elasticity::StreamingStep.new('s3n://elasticmapreduce/samples/wordcount/input', 's3n://elasticityoutput/wordcount/output/2012-07-23', 's3n://elasticmapreduce/samples/wordcount/wordSplitter.py', 'aggregate')
+# Optionally, include additional *arguments
+# streaming_step = Elasticity::StreamingStep.new('s3n://elasticmapreduce/samples/wordcount/input', 's3n://elasticityoutput/wordcount/output/2012-07-23', 's3n://elasticmapreduce/samples/wordcount/wordSplitter.py', 'aggregate', '-arg1', 'value1')
 jobflow.add_step(streaming_step)
 ```
@@ -333,7 +340,7 @@ Elasticity.configure do |config|
   # If using Hive, it will be configured via the directives here
   config.hive_site = 's3://bucket/hive-site.xml'
 end
 ```
@@ -355,8 +362,7 @@ Unfortunately, the documentation is sometimes incorrect and sometimes missing.
 * AWS signing was used from [RightScale's](http://www.rightscale.com/) amazing [right_aws gem](https://github.com/rightscale/right_aws) which works extraordinarily well!  If you need access to any AWS service (EC2, S3, etc.), have a look.
 * <code>camelize</code> was used from ActiveSupport to assist in converting parmeters to AWS request format.
-* Thanks to the following people who have contributed patches or helpful suggestions: [Ryan Weald](https://github.com/rweald), [Aram Price](https://github.com/aramprice/), [Wouter Broekhof](https://github.com/wouter/), [Menno van der Sman](https://github.com/menno), [Michael Tibben](https://github.com/mtibben) and [Alexander Dean](https://github.com/alexanderdean).
+* Thanks to [Ryan Weald](https://github.com/rweald) and [Alexander Dean](https://github.com/alexanderdean) for their constant barrage of excellent suggestions :)
 # License
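
The README's optional `*arguments` form depends on Ruby's splat parameter. The minimal illustration below uses a hypothetical `collect_arguments` method, which is not part of Elasticity. It only copies the shape of the new `StreamingStep#initialize` signature: four required positional parameters followed by a splat. A splat parameter always binds an Array, and that Array is empty when no extra arguments are passed.

```ruby
# Hypothetical helper with the same parameter shape as the 2.5.5
# StreamingStep#initialize: four required positionals, then a splat.
def collect_arguments(input_bucket, output_bucket, mapper, reducer, *arguments)
  # The splat always binds an Array (possibly empty), never nil.
  arguments
end

without_extras = collect_arguments('in', 'out', 'mapper.py', 'aggregate')
with_extras = collect_arguments('in', 'out', 'mapper.py', 'aggregate', '-arg1', 'value1')

p without_extras # => []
p with_extras    # => ["-arg1", "value1"]
```

Because the splat never yields nil, the `|| []` fallback in `StreamingStep#initialize` is harmless, but it is never exercised.
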

data/elasticity.gemspec CHANGED

@@ -9,7 +9,7 @@ Gem::Specification.new do |s|
   s.authors     = ['Robert Slifka']
   s.homepage    = 'http://www.github.com/rslifka/elasticity'
   s.summary     = %q{Streamlined, programmatic access to Amazon's Elastic Map Reduce service.}
-  s.description = %q{Streamlined, Programmatic access to Amazon's Elastic Map Reduce service, driven by the Sharethrough team's requirements for belting out EMR jobs.}
+  s.description = %q{Streamlined, programmatic access to Amazon's Elastic Map Reduce service, driven by the Sharethrough team's requirements for belting out EMR jobs.}
   s.add_dependency('rest-client')
   s.add_dependency('nokogiri')

data/lib/elasticity/job_flow_status.rb CHANGED

@@ -17,6 +17,7 @@ module Elasticity
     attr_accessor :last_state_change_reason
     attr_accessor :installed_steps
     attr_accessor :master_public_dns_name
+    attr_accessor :normalized_instance_hours
     def initialize
       @steps = []
@@ -26,55 +27,57 @@ module Elasticity
     # Create a jobflow from an AWS <member> (Nokogiri::XML::Element):
     #   /DescribeJobFlowsResponse/DescribeJobFlowsResult/JobFlows/member
     def self.from_member_element(xml_element)
-      jobflow = JobFlowStatus.new
+      jobflow_status = JobFlowStatus.new
-      jobflow.name = xml_element.xpath('./Name').text.strip
-      jobflow.jobflow_id = xml_element.xpath('./JobFlowId').text.strip
-      jobflow.state = xml_element.xpath('./ExecutionStatusDetail/State').text.strip
-      jobflow.last_state_change_reason = xml_element.xpath('./ExecutionStatusDetail/LastStateChangeReason').text.strip
+      jobflow_status.name = xml_element.xpath('./Name').text.strip
+      jobflow_status.jobflow_id = xml_element.xpath('./JobFlowId').text.strip
+      jobflow_status.state = xml_element.xpath('./ExecutionStatusDetail/State').text.strip
+      jobflow_status.last_state_change_reason = xml_element.xpath('./ExecutionStatusDetail/LastStateChangeReason').text.strip
-      jobflow.steps = JobFlowStatusStep.from_members_nodeset(xml_element.xpath('./Steps/member'))
+      jobflow_status.steps = JobFlowStatusStep.from_members_nodeset(xml_element.xpath('./Steps/member'))
-      step_names = jobflow.steps.map(&:name)
+      step_names = jobflow_status.steps.map(&:name)
       Elasticity::JobFlowStep.steps_requiring_installation.each do |step|
-        jobflow.installed_steps << step if step_names.include?(step.aws_installation_step_name)
+        jobflow_status.installed_steps << step if step_names.include?(step.aws_installation_step_name)
       end
-      jobflow.created_at = Time.parse(xml_element.xpath('./ExecutionStatusDetail/CreationDateTime').text.strip)
+      jobflow_status.created_at = Time.parse(xml_element.xpath('./ExecutionStatusDetail/CreationDateTime').text.strip)
       ready_at = xml_element.xpath('./ExecutionStatusDetail/ReadyDateTime').text.strip
-      jobflow.ready_at = (ready_at == '') ? (nil) : (Time.parse(ready_at))
+      jobflow_status.ready_at = (ready_at == '') ? (nil) : (Time.parse(ready_at))
       started_at = xml_element.xpath('./ExecutionStatusDetail/StartDateTime').text.strip
-      jobflow.started_at = (started_at == '') ? (nil) : (Time.parse(started_at))
+      jobflow_status.started_at = (started_at == '') ? (nil) : (Time.parse(started_at))
       ended_at = xml_element.xpath('./ExecutionStatusDetail/EndDateTime').text.strip
-      jobflow.ended_at = (ended_at == '') ? (nil) : (Time.parse(ended_at))
+      jobflow_status.ended_at = (ended_at == '') ? (nil) : (Time.parse(ended_at))
-      if jobflow.ended_at && jobflow.started_at
-        jobflow.duration = ((jobflow.ended_at - jobflow.started_at) / 60).to_i
+      if jobflow_status.ended_at && jobflow_status.started_at
+        jobflow_status.duration = ((jobflow_status.ended_at - jobflow_status.started_at) / 60).to_i
       end
-      jobflow.instance_count = xml_element.xpath('./Instances/InstanceCount').text.strip
-      jobflow.master_instance_type = xml_element.xpath('./Instances/MasterInstanceType').text.strip
-      jobflow.slave_instance_type = xml_element.xpath('./Instances/SlaveInstanceType').text.strip
+      jobflow_status.instance_count = xml_element.xpath('./Instances/InstanceCount').text.strip
+      jobflow_status.master_instance_type = xml_element.xpath('./Instances/MasterInstanceType').text.strip
+      jobflow_status.slave_instance_type = xml_element.xpath('./Instances/SlaveInstanceType').text.strip
       master_public_dns_name = xml_element.xpath('./Instances/MasterPublicDnsName').text.strip
-      jobflow.master_public_dns_name = (master_public_dns_name == '') ? (nil) : (master_public_dns_name)
+      jobflow_status.master_public_dns_name = (master_public_dns_name == '') ? (nil) : (master_public_dns_name)
-      jobflow
+      jobflow_status.normalized_instance_hours = xml_element.xpath('./Instances/NormalizedInstanceHours').text.strip
+      jobflow_status
     end
     # Create JobFlows from a collection of AWS <member> nodes (Nokogiri::XML::NodeSet):
     #   /DescribeJobFlowsResponse/DescribeJobFlowsResult/JobFlows
     def self.from_members_nodeset(members_nodeset)
-      jobflows = []
+      jobflow_statuses = []
       members_nodeset.each do |member|
-        jobflows << from_member_element(member)
+        jobflow_statuses << from_member_element(member)
       end
-      jobflows
+      jobflow_statuses
     end
   end
-end
+end
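
`from_member_element` repeats one pattern for the optional timestamps: strip the element text, map blank to `nil`, and otherwise call `Time.parse`. It then computes `duration` in whole minutes only when both endpoints are present. The sketch below factors that logic into two hypothetical helpers, `parse_optional_time` and `duration_in_minutes`, which are not part of the gem. It uses the timestamps from the spec fixture.

```ruby
require 'time'

# Hypothetical helper: AWS returns an empty element for a timestamp that has
# not happened yet, so blank text becomes nil instead of reaching Time.parse
# (which raises ArgumentError on an empty string).
def parse_optional_time(text)
  stripped = text.strip
  stripped.empty? ? nil : Time.parse(stripped)
end

# Whole minutes between start and end, or nil unless both are known.
def duration_in_minutes(started_at, ended_at)
  return nil unless started_at && ended_at
  ((ended_at - started_at) / 60).to_i
end

started = parse_optional_time('2011-10-04T21:49:17Z')
ended   = parse_optional_time(" 2011-10-05T21:49:18Z \n")
never   = parse_optional_time('   ')

p duration_in_minutes(started, ended) # => 1440
p duration_in_minutes(never, ended)   # => nil
```

The two timestamps are one day and one second apart. `to_i` truncates the extra second, which is why the spec expects exactly `1440`.
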

data/lib/elasticity/streaming_step.rb CHANGED

@@ -10,24 +10,26 @@ module Elasticity
     attr_accessor :output_bucket
     attr_accessor :mapper
     attr_accessor :reducer
+    attr_accessor :arguments
-    def initialize(input_bucket, output_bucket, mapper, reducer)
+    def initialize(input_bucket, output_bucket, mapper, reducer, *arguments)
       @name = 'Elasticity Streaming Step'
       @action_on_failure = 'TERMINATE_JOB_FLOW'
       @input_bucket = input_bucket
       @output_bucket = output_bucket
       @mapper = mapper
       @reducer = reducer
+      @arguments = arguments || []
     end
     def to_aws_step(job_flow)
       step = Elasticity::CustomJarStep.new('/home/hadoop/contrib/streaming/hadoop-streaming.jar')
       step.name = @name
       step.action_on_failure = @action_on_failure
-      step.arguments = ['-input', @input_bucket, '-output', @output_bucket, '-mapper', @mapper, '-reducer', @reducer]
+      step.arguments = ['-input', @input_bucket, '-output', @output_bucket, '-mapper', @mapper, '-reducer', @reducer] + @arguments
       step.to_aws_step(job_flow)
     end
   end
-end
+end
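
In 2.5.5, `to_aws_step` builds the hadoop-streaming argument list from the fixed `-input`/`-output`/`-mapper`/`-reducer` pairs and then appends the caller's `arguments` unchanged. The class below is a hypothetical stand-in, `StreamingArgs`, that reproduces only that ordering. It leaves out `CustomJarStep` and the AWS request entirely.

```ruby
# Hypothetical stand-in for the argument assembly in
# StreamingStep#to_aws_step; no CustomJarStep or AWS call is involved.
class StreamingArgs
  def initialize(input_bucket, output_bucket, mapper, reducer, *arguments)
    @input_bucket = input_bucket
    @output_bucket = output_bucket
    @mapper = mapper
    @reducer = reducer
    @arguments = arguments
  end

  # Fixed streaming options first, caller-supplied extras appended verbatim.
  def to_a
    ['-input', @input_bucket, '-output', @output_bucket,
     '-mapper', @mapper, '-reducer', @reducer] + @arguments
  end
end

p StreamingArgs.new('INPUT_BUCKET', 'OUTPUT_BUCKET', 'MAPPER', 'REDUCER', '-ARG1', 'VALUE1').to_a
# => ["-input", "INPUT_BUCKET", "-output", "OUTPUT_BUCKET", "-mapper", "MAPPER", "-reducer", "REDUCER", "-ARG1", "VALUE1"]
```

Extras always land after `-reducer`. The Hadoop Streaming documentation asks for generic options such as `-D` to come before the streaming command options, so options of that kind may not behave as intended when passed through `arguments`.
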

data/lib/elasticity/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Elasticity
-  VERSION = '2.5.3'
+  VERSION = '2.5.5'
 end
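
The 2.5.3 `StreamingStep#initialize` accepts exactly four arguments, so passing extra streaming arguments to it raises `ArgumentError`. If code must work with either release, it can compare versions with `Gem::Version` instead of comparing strings. The guard below is a hypothetical sketch. In practice, a gem requirement such as `'>= 2.5.5'` in a Gemfile or gemspec is the usual way to express the dependency.

```ruby
require 'rubygems'

# Hypothetical guard: per HISTORY.md, StreamingStep#arguments first shipped
# in 2.5.5. Gem::Version compares segments numerically, so '2.10.0' sorts
# after '2.5.5' (a plain String comparison would get that wrong).
def supports_streaming_arguments?(version_string)
  Gem::Version.new(version_string) >= Gem::Version.new('2.5.5')
end

p supports_streaming_arguments?('2.5.3')  # => false
p supports_streaming_arguments?('2.5.5')  # => true
p supports_streaming_arguments?('2.10.0') # => true
```

With the gem loaded, the argument would be `Elasticity::VERSION`.
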

data/spec/lib/elasticity/job_flow_status_spec.rb CHANGED

@@ -77,6 +77,9 @@ describe Elasticity::JobFlowStatus do
                  <MasterPublicDnsName>
                    ec2-107-22-77-99.compute-1.amazonaws.com
                  </MasterPublicDnsName>
+                 <NormalizedInstanceHours>
+                   0
+                 </NormalizedInstanceHours>
                  <Placement>
                     <AvailabilityZone>
                       eu-west-1a
@@ -111,6 +114,9 @@ describe Elasticity::JobFlowStatus do
                 </LastStateChangeReason>
               </ExecutionStatusDetail>
               <Instances>
+                 <NormalizedInstanceHours>
+                   4
+                 </NormalizedInstanceHours>
                  <Placement>
                     <AvailabilityZone>
                       eu-west-1b
@@ -142,53 +148,55 @@ describe Elasticity::JobFlowStatus do
     describe_jobflows_document.xpath('/DescribeJobFlowsResponse/DescribeJobFlowsResult/JobFlows/member')
   end
-  let(:single_jobflow) { Elasticity::JobFlowStatus.from_member_element(members_nodeset[0]) }
+  let(:single_jobflow_status) { Elasticity::JobFlowStatus.from_member_element(members_nodeset[0]) }
-  let(:multiple_jobflows) { Elasticity::JobFlowStatus.from_members_nodeset(members_nodeset) }
+  let(:multiple_jobflow_statuses) { Elasticity::JobFlowStatus.from_members_nodeset(members_nodeset) }
   describe '.from_xml' do
-    it 'should return a JobFlow with the appropriate fields initialized' do
-      single_jobflow.name.should == 'Hive Job 1'
-      single_jobflow.jobflow_id.should == 'j-p'
-      single_jobflow.state.should == 'TERMINATED'
-      single_jobflow.steps.map(&:name).should == ['Elasticity - Install Hive', 'Run Hive Script']
-      single_jobflow.steps.map(&:state).should == %w(FAILED PENDING)
-      single_jobflow.created_at.should == Time.parse('2011-10-04T21:49:16Z')
-      single_jobflow.started_at.should == Time.parse('2011-10-04T21:49:17Z')
-      single_jobflow.ready_at.should == Time.parse('2011-10-04T21:49:18Z')
-      single_jobflow.ended_at.should == Time.parse('2011-10-05T21:49:18Z')
-      single_jobflow.duration.should == 1440
-      single_jobflow.master_instance_type.should == 'm1.small'
-      single_jobflow.slave_instance_type.should == 'm1.small'
-      single_jobflow.instance_count.should == '4'
-      single_jobflow.last_state_change_reason.should == 'Steps completed with errors'
-      single_jobflow.master_public_dns_name.should == 'ec2-107-22-77-99.compute-1.amazonaws.com'
+    it 'should return a JobFlowStatus with the appropriate fields initialized' do
+      single_jobflow_status.name.should == 'Hive Job 1'
+      single_jobflow_status.jobflow_id.should == 'j-p'
+      single_jobflow_status.state.should == 'TERMINATED'
+      single_jobflow_status.steps.map(&:name).should == ['Elasticity - Install Hive', 'Run Hive Script']
+      single_jobflow_status.steps.map(&:state).should == %w(FAILED PENDING)
+      single_jobflow_status.created_at.should == Time.parse('2011-10-04T21:49:16Z')
+      single_jobflow_status.started_at.should == Time.parse('2011-10-04T21:49:17Z')
+      single_jobflow_status.ready_at.should == Time.parse('2011-10-04T21:49:18Z')
+      single_jobflow_status.ended_at.should == Time.parse('2011-10-05T21:49:18Z')
+      single_jobflow_status.duration.should == 1440
+      single_jobflow_status.master_instance_type.should == 'm1.small'
+      single_jobflow_status.slave_instance_type.should == 'm1.small'
+      single_jobflow_status.instance_count.should == '4'
+      single_jobflow_status.last_state_change_reason.should == 'Steps completed with errors'
+      single_jobflow_status.master_public_dns_name.should == 'ec2-107-22-77-99.compute-1.amazonaws.com'
+      single_jobflow_status.normalized_instance_hours.should == '0'
     end
     context 'when the jobflow never started' do
       let(:started_at) {}
       it 'should have a nil duration' do
-        single_jobflow.started_at.should == nil
-        single_jobflow.duration.should == nil
+        single_jobflow_status.started_at.should == nil
+        single_jobflow_status.duration.should == nil
       end
     end
   end
-  describe '.from_jobflows_nodeset' do
-    it 'should return JobFlows with the appropriate fields initialized' do
-      multiple_jobflows.map(&:name).should == ['Hive Job 1', 'Hive Job 2']
-      multiple_jobflows.map(&:jobflow_id).should == %w(j-p j-h)
-      multiple_jobflows.map(&:state).should == %w(TERMINATED TERMINATED)
-      multiple_jobflows.map(&:created_at).should == [Time.parse('2011-10-04T21:49:16Z'), Time.parse('2011-10-04T22:49:16Z')]
-      multiple_jobflows.map(&:started_at).should == [Time.parse('2011-10-04T21:49:17Z'), nil]
-      multiple_jobflows.map(&:ready_at).should == [Time.parse('2011-10-04T21:49:18Z'), nil]
-      multiple_jobflows.map(&:ended_at).should == [Time.parse('2011-10-05T21:49:18Z'), nil]
-      multiple_jobflows.map(&:duration).should == [1440, nil]
-      multiple_jobflows.map(&:master_instance_type).should == %w(m1.small c1.medium)
-      multiple_jobflows.map(&:slave_instance_type).should == %w(m1.small c1.medium)
-      multiple_jobflows.map(&:instance_count).should == %w(4 2)
-      multiple_jobflows.map(&:last_state_change_reason).should == ['Steps completed with errors', 'Steps completed']
-      multiple_jobflows.map(&:master_public_dns_name).should == ['ec2-107-22-77-99.compute-1.amazonaws.com', nil]
+  describe '.from_jobflow_statuses_nodeset' do
+    it 'should return JobFlowStatuses with the appropriate fields initialized' do
+      multiple_jobflow_statuses.map(&:name).should == ['Hive Job 1', 'Hive Job 2']
+      multiple_jobflow_statuses.map(&:jobflow_id).should == %w(j-p j-h)
+      multiple_jobflow_statuses.map(&:state).should == %w(TERMINATED TERMINATED)
+      multiple_jobflow_statuses.map(&:created_at).should == [Time.parse('2011-10-04T21:49:16Z'), Time.parse('2011-10-04T22:49:16Z')]
+      multiple_jobflow_statuses.map(&:started_at).should == [Time.parse('2011-10-04T21:49:17Z'), nil]
+      multiple_jobflow_statuses.map(&:ready_at).should == [Time.parse('2011-10-04T21:49:18Z'), nil]
+      multiple_jobflow_statuses.map(&:ended_at).should == [Time.parse('2011-10-05T21:49:18Z'), nil]
+      multiple_jobflow_statuses.map(&:duration).should == [1440, nil]
+      multiple_jobflow_statuses.map(&:master_instance_type).should == %w(m1.small c1.medium)
+      multiple_jobflow_statuses.map(&:slave_instance_type).should == %w(m1.small c1.medium)
+      multiple_jobflow_statuses.map(&:instance_count).should == %w(4 2)
+      multiple_jobflow_statuses.map(&:last_state_change_reason).should == ['Steps completed with errors', 'Steps completed']
+      multiple_jobflow_statuses.map(&:master_public_dns_name).should == ['ec2-107-22-77-99.compute-1.amazonaws.com', nil]
+      multiple_jobflow_statuses.map(&:normalized_instance_hours).should == %w(0 4)
     end
   end
@@ -197,28 +205,28 @@ describe Elasticity::JobFlowStatus do
     context 'when nothing has been installed' do
       let(:setup_config) { }
       it 'should be empty' do
-        single_jobflow.installed_steps.should == []
+        single_jobflow_status.installed_steps.should == []
       end
     end
     context 'when Hive has been installed by Elasticity' do
       let(:setup_config) { hive_setup_config }
       it 'should include HiveStep' do
-        single_jobflow.installed_steps.should == [Elasticity::HiveStep]
+        single_jobflow_status.installed_steps.should == [Elasticity::HiveStep]
       end
     end
     context 'when Pig has been installed by Elasticity' do
       let(:setup_config) { pig_setup_config }
       it 'should include PigStep' do
-        single_jobflow.installed_steps.should == [Elasticity::PigStep]
+        single_jobflow_status.installed_steps.should == [Elasticity::PigStep]
       end
     end
     context 'when more than one step has been installed by Elasticity' do
       let(:setup_config) { hive_setup_config + pig_setup_config }
       it 'should include all of them' do
-        single_jobflow.installed_steps.should =~ [Elasticity::HiveStep, Elasticity::PigStep]
+        single_jobflow_status.installed_steps.should =~ [Elasticity::HiveStep, Elasticity::PigStep]
       end
     end
   end
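
The updated specs expect `normalized_instance_hours` to be the stripped XML text, such as `'0'` and `'4'`, not an Integer. Any aggregation therefore has to convert the values first. The sketch below uses a hypothetical `StatusStub` struct in place of `JobFlowStatus` and a hypothetical `total_normalized_hours` helper. `Integer()` is used instead of `to_i` so that malformed input raises rather than silently counting as zero. That includes an empty string, which is what `.text.strip` yields when the element is absent.

```ruby
# Hypothetical stand-in exposing only the attribute used here.
StatusStub = Struct.new(:normalized_instance_hours)

# normalized_instance_hours holds a String such as '4', so convert before
# summing; Integer() raises ArgumentError on '' or non-numeric text.
def total_normalized_hours(statuses)
  statuses.inject(0) { |total, status| total + Integer(status.normalized_instance_hours) }
end

statuses = [StatusStub.new('0'), StatusStub.new('4')]
p total_normalized_hours(statuses) # => 4
```
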

data/spec/lib/elasticity/streaming_step_spec.rb CHANGED

@@ -1,7 +1,7 @@
 describe Elasticity::StreamingStep do
   subject do
-    Elasticity::StreamingStep.new('INPUT_BUCKET', 'OUTPUT_BUCKET', 'MAPPER', 'REDUCER')
+    Elasticity::StreamingStep.new('INPUT_BUCKET', 'OUTPUT_BUCKET', 'MAPPER', 'REDUCER', '-ARG1', 'VALUE1')
   end
   it { should be_a Elasticity::JobFlowStep }
@@ -12,6 +12,7 @@ describe Elasticity::StreamingStep do
   its(:output_bucket) { should == 'OUTPUT_BUCKET' }
   its(:mapper) { should == 'MAPPER' }
   its(:reducer) { should == 'REDUCER' }
+  its(:arguments) { should == %w(-ARG1 VALUE1) }
   describe '#to_aws_step' do
@@ -21,7 +22,7 @@ describe Elasticity::StreamingStep do
         :action_on_failure => 'TERMINATE_JOB_FLOW',
         :hadoop_jar_step => {
           :jar => '/home/hadoop/contrib/streaming/hadoop-streaming.jar',
-          :args => %w(-input INPUT_BUCKET -output OUTPUT_BUCKET -mapper MAPPER -reducer REDUCER),
+          :args => %w(-input INPUT_BUCKET -output OUTPUT_BUCKET -mapper MAPPER -reducer REDUCER -ARG1 VALUE1),
         },
       }
     end
@@ -34,4 +35,4 @@ describe Elasticity::StreamingStep do
     end
   end
-end
+end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: elasticity
 version: !ruby/object:Gem::Version
-  version: 2.5.3
+  version: 2.5.5
   prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-01-16 00:00:00.000000000 Z
+date: 2013-02-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rest-client
@@ -123,7 +123,7 @@ dependencies:
     - - ~>
       - !ruby/object:Gem::Version
         version: '0.4'
-description: Streamlined, Programmatic access to Amazon's Elastic Map Reduce service,
+description: Streamlined, programmatic access to Amazon's Elastic Map Reduce service,
   driven by the Sharethrough team's requirements for belting out EMR jobs.
 email:
 executables: []
@@ -187,15 +187,21 @@ required_ruby_version: !ruby/object:Gem::Requirement
   - - ! '>='
     - !ruby/object:Gem::Version
       version: '0'
+      segments:
+      - 0
+      hash: 4428846755123210746
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
   - - ! '>='
     - !ruby/object:Gem::Version
       version: '0'
+      segments:
+      - 0
+      hash: 4428846755123210746
 requirements: []
 rubyforge_project:
-rubygems_version: 1.8.24
+rubygems_version: 1.8.25
 signing_key:
 specification_version: 3
 summary: Streamlined, programmatic access to Amazon's Elastic Map Reduce service.