elasticity 2.3.1 → 2.4
- data/.travis.yml +0 -5
- data/HISTORY.md +9 -1
- data/README.md +76 -23
- data/elasticity.gemspec +4 -1
- data/lib/elasticity.rb +5 -0
- data/lib/elasticity/aws_request.rb +27 -15
- data/lib/elasticity/bootstrap_action.rb +29 -0
- data/lib/elasticity/hadoop_bootstrap_action.rb +2 -15
- data/lib/elasticity/hadoop_file_bootstrap_action.rb +14 -0
- data/lib/elasticity/job_flow.rb +9 -6
- data/lib/elasticity/sync_to_s3.rb +76 -0
- data/lib/elasticity/version.rb +1 -1
- data/spec/lib/elasticity/aws_request_spec.rb +69 -3
- data/spec/lib/elasticity/bootstrap_action_spec.rb +25 -0
- data/spec/lib/elasticity/hadoop_bootstrap_action_spec.rb +3 -15
- data/spec/lib/elasticity/hadoop_file_bootstrap_action_spec.rb +14 -0
- data/spec/lib/elasticity/job_flow_spec.rb +14 -0
- data/spec/lib/elasticity/sync_to_s3_spec.rb +240 -0
- data/spec/spec_helper.rb +5 -2
- metadata +61 -4
data/.travis.yml
CHANGED
data/HISTORY.md
CHANGED
````diff
@@ -1,8 +1,16 @@
+## 2.4 - September 1, 2012
+
++ ```SyncToS3``` added to enable one-way asset synchronization.
++ Generic bootstrap actions are now supported via ```BootstrapAction```.
++ If you have several Hadoop bootstrap actions (15 is the current EMR limit), store all of your Hadoop configuration options in a file, ship it up with ```SyncToS3``` and use the new ```HadoopFileBootstrapAction``` to point at that file.
++ If no parameters are passed to ```JobFlow.new```, it will use the standard AWS environment variables to lookup the access and secret keys - ```AWS_ACCESS_KEY_ID``` and ```AWS_SECRET_ACCESS_KEY```.
++ New dependencies: [fog](https://github.com/fog/fog) (S3 access), [fakefs](https://github.com/defunkt/fakefs) (filesystem stubbing - development only), [timecop](https://github.com/jtrupiano/timecop) (freezing and manipulating time - development only).
+
 ## 2.3.1 - August 23, 2012
 
 + Birthday release! ;)
 + Bumped the default version of Hadoop to 1.0.3.
-+ Amazon now requires the
++ Amazon now requires the ```--hive-versions``` argument when installing Hive (thanks to Johannes Wuerbach).
 + ```JobFlowStatus#master_public_dns_name``` is now available (thanks to Johannes Wuerbach).
 
 ## 2.3 - July 28, 2012
````
data/README.md
CHANGED
````diff
@@ -1,29 +1,29 @@
-Elasticity provides programmatic access to Amazon's Elastic Map Reduce service. The aim is to conveniently
+Elasticity provides programmatic access to Amazon's Elastic Map Reduce service. The aim is to conveniently abstract away the complex EMR REST API and make working with job flows more productive and more enjoyable.
 
 [![Build Status](https://secure.travis-ci.org/rslifka/elasticity.png)](http://travis-ci.org/rslifka/elasticity) REE, 1.8.7, 1.9.2, 1.9.3
 
 Elasticity provides two ways to access EMR:
 
 * **Indirectly through a JobFlow-based API**. This README discusses the Elasticity API.
-* **Directly through access to the EMR REST API**. The less-discussed hidden darkside... I use this to enable the Elasticity API
+* **Directly through access to the EMR REST API**. The less-discussed hidden darkside... I use this to enable the Elasticity API. RubyDoc can be found at the RubyGems [auto-generated documentation site](http://rubydoc.info/gems/elasticity/frames). Be forewarned: Making the calls directly requires that you understand how to structure EMR requests at the Amazon API level and from experience I can tell you there are more fun things you could be doing :) Scroll to the end for more information on the Amazon API.
 
 # Installation
 
-```
-
+```
+gem install elasticity
 ```
 
 or in your Gemfile
 
-```
-
+```
+gem 'elasticity', '~> 2.0'
 ```
 
 This will ensure that you protect yourself from API changes, which will only be made in major revisions.
 
-#
+# Roughly, What Am I Getting Myself Into?
 
-
+If you're familiar with the AWS EMR UI, you'll recall there are sample jobs Amazon supplies to help us get familiar with EMR. Here's how you'd kick off the "Cloudburst (Custom Jar)" sample job with Elasticity. You can run this code as-is (supplying your AWS credentials and an output location) and ```JobFlow#run``` will return the ID of the job flow.
 
 ```ruby
 require 'elasticity'
````
````diff
@@ -31,11 +31,14 @@ require 'elasticity'
 # Create a job flow with your AWS credentials
 jobflow = Elasticity::JobFlow.new('AWS access key', 'AWS secret key')
 
+# Omit credentials to use the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables
+# jobflow = Elasticity::JobFlow.new
+
 # This is the first step in the jobflow - running a custom jar
 step = Elasticity::CustomJarStep.new('s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar')
 
 # Here are the arguments to pass to the jar
-step.arguments = %w(s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br s3n://elasticmapreduce/samples/cloudburst/input/100k.br s3n://
+step.arguments = %w(s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br s3n://elasticmapreduce/samples/cloudburst/input/100k.br s3n://OUTPUT_BUCKET/cloudburst/output/2012-06-22 36 3 0 1 240 48 24 24 128 16)
 
 # Add the step to the jobflow
 jobflow.add_step(step)
````
````diff
@@ -44,7 +47,7 @@ jobflow.add_step(step)
 jobflow.run
 ```
 
-Note that this example is only for ```CustomJarStep```.
+Note that this example is only for ```CustomJarStep```. Other steps will have different means of passing parameters.
 
 # Working with Job Flows
 
````
````diff
@@ -54,7 +57,8 @@ Job flows are the center of the EMR universe. The general order of operations i
 1. Specify options.
 1. (optional) Configure instance groups.
 1. (optional) Add bootstrap actions.
-1.
+1. Add steps.
+1. (optional) Upload assets.
 1. Run the job flow.
 1. (optional) Add additional steps.
 1. (optional) Shutdown the job flow.
````
````diff
@@ -64,39 +68,49 @@ Job flows are the center of the EMR universe. The general order of operations i
 Only your AWS credentials are needed.
 
 ```ruby
+# Manually specify AWS credentials
 jobflow = Elasticity::JobFlow.new('AWS access key', 'AWS secret key')
+
+# Use the standard environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY)
+jobflow = Elasticity::JobFlow.new
 ```
 
 If you want to access a job flow that's already running:
 
 ```ruby
+# Manually specify AWS credentials
 jobflow = Elasticity::JobFlow.from_jobflow_id('AWS access key', 'AWS secret key', 'jobflow ID', 'region')
+
+# Use the standard environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY)
+jobflow = Elasticity::JobFlow.from_jobflow_id(nil, nil, 'jobflow ID', 'region')
 ```
 
 This is useful if you'd like to attach to a running job flow and add more steps, etc. The ```region``` parameter is necessary because job flows are only accessible from the the API when you connect to the same endpoint that created them (e.g. us-west-1). If you don't specify the ```region``` parameter, us-east-1 is assumed.
 
-## 2 - Specifying
+## 2 - Specifying Options
 
 Configuration job flow options, shown below with default values. Note that these defaults are subject to change - they are reasonable defaults at the time(s) I work on them (e.g. the latest version of Hadoop).
 
 These options are sent up as part of job flow submission (i.e. ```JobFlow#run```), so be sure to configure these before running the job.
 
 ```ruby
+jobflow.name = 'Elasticity Job Flow'
+
 jobflow.action_on_failure = 'TERMINATE_JOB_FLOW'
+jobflow.keep_job_flow_alive_when_no_steps = false
 jobflow.ami_version = 'latest'
-jobflow.
-jobflow.ec2_subnet_id = nil
-jobflow.hadoop_version = '0.20.205'
-jobflow.keep_job_flow_alive_when_no_steps = true
+jobflow.hadoop_version = '1.0.3'
 jobflow.log_uri = nil
-
+
+jobflow.ec2_key_name = nil
+jobflow.ec2_subnet_id = nil
 jobflow.placement = 'us-east-1a'
 jobflow.instance_count = 2
 jobflow.master_instance_type = 'm1.small'
 jobflow.slave_instance_type = 'm1.small'
 ```
 
-## 3 -
+## 3 - Configure Instance Groups (optional)
 
 Technically this is optional since Elasticity creates MASTER and CORE instance groups for you (one m1.small instance in each). If you'd like your jobs to finish in an appreciable amount of time, you'll want to at least add a few instances to the CORE group :)
 
````
````diff
@@ -142,10 +156,23 @@ ig.set_spot_instances(0.25) # Makes this a SPOT group with a $0.25 bid p
 jobflow.set_core_instance_group(ig)
 ```
 
-## 4 -
+## 4 - Add Bootstrap Actions (optional)
 
 Bootstrap actions are run as part of setting up the job flow, so be sure to configure these before running the job.
 
+### Bootstrap Actions
+
+With the basic ```BootstrapAction``` you specify everything about the action - the script, options and arguments.
+
+```ruby
+action = Elasticity::BootstrapAction.new('s3n://my-bucket/my-script', '-g', '100')
+jobflow.add_bootstrap_action(action)
+```
+
+### Hadoop Bootstrap Actions
+
+`HadoopBootstrapAction` handles passing Hadoop configuration options through.
+
 ```ruby
 [
   Elasticity::HadoopBootstrapAction.new('-m', 'mapred.map.tasks=101'),
````
````diff
@@ -156,7 +183,16 @@ Bootstrap actions are run as part of setting up the job flow, so be sure to conf
 end
 ```
 
-
+### Hadoop File Bootstrap Actions
+
+With EMR's current limit of 15 bootstrap actions, chances are you're going to create a configuration file full of your options and opt to use that instead of passing all the options individually. In that case, use the ```HadoopFileBootstrapAction```, supplying the location of your configuration file.
+
+```ruby
+action = Elasticity::HadoopFileBootstrapAction.new('s3n://my-bucket/job-config.xml')
+jobflow.add_bootstrap_action(action)
+```
+
+## 5 - Add Steps
 
 Each type of step has ```#name``` and ```#action_on_failure``` fields that can be overridden. Apart from that, steps are configured differently - exhaustively described below.
 
````
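The new README section suggests collecting Hadoop options in a configuration file and pointing `HadoopFileBootstrapAction` at it, but neither the README nor HISTORY says what that file should contain. The sketch below assumes the usual Hadoop `<configuration>`/`<property>` XML layout; `hadoop_config_xml` and `xml_escape` are hypothetical helpers, and whether EMR's `configure-hadoop` script accepts exactly this form via `--mapred-config-file` is an assumption to check against the EMR documentation.

```ruby
# Hypothetical helper: render a Hash of Hadoop options as Hadoop-style configuration XML.
# ASSUMPTION: configure-hadoop's --mapred-config-file accepts this <configuration> layout.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;' }.freeze

# Escape the five XML special characters so names and values cannot break the markup.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

def hadoop_config_xml(options)
  properties = options.map do |name, value|
    "  <property>\n" \
    "    <name>#{xml_escape(name)}</name>\n" \
    "    <value>#{xml_escape(value)}</value>\n" \
    "  </property>\n"
  end
  "<?xml version=\"1.0\"?>\n<configuration>\n#{properties.join}</configuration>\n"
end

# Usage: write the file locally, upload it (for example with SyncToS3) and reference
# its s3n:// location in Elasticity::HadoopFileBootstrapAction.new.
# File.write('job-config.xml', hadoop_config_xml('mapred.map.tasks' => 101))
```

Generating the file keeps all options in one bootstrap action, which is the point of the 15-action limit mentioned above.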
````diff
@@ -235,7 +271,24 @@ jar_step.arguments = ['arg1', 'arg2']
 jobflow.add_step(jar_step)
 ```
 
-## 6 -
+## 6 - Upload Assets (optional)
+
+This isn't part of ```JobFlow```; more of an aside :) Elasticity provides a very basic means of uploading assets to S3 so that your EMR job has access to them. For example, a TSV file with a range of valid values, join tables, etc.
+
+```ruby
+# Specify the bucket and AWS credentials
+s3 = Elasticity::SyncToS3('my-bucket', 'access', 'secret')
+
+# Use the standard environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY)
+# s3 = Elasticity::SyncToS3('my-bucket')
+
+# Recursively sync the contents of '/some/parent/dir' under the remote location 'remote-dir/this-job/assets'
+s3.sync('/some/parent/dir', 'remote-dir/this-job/assets')
+```
+
+If the files already exist, there is an MD5 checksum check. If the checksums are the same, the file will be skipped. Now you can use something like ```s3n://my-bucket/remote-dir/this-job/assets/join.tsv``` in your EMR jobs.
+
+## 7 - Run the Job Flow
 
 Submit the job flow to Amazon, storing the ID of the running job flow.
 
````
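To make the README's `s3.sync('/some/parent/dir', 'remote-dir/this-job/assets')` example concrete, the sketch below computes, without touching S3, which key each local file would be written under, following the recursion in `SyncToS3#sync_dir`. `plan_sync` is a hypothetical name; like the gem's `Dir.glob(... '*')`, it skips dotfiles. Note that the constructor added in sync_to_s3.rb is `Elasticity::SyncToS3.new(bucket, access=nil, secret=nil)`; the README snippet calls `Elasticity::SyncToS3(...)` without `.new`.

```ruby
# Hypothetical helper mirroring SyncToS3#sync_dir: returns [local_path, remote_key]
# pairs instead of uploading anything. Subdirectories become extra key segments.
def plan_sync(local, remote)
  Dir.glob(File.join(local, '*')).sort.flat_map do |entry|
    if File.directory?(entry)
      # Recurse, appending the subdirectory name to the remote prefix.
      plan_sync(entry, [remote, File.basename(entry)].join('/'))
    else
      [[entry, [remote, File.basename(entry)].join('/')]]
    end
  end
end
```

For `/some/parent/dir/join.tsv` this yields the key `remote-dir/this-job/assets/join.tsv`, which is the `s3n://my-bucket/remote-dir/this-job/assets/join.tsv` path the README mentions.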
````diff
@@ -243,11 +296,11 @@ Submit the job flow to Amazon, storing the ID of the running job flow.
 jobflow_id = jobflow.run
 ```
 
-##
+## 8 - Add Additional Steps (optional)
 
 Steps can be added to a running jobflow just by calling ```#add_step``` on the job flow exactly how you add them prior to submitting the job.
 
-##
+## 9 - Shut Down the Job Flow (optional)
 
 By default, job flows are set to terminate when there are no more running steps. You can tell the job flow to stay alive when it has nothing left to do:
 
````
data/elasticity.gemspec
CHANGED
````diff
@@ -13,12 +13,15 @@ Gem::Specification.new do |s|
 
   s.add_dependency('rest-client')
   s.add_dependency('nokogiri')
+  s.add_dependency('fog')
 
   s.add_development_dependency('rake')
   s.add_development_dependency('rspec', '~> 2.11.0')
+  s.add_development_dependency('timecop')
+  s.add_development_dependency('fakefs', '~> 0.4')
 
   s.files = `git ls-files`.split("\n")
-  s.test_files = `git ls-files -- {
+  s.test_files = `git ls-files -- {spec,features}/*`.split("\n")
   s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
   s.require_paths = %w(lib)
 end
````
data/lib/elasticity.rb
CHANGED
````diff
@@ -3,13 +3,18 @@ require 'time'
 
 require 'rest_client'
 require 'nokogiri'
+require 'fog'
 
 require 'elasticity/support/conditional_raise'
 
 require 'elasticity/aws_request'
 require 'elasticity/emr'
 
+require 'elasticity/sync_to_s3'
+
+require 'elasticity/bootstrap_action'
 require 'elasticity/hadoop_bootstrap_action'
+require 'elasticity/hadoop_file_bootstrap_action'
 require 'elasticity/job_flow_step'
 
 require 'elasticity/job_flow'
````
data/lib/elasticity/aws_request.rb

````diff
@@ -1,5 +1,8 @@
 module Elasticity
 
+  class MissingKeyError < StandardError;
+  end
+
   class AwsRequest
 
     attr_reader :access_key
````
````diff
@@ -10,9 +13,9 @@ module Elasticity
     # Supported values for options:
     # :region - AWS region (e.g. us-west-1)
     # :secure - true or false, default true.
-    def initialize(access, secret, options
-      @access_key = access
-      @secret_key = secret
+    def initialize(access=nil, secret=nil, options={})
+      @access_key = get_access_key(access)
+      @secret_key = get_secret_key(secret)
       @host = "elasticmapreduce.#{{:region => 'us-east-1'}.merge(options)[:region]}.amazonaws.com"
       @protocol = {:secure => true}.merge(options)[:secure] ? 'https' : 'http'
     end
````
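With this change `AwsRequest.new` can be called without keys: the private helpers added further down in aws_request.rb resolve each key in a fixed order (explicit argument, then the environment variable, otherwise `MissingKeyError`). Here is a standalone sketch of that order; `resolve_key` and its injectable `env` hash are illustration-only names. One difference worth noting: the sketch's error names the variable it actually consulted, whereas 2.4's `AwsRequest#get_secret_key` mentions `AWS_ACCESS_KEY_ID` in its secret-key message (the copy in sync_to_s3.rb says `AWS_SECRET_ACCESS_KEY`).

```ruby
# Standalone sketch of the lookup order used by AwsRequest#get_access_key and
# #get_secret_key in 2.4. resolve_key is a hypothetical name; env defaults to ENV
# but can be any Hash-like object, which keeps the function easy to test.
class MissingKeyError < StandardError; end

def resolve_key(explicit, env_var, description, env = ENV)
  return explicit if explicit          # 1. an explicitly passed key wins
  return env[env_var] if env[env_var]  # 2. otherwise fall back to the environment
  # 3. nothing found: fail loudly, naming the variable that was consulted
  raise MissingKeyError, "Please provide #{description} or set #{env_var}."
end

access = resolve_key(nil, 'AWS_ACCESS_KEY_ID', 'an access key', { 'AWS_ACCESS_KEY_ID' => 'ENV_ACCESS' })
```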
````diff
@@ -38,20 +41,33 @@ module Elasticity
 
     private
 
+    def get_access_key(access)
+      return access if access
+      return ENV['AWS_ACCESS_KEY_ID'] if ENV['AWS_ACCESS_KEY_ID']
+      raise MissingKeyError, 'Please provide an access key or set AWS_ACCESS_KEY_ID.'
+    end
+
+    def get_secret_key(secret)
+      return secret if secret
+      return ENV['AWS_SECRET_ACCESS_KEY'] if ENV['AWS_SECRET_ACCESS_KEY']
+      raise MissingKeyError, 'Please provide a secret key or set AWS_ACCESS_KEY_ID.'
+    end
+
     # (Used from RightScale's right_aws gem.)
     # EC2, SQS, SDB and EMR requests must be signed by this guy.
     # See: http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/index.html?REST_RESTAuth.html
     # http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1928
     def sign_params(service_hash)
-
-
-
-
-
+      service_hash.merge!({
+        'AWSAccessKeyId' => @access_key,
+        'Timestamp' => Time.now.utc.strftime('%Y-%m-%dT%H:%M:%S.000Z'),
+        'SignatureVersion' => '2',
+        'SignatureMethod' => 'HmacSHA256'
+      })
       canonical_string = service_hash.keys.sort.map do |key|
         "#{AwsRequest.aws_escape(key)}=#{AwsRequest.aws_escape(service_hash[key])}"
       end.join('&')
-      string_to_sign = "POST\n#{@host.downcase}\n
+      string_to_sign = "POST\n#{@host.downcase}\n/\n#{canonical_string}"
       signature = AwsRequest.aws_escape(Base64.encode64(OpenSSL::HMAC.digest("sha256", @secret_key, string_to_sign)).strip)
       "#{canonical_string}&Signature=#{signature}"
     end
````
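The rewritten `sign_params` implements AWS Signature Version 2 for a POST to `/`: add the authentication parameters, build a canonical query string from the sorted, escaped keys, HMAC-SHA256 it with the secret key and append the Base64 signature. The sketch below reproduces that flow standalone. `escape` stands in for `AwsRequest.aws_escape`, whose body is not part of this diff, so RFC 3986 percent-encoding is an assumption; the time is a parameter so that a frozen timestamp (the spec freezes `Time.at(1302461096)` with Timecop) gives repeatable output.

```ruby
require 'openssl'

# Assumed stand-in for AwsRequest.aws_escape: RFC 3986 percent-encoding that leaves
# only A-Z, a-z, 0-9, '-', '_', '.' and '~' unescaped.
def escape(value)
  value.to_s.gsub(/[^A-Za-z0-9\-_.~]/) { |c| c.bytes.map { |b| format('%%%02X', b) }.join }
end

# Sketch of Signature Version 2 as in AwsRequest#sign_params (2.4). Unlike the gem,
# it works on a merged copy instead of calling merge! on the caller's hash.
def sign_params(params, access_key, secret_key, host, now = Time.now)
  params = params.merge(
    'AWSAccessKeyId' => access_key,
    'Timestamp' => now.getutc.strftime('%Y-%m-%dT%H:%M:%S.000Z'),
    'SignatureVersion' => '2',
    'SignatureMethod' => 'HmacSHA256'
  )
  # Canonical query string: keys in byte order, keys and values escaped.
  canonical = params.keys.sort.map { |k| "#{escape(k)}=#{escape(params[k])}" }.join('&')
  string_to_sign = "POST\n#{host.downcase}\n/\n#{canonical}"
  digest = OpenSSL::HMAC.digest('sha256', secret_key, string_to_sign)
  # pack('m0') is strict Base64 (no line breaks), equivalent to encode64(...).strip here.
  "#{canonical}&Signature=#{escape([digest].pack('m0'))}"
end
```

With `Time.at(1302461096)` the Timestamp parameter becomes `2011-04-10T18%3A44%3A56.000Z`.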
````diff
@@ -95,12 +111,8 @@ module Elasticity
     end
 
     # (Used from Rails' ActiveSupport)
-    def self.camelize(
-
-        lower_case_and_underscored_word.to_s.gsub(/\/(.?)/) { "::" + $1.upcase }.gsub(/(^|_)(.)/) { $2.upcase }
-      else
-        lower_case_and_underscored_word.first + camelize(lower_case_and_underscored_word)[1..-1]
-      end
+    def self.camelize(word)
+      word.to_s.gsub(/\/(.?)/) { "::" + $1.upcase }.gsub(/(^|_)(.)/) { $2.upcase }
     end
 
     # AWS error responses all follow the same form. Extract the message from
````
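2.4 trims `camelize` to the upper-camel-case branch only. A quick check of what the retained expression produces; presumably this is how Ruby-style keys such as `:script_bootstrap_action` become AWS parameter names, though the calling code is outside this diff.

```ruby
# The single-branch camelize retained in 2.4, as a standalone method.
# '/' becomes '::' and every first or underscore-prefixed character is upcased.
def camelize(word)
  word.to_s.gsub(/\/(.?)/) { '::' + $1.upcase }.gsub(/(^|_)(.)/) { $2.upcase }
end

camelize(:script_bootstrap_action) # => "ScriptBootstrapAction"
```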
data/lib/elasticity/bootstrap_action.rb

````diff
@@ -0,0 +1,29 @@
+module Elasticity
+
+  class BootstrapAction
+
+    attr_accessor :name
+    attr_accessor :option
+    attr_accessor :value
+    attr_accessor :script
+
+    def initialize(script, option, value)
+      @name = 'Elasticity Bootstrap Action'
+      @option = option
+      @value = value
+      @script = script
+    end
+
+    def to_aws_bootstrap_action
+      {
+        :name => @name,
+        :script_bootstrap_action => {
+          :path => @script,
+          :args => [@option, @value]
+        }
+      }
+    end
+
+  end
+
+end
````
data/lib/elasticity/hadoop_bootstrap_action.rb

````diff
@@ -1,25 +1,12 @@
 module Elasticity
 
-  class HadoopBootstrapAction
-
-    attr_accessor :name
-    attr_accessor :option
-    attr_accessor :value
+  class HadoopBootstrapAction < BootstrapAction
 
     def initialize(option, value)
       @name = 'Elasticity Bootstrap Action (Configure Hadoop)'
       @option = option
       @value = value
-
-
-    def to_aws_bootstrap_action
-      {
-        :name => @name,
-        :script_bootstrap_action => {
-          :path => 's3n://elasticmapreduce/bootstrap-actions/configure-hadoop',
-          :args => [@option, @value]
-        }
-      }
+      @script = 's3n://elasticmapreduce/bootstrap-actions/configure-hadoop'
     end
 
   end
````
data/lib/elasticity/hadoop_file_bootstrap_action.rb

````diff
@@ -0,0 +1,14 @@
+module Elasticity
+
+  class HadoopFileBootstrapAction < BootstrapAction
+
+    def initialize(config_file)
+      @name = 'Elasticity Bootstrap Action (Configure Hadoop via File)'
+      @option = '--mapred-config-file'
+      @value = config_file
+      @script = 's3n://elasticmapreduce/bootstrap-actions/configure-hadoop'
+    end
+
+  end
+
+end
````
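After this refactor `BootstrapAction` owns `to_aws_bootstrap_action`, and the two Hadoop subclasses only fill in `@name`, `@option`, `@value` and `@script`. The standalone mirror below (classes prefixed `Sketch` so they are not mistaken for the gem's own) shows the hash each one produces. One design difference: the gem's subclasses assign the instance variables directly without calling `super`; the sketch routes through `super` so the base constructor stays the single place that sets the fields.

```ruby
# Standalone mirror of the 2.4 bootstrap-action classes (not the gem itself).
class SketchBootstrapAction
  attr_accessor :name, :option, :value, :script

  def initialize(script, option, value)
    @name = 'Elasticity Bootstrap Action'
    @option = option
    @value = value
    @script = script
  end

  # Same hash shape as BootstrapAction#to_aws_bootstrap_action.
  def to_aws_bootstrap_action
    { :name => @name, :script_bootstrap_action => { :path => @script, :args => [@option, @value] } }
  end
end

CONFIGURE_HADOOP = 's3n://elasticmapreduce/bootstrap-actions/configure-hadoop'

class SketchHadoopBootstrapAction < SketchBootstrapAction
  def initialize(option, value)
    super(CONFIGURE_HADOOP, option, value)
    @name = 'Elasticity Bootstrap Action (Configure Hadoop)'
  end
end

class SketchHadoopFileBootstrapAction < SketchBootstrapAction
  def initialize(config_file)
    super(CONFIGURE_HADOOP, '--mapred-config-file', config_file)
    @name = 'Elasticity Bootstrap Action (Configure Hadoop via File)'
  end
end
```

For example, `SketchHadoopFileBootstrapAction.new('s3n://my-bucket/job-config.xml')` produces the args `['--mapred-config-file', 's3n://my-bucket/job-config.xml']` for the `configure-hadoop` script.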
data/lib/elasticity/job_flow.rb
CHANGED
````diff
@@ -19,7 +19,10 @@ module Elasticity
     attr_accessor :ec2_subnet_id
     attr_accessor :placement
 
-
+    attr_reader :access_key
+    attr_reader :secret_key
+
+    def initialize(access=nil, secret=nil)
       @action_on_failure = 'TERMINATE_JOB_FLOW'
       @hadoop_version = '1.0.3'
       @name = 'Elasticity Job Flow'
````
````diff
@@ -27,8 +30,8 @@ module Elasticity
       @keep_job_flow_alive_when_no_steps = false
       @placement = 'us-east-1a'
 
-      @
-      @
+      @access_key = access
+      @secret_key = secret
 
       @bootstrap_actions = []
       @jobflow_steps = []
````
````diff
@@ -41,8 +44,8 @@ module Elasticity
       @master_instance_type = 'm1.small'
       @slave_instance_type = 'm1.small'
 
-      @
-      @
+      @access_key = access
+      @secret_key = secret
     end
 
     def self.from_jobflow_id(access, secret, jobflow_id, region = 'us-east-1')
````
````diff
@@ -122,7 +125,7 @@ module Elasticity
 
     def emr
       @region ||= @placement.match(/(\w+-\w+-\d+)/)[0]
-      @emr ||= Elasticity::EMR.new(@
+      @emr ||= Elasticity::EMR.new(@access_key, @secret_key, :region => @region)
     end
 
     def is_jobflow_running?
````
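`JobFlow#emr` now hands the stored keys (possibly `nil`) to `Elasticity::EMR`, along with a region parsed from `placement`. Combined with the host and protocol defaults in `AwsRequest#initialize`, the endpoint can be sketched as follows; `region_from_placement` and `emr_endpoint` are hypothetical helpers, and joining protocol and host into a single URL this way is an assumption about how the request is eventually sent.

```ruby
# Region parsing as in JobFlow#emr: an availability zone such as 'us-east-1a'
# is cut back to its region, 'us-east-1'.
def region_from_placement(placement)
  placement.match(/(\w+-\w+-\d+)/)[0]
end

# Host and protocol defaults as in AwsRequest#initialize (region us-east-1, secure true).
def emr_endpoint(options = {})
  region = { :region => 'us-east-1' }.merge(options)[:region]
  protocol = { :secure => true }.merge(options)[:secure] ? 'https' : 'http'
  "#{protocol}://elasticmapreduce.#{region}.amazonaws.com"
end
```

Because the region comes from `placement`, a job flow placed in `us-west-1b` talks to the us-west-1 endpoint, which matches the README's note that job flows are only visible from the endpoint that created them.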
data/lib/elasticity/sync_to_s3.rb

````diff
@@ -0,0 +1,76 @@
+module Elasticity
+
+  class NoBucketError < StandardError; end
+  class NoDirectoryError < StandardError; end
+
+  class SyncToS3
+
+    attr_reader :access_key
+    attr_reader :secret_key
+    attr_reader :bucket_name
+
+    def initialize(bucket, access=nil, secret=nil)
+      @access_key = get_access_key(access)
+      @secret_key = get_secret_key(secret)
+      @bucket_name = bucket
+    end
+
+    def sync(local, remote)
+      raise_unless bucket, NoBucketError, "Bucket '#@bucket_name' does not exist"
+      raise_unless File.directory?(local), NoDirectoryError, "Directory '#{local}' does not exist or is not a directory"
+      sync_dir(local, remote)
+    end
+
+    private
+
+    def sync_dir(local, remote)
+      Dir.glob(File.join([local, '*'])).each do |entry|
+        if File.directory?(entry)
+          sync_dir(entry, [remote, File.basename(entry)].join('/'))
+        else
+          sync_file(entry, remote)
+        end
+      end
+    end
+
+    def sync_file(file_name, remote_dir)
+      remote_dir = remote_dir.gsub(/^(\/)/, '')
+      remote_path = (remote_dir.empty?) ? (File.basename(file_name)) : [remote_dir, File.basename(file_name)].join('/')
+      metadata = bucket.files.head(remote_path)
+      return if metadata && metadata.etag == Digest::MD5.file(file_name).to_s
+
+      bucket.files.create({
+        :key => remote_path,
+        :body => File.open(file_name),
+        :public => false
+      })
+    end
+
+    def bucket
+      index = s3.directories.index { |d| d.key == @bucket_name }
+      @bucket ||= index ? s3.directories[index] : nil
+    end
+
+    def s3
+      @connection ||= Fog::Storage.new({
+        :provider => 'AWS',
+        :aws_access_key_id => @access_key,
+        :aws_secret_access_key => @secret_key
+      })
+    end
+
+    def get_access_key(access)
+      return access if access
+      return ENV['AWS_ACCESS_KEY_ID'] if ENV['AWS_ACCESS_KEY_ID']
+      raise MissingKeyError, 'Please provide an access key or set AWS_ACCESS_KEY_ID.'
+    end
+
+    def get_secret_key(secret)
+      return secret if secret
+      return ENV['AWS_SECRET_ACCESS_KEY'] if ENV['AWS_SECRET_ACCESS_KEY']
+      raise MissingKeyError, 'Please provide a secret key or set AWS_SECRET_ACCESS_KEY.'
+    end
+
+  end
+
+end
````
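`sync_file` makes two decisions before uploading: which key the file maps to, and whether an existing object can be skipped because its ETag equals the local file's MD5. Both are sketched below without Fog; `remote_key` and `needs_upload?` are hypothetical names, and the ETag is assumed to arrive as a bare hex string. Keep in mind that an S3 ETag equals the content MD5 only for objects uploaded in a single part; a multipart upload gets a different ETag, so such a file would simply be uploaded again every time.

```ruby
require 'digest/md5'

# Key computation as in SyncToS3#sync_file: drop one leading '/', then join the
# remote directory (if any) with the file's base name.
def remote_key(file_name, remote_dir)
  remote_dir = remote_dir.gsub(/^(\/)/, '')
  base = File.basename(file_name)
  remote_dir.empty? ? base : [remote_dir, base].join('/')
end

# Skip rule as in SyncToS3#sync_file: upload unless an object exists (ETag not nil)
# and its ETag matches the hex MD5 of the local file.
def needs_upload?(file_name, remote_etag)
  remote_etag.nil? || remote_etag != Digest::MD5.file(file_name).to_s
end
```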
data/lib/elasticity/version.rb
CHANGED
data/spec/lib/elasticity/aws_request_spec.rb

````diff
@@ -1,15 +1,81 @@
 describe Elasticity::AwsRequest do
 
   before do
-
+    Timecop.freeze(Time.at(1302461096))
   end
 
   subject do
     Elasticity::AwsRequest.new('access', 'secret')
   end
 
-
-
+  describe '#initialize' do
+
+    context 'when access and/or secret keys are provided' do
+      it 'should set them to the provided values' do
+        subject.access_key.should == 'access'
+        subject.secret_key.should == 'secret'
+      end
+    end
+
+    context 'when either access or secret key is not provided or nil' do
+
+      context 'when the proper environment variables are set' do
+
+        context 'when access and secret key are not provided' do
+          let(:default_values) { Elasticity::AwsRequest.new }
+          before do
+            ENV.stub(:[]).with('AWS_ACCESS_KEY_ID').and_return('ENV_ACCESS')
+            ENV.stub(:[]).with('AWS_SECRET_ACCESS_KEY').and_return('ENV_SECRET')
+          end
+          it 'should set access and secret keys' do
+            default_values.access_key.should == 'ENV_ACCESS'
+            default_values.secret_key.should == 'ENV_SECRET'
+          end
+        end
+
+        context 'when access and secret key are nil' do
+          let(:nil_values) { Elasticity::AwsRequest.new(nil, nil) }
+          before do
+            ENV.stub(:[]).with('AWS_ACCESS_KEY_ID').and_return('ENV_ACCESS')
+            ENV.stub(:[]).with('AWS_SECRET_ACCESS_KEY').and_return('ENV_SECRET')
+          end
+          it 'should set access and secret keys' do
+            nil_values.access_key.should == 'ENV_ACCESS'
+            nil_values.secret_key.should == 'ENV_SECRET'
+          end
+        end
+
+      end
+
+      context 'when the environment variables are not set' do
+        let(:missing_something) { Elasticity::AwsRequest.new }
+        context 'when the access key is not set' do
+          before do
+            ENV.stub(:[]).with('AWS_ACCESS_KEY_ID').and_return(nil)
+            ENV.stub(:[]).with('AWS_SECRET_ACCESS_KEY').and_return('_')
+          end
+          it 'should raise an error' do
+            expect {
+              missing_something.access_key
+            }.to raise_error(Elasticity::MissingKeyError, 'Please provide an access key or set AWS_ACCESS_KEY_ID.')
+          end
+        end
+        context 'when the secret key is not set' do
+          before do
+            ENV.stub(:[]).with('AWS_ACCESS_KEY_ID').and_return('_')
+            ENV.stub(:[]).with('AWS_SECRET_ACCESS_KEY').and_return(nil)
+          end
+          it 'should raise an error' do
+            expect {
+              missing_something.access_key
+            }.to raise_error(Elasticity::MissingKeyError, 'Please provide a secret key or set AWS_ACCESS_KEY_ID.')
+          end
+        end
+      end
+
+    end
+
+  end
 
   describe '#host' do
 
````
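The new `#initialize` specs stub `ENV.[]` with rspec-mocks. Another option, sketched below, is to set the real variables for the duration of a block and restore them afterwards, which also exercises the actual `ENV` lookups. `with_env` and `apply_env` are hypothetical helpers, not part of the gem or of RSpec.

```ruby
# Hypothetical helper: temporarily set environment variables (a nil value unsets one),
# run the block, then restore the previous values even if the block raises.
def with_env(vars)
  saved = vars.keys.map { |key| [key, ENV[key]] }.to_h
  apply_env(vars)
  yield
ensure
  apply_env(saved) if saved
end

# Set or delete each variable; nil means "not set".
def apply_env(vars)
  vars.each do |key, value|
    if value.nil?
      ENV.delete(key)
    else
      ENV[key] = value
    end
  end
end

# Usage inside an example:
# with_env('AWS_ACCESS_KEY_ID' => 'ENV_ACCESS', 'AWS_SECRET_ACCESS_KEY' => nil) do
#   expect { Elasticity::AwsRequest.new }.to raise_error(Elasticity::MissingKeyError)
# end
```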
data/spec/lib/elasticity/bootstrap_action_spec.rb

````diff
@@ -0,0 +1,25 @@
+describe Elasticity::BootstrapAction do
+
+  subject do
+    Elasticity::BootstrapAction.new('script', 'option', 'value')
+  end
+
+  its(:name) { should == 'Elasticity Bootstrap Action' }
+  its(:option) { should == 'option' }
+  its(:value) { should == 'value' }
+  its(:script) { should == 'script' }
+
+  describe '#to_aws_bootstrap_action' do
+    it 'should create a bootstrap action' do
+      subject.to_aws_bootstrap_action.should ==
+        {
+          :name => 'Elasticity Bootstrap Action',
+          :script_bootstrap_action => {
+            :path => 'script',
+            :args => %w(option value)
+          }
+        }
+    end
+  end
+
+end
````
data/spec/lib/elasticity/hadoop_bootstrap_action_spec.rb

````diff
@@ -4,23 +4,11 @@ describe Elasticity::HadoopBootstrapAction do
     Elasticity::HadoopBootstrapAction.new('option', 'value')
   end
 
+  it { should be_a Elasticity::BootstrapAction }
+
   its(:name) { should == 'Elasticity Bootstrap Action (Configure Hadoop)' }
   its(:option) { should == 'option' }
   its(:value) { should == 'value' }
-
-  describe '#to_aws_bootstrap_action' do
-
-    it 'should create a bootstrap action' do
-      subject.to_aws_bootstrap_action.should ==
-        {
-          :name => 'Elasticity Bootstrap Action (Configure Hadoop)',
-          :script_bootstrap_action => {
-            :path => 's3n://elasticmapreduce/bootstrap-actions/configure-hadoop',
-            :args => ['option', 'value']
-          }
-        }
-    end
-
-  end
+  its(:script) { should == 's3n://elasticmapreduce/bootstrap-actions/configure-hadoop' }
 
 end
````
data/spec/lib/elasticity/hadoop_file_bootstrap_action_spec.rb

````diff
@@ -0,0 +1,14 @@
+describe Elasticity::HadoopFileBootstrapAction do
+
+  subject do
+    Elasticity::HadoopFileBootstrapAction.new('config_file')
+  end
+
+  it { should be_a Elasticity::BootstrapAction }
+
+  its(:name) { should == 'Elasticity Bootstrap Action (Configure Hadoop via File)' }
+  its(:option) { should == '--mapred-config-file' }
+  its(:value) { should == 'config_file' }
+  its(:script) { should == 's3n://elasticmapreduce/bootstrap-actions/configure-hadoop' }
+
+end
````
data/spec/lib/elasticity/job_flow_spec.rb

````diff
@@ -4,6 +4,8 @@ describe Elasticity::JobFlow do
     Elasticity::JobFlow.new('access', 'secret')
   end
 
+  its(:access_key) { should == 'access' }
+  its(:secret_key) { should == 'secret' }
   its(:action_on_failure) { should == 'TERMINATE_JOB_FLOW' }
   its(:ec2_key_name) { should == nil }
   its(:ec2_subnet_id) { should == nil }
````
````diff
@@ -17,6 +19,18 @@ describe Elasticity::JobFlow do
   its(:keep_job_flow_alive_when_no_steps) { should == false }
   its(:placement) { should == 'us-east-1a' }
 
+  describe '.initialize' do
+    it 'should set the access and secret keys to nil by default' do
+      Elasticity::JobFlow.new.tap do |j|
+        j.access_key.should == nil
+        j.secret_key.should == nil
+      end
+      Elasticity::JobFlow.new('_') do |j|
+        j.secret_key.should == nil
+      end
+    end
+  end
+
   describe '#instance_count=' do
 
     context 'when set to more than 1' do
````
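In the new `.initialize` spec, the second check passes a block to `Elasticity::JobFlow.new('_')` rather than calling `.tap`. `Class#new` forwards the block to `initialize`, and `JobFlow#initialize` never yields, so the assertion inside that block never runs. A minimal demonstration with a stand-in class (`SketchJobFlow` is hypothetical, with the same constructor shape as `JobFlow#initialize` in 2.4):

```ruby
# Stand-in with the same constructor shape as JobFlow#initialize in 2.4: it never yields.
class SketchJobFlow
  attr_reader :access_key, :secret_key

  def initialize(access = nil, secret = nil)
    @access_key = access
    @secret_key = secret
  end
end

block_ran = false
SketchJobFlow.new('_') { |_j| block_ran = true }                # block is silently ignored
tap_ran = false
SketchJobFlow.new('_').tap { |j| tap_ran = j.secret_key.nil? }  # tap always yields the object
```

Adding `.tap` before the block makes the check actually execute.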
@@ -0,0 +1,240 @@
|
|
1
|
+
describe Elasticity::SyncToS3 do
|
2
|
+
|
3
|
+
include FakeFS::SpecHelpers
|
4
|
+
Fog.mock!
|
5
|
+
|
6
|
+
let(:bucket_name) { 'TEST_BUCKET' }
|
7
|
+
let(:sync_to_s3) { Elasticity::SyncToS3.new(bucket_name, '_', '_') }
|
8
|
+
let(:s3) { Fog::Storage.new({:provider => 'AWS', :aws_access_key_id => '', :aws_secret_access_key => ''}) }
|
9
|
+
|
10
|
+
before do
|
11
|
+
Fog::Mock.reset
|
12
|
+
sync_to_s3.stub(:s3).and_return(s3)
|
13
|
+
end
|
14
|
+
|
+  describe '#initialize' do
+
+    describe 'basic assignment' do
+
+      it 'should set the proper values' do
+        sync = Elasticity::SyncToS3.new('bucket', 'access', 'secret')
+        sync.access_key.should == 'access'
+        sync.secret_key.should == 'secret'
+        sync.bucket_name.should == 'bucket'
+      end
+
+    end
+
+    context 'when access and secret keys are nil' do
+
+      let(:both_keys_nil) { Elasticity::SyncToS3.new('_', nil, nil) }
+      let(:both_keys_missing) { Elasticity::SyncToS3.new('_') }
+
+      before do
+        ENV.stub(:[]).with('AWS_ACCESS_KEY_ID').and_return(access_key)
+        ENV.stub(:[]).with('AWS_SECRET_ACCESS_KEY').and_return(secret_key)
+      end
+
+      context 'when environment variables are present' do
+        let(:access_key) { 'ENV_ACCESS' }
+        let(:secret_key) { 'ENV_SECRET' }
+        it 'should assign them to the keys' do
+          both_keys_nil.access_key.should == 'ENV_ACCESS'
+          both_keys_nil.secret_key.should == 'ENV_SECRET'
+
+          both_keys_missing.access_key.should == 'ENV_ACCESS'
+          both_keys_missing.secret_key.should == 'ENV_SECRET'
+        end
+      end
+
+      context 'when environment variables are not present' do
+
+        context 'when access is not set' do
+          let(:access_key) { nil }
+          let(:secret_key) { '_' }
+          it 'should raise an error' do
+            expect {
+              both_keys_nil # Trigger instantiation
+            }.to raise_error(Elasticity::MissingKeyError, 'Please provide an access key or set AWS_ACCESS_KEY_ID.')
+          end
+        end
+
+        context 'when secret is not set' do
+          let(:access_key) { '_' }
+          let(:secret_key) { nil }
+          it 'should raise an error' do
+            expect {
+              both_keys_nil # Trigger instantiation
+            }.to raise_error(Elasticity::MissingKeyError, 'Please provide a secret key or set AWS_SECRET_ACCESS_KEY.')
+          end
+        end
+
+      end
+
+    end
+
+  end
+
+  describe '#sync' do
+
+    context 'when the bucket exists' do
+
+      before do
+        s3.directories.create(:key => bucket_name)
+      end
+
+      context 'when the local directory exists' do
+        before do
+          FileUtils.mkdir('GOOD_DIR')
+        end
+        it 'should sync that directory' do
+          sync_to_s3.should_receive(:sync_dir).with('GOOD_DIR', 'REMOTE_DIR')
+          sync_to_s3.sync('GOOD_DIR', 'REMOTE_DIR')
+        end
+      end
+
+      context 'when the local directory does not exist' do
+        it 'should raise an error' do
+          expect {
+            sync_to_s3.sync('BAD_DIR', '_')
+          }.to raise_error(Elasticity::NoDirectoryError, "Directory 'BAD_DIR' does not exist or is not a directory")
+        end
+      end
+
+      context 'when the local directory is not a directory' do
+        before do
+          FileUtils.touch('NOT_A_DIR')
+        end
+        it 'should raise an error' do
+          expect {
+            sync_to_s3.sync('NOT_A_DIR', '_')
+          }.to raise_error(Elasticity::NoDirectoryError, "Directory 'NOT_A_DIR' does not exist or is not a directory")
+        end
+      end
+
+    end
+
+    context 'when the bucket does not exist' do
+      let(:bucket_name) { 'BAD_BUCKET' }
+      it 'should raise an error' do
+        expect {
+          sync_to_s3.sync('_', '_')
+        }.to raise_error(Elasticity::NoBucketError, "Bucket 'BAD_BUCKET' does not exist")
+      end
+    end
+
+  end
+
+  describe '#sync_dir' do
+
+    before do
+      s3.directories.create(:key => bucket_name)
+
+      FileUtils.makedirs(File.join(%w(local_dir sub_dir_1)))
+      FileUtils.makedirs(File.join(%w(local_dir sub_dir_2)))
+
+      FileUtils.touch(File.join(%w(local_dir file_1)))
+      FileUtils.touch(File.join(%w(local_dir file_2)))
+      FileUtils.touch(File.join(%w(local_dir sub_dir_1 file_3)))
+      FileUtils.touch(File.join(%w(local_dir sub_dir_1 file_4)))
+      FileUtils.touch(File.join(%w(local_dir sub_dir_2 file_5)))
+      FileUtils.touch(File.join(%w(local_dir sub_dir_2 file_6)))
+    end
+
+    it 'should recursively sync all files and directories' do
+      sync_to_s3.send(:sync_dir, 'local_dir', 'remote_dir')
+
+      %w(
+        remote_dir/file_1
+        remote_dir/file_2
+        remote_dir/sub_dir_1/file_3
+        remote_dir/sub_dir_1/file_4
+        remote_dir/sub_dir_2/file_5
+        remote_dir/sub_dir_2/file_6
+      ).each do |key|
+        s3.directories[0].files.map(&:key).should include(key)
+      end
+    end
+
+  end
+
+  describe '#sync_file' do
+
+    let(:local_dir) { '/tmp' }
+    let(:file_name) { 'test.out' }
+    let(:full_path) { File.join([local_dir, file_name]) }
+    let(:remote_dir) { 'job/assets' }
+    let(:remote_path) { "#{remote_dir}/#{file_name}"}
+    let(:file_data) { 'Some test content' }
+
+    before do
+      s3.directories.create(:key => bucket_name)
+      FileUtils.makedirs(local_dir)
+      File.open(full_path, 'w') {|f| f.write(file_data) }
+    end
+
+    it 'should write the specified file into the remote directory' do
+      sync_to_s3.send(:sync_file, full_path, remote_dir)
+      s3.directories[0].files.head(remote_path).should_not be_nil
+    end
+
+    it 'should write the contents of the file' do
+      sync_to_s3.send(:sync_file, full_path, remote_dir)
+      s3.directories[0].files.head(remote_path).body.should == file_data
+    end
+
+    it 'should write the remote file without public access' do
+      sync_to_s3.send(:sync_file, full_path, remote_dir)
+      s3.directories[0].files.head(remote_path).public_url.should be_nil
+    end
+
+    it 'should not write identical content' do
+      sync_to_s3.send(:sync_file, full_path, remote_dir)
+      last_modified = s3.directories[0].files.head(remote_path).last_modified
+      Timecop.travel(Time.now + 60)
+      sync_to_s3.send(:sync_file, full_path, remote_dir)
+      s3.directories[0].files.head(remote_path).last_modified.should == last_modified
+    end
+
+    context 'when remote dir is a corner case value' do
+      before do
+        sync_to_s3.send(:sync_file, full_path, remote_dir)
+      end
+
+      context 'when remote dir is empty' do
+        let(:remote_dir) {''}
+        it 'should place files in the root without a bunk empty folder name' do
+          s3.directories[0].files.head(file_name).should_not be_nil
+        end
+      end
+
+      context 'when remote dir is /' do
+        let(:remote_dir) {'/'}
+        it 'should place files in the root without a bunk empty folder name' do
+          s3.directories[0].files.head(file_name).should_not be_nil
+        end
+      end
+
+      context 'when remote dir starts with a /' do
+        let(:remote_dir) {'/starts_with_slash'}
+        it 'should place files in the root without a bunk empty folder name' do
+          s3.directories[0].files.head('starts_with_slash/test.out').should_not be_nil
+        end
+      end
+    end
+
+  end
+
+  describe '#s3' do
+    let(:connection_test) { Elasticity::SyncToS3.new('_', 'access', 'secret') }
+    it 'should connect to S3 using the specified credentials' do
+      Fog::Storage.should_receive(:new).with({
+        :provider => 'AWS',
+        :aws_access_key_id => 'access',
+        :aws_secret_access_key => 'secret'
+      }).and_return('GOOD_CONNECTION')
+      connection_test.send(:s3).should == 'GOOD_CONNECTION'
+    end
+  end
+
+end
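The examples above pin down the observable behavior of `SyncToS3` but not its implementation (lib/elasticity/sync_to_s3.rb is not part of this excerpt). Below is a minimal Ruby sketch, under stated assumptions, of three rules the spec implies: explicit keys falling back to `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, S3 key construction for the empty, `/` and leading-slash remote directories, and skipping uploads of identical content. The module `SyncSketch`, its method names and the MD5/ETag comparison are hypothetical; the gem itself may work differently.

```ruby
require 'digest/md5'

# Hypothetical sketch of the behavior the spec describes. SyncSketch and its
# methods are illustrative names, not part of the elasticity gem.
module SyncSketch
  # Stand-in for Elasticity::MissingKeyError (its real definition is not shown here).
  class MissingKeyError < StandardError; end

  module_function

  # Explicit keys win; nil or omitted keys fall back to the standard AWS
  # environment variables, and a key that is still missing raises.
  def resolve_keys(access = nil, secret = nil, env = ENV)
    access ||= env['AWS_ACCESS_KEY_ID']
    secret ||= env['AWS_SECRET_ACCESS_KEY']
    raise MissingKeyError, 'Please provide an access key or set AWS_ACCESS_KEY_ID.' unless access
    raise MissingKeyError, 'Please provide a secret key or set AWS_SECRET_ACCESS_KEY.' unless secret
    [access, secret]
  end

  # Join a remote directory and a file name into an S3 key, trimming leading and
  # trailing slashes so '', '/' and '/dir' never yield an empty folder segment.
  def remote_key(remote_dir, file_name)
    dir = remote_dir.sub(%r{\A/+}, '').sub(%r{/+\z}, '')
    dir.empty? ? file_name : "#{dir}/#{file_name}"
  end

  # Decide whether an upload is needed. Assumes the remote ETag is the hex MD5
  # of the object body, which holds for simple (non-multipart) S3 uploads.
  def needs_upload?(local_data, remote_etag)
    remote_etag.nil? || Digest::MD5.hexdigest(local_data) != remote_etag.delete('"')
  end
end
```

For instance, `SyncSketch.remote_key('/', 'test.out')` returns `'test.out'`, the same key the 'when remote dir is /' example checks for, and `SyncSketch.resolve_keys(nil, nil, {})` raises with the access-key message because the access key is checked first.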
data/spec/spec_helper.rb
CHANGED
@@ -2,6 +2,9 @@ require 'rubygems'
 require 'bundler/setup'
 require 'elasticity'
 
-
+require 'timecop'
+require 'fakefs/spec_helpers'
 
-
+ENV['RAILS_ENV'] ||= 'test'
+
+Dir[File.join(File.dirname(__FILE__), 'support', '**', '*.rb')].each { |f| require f }
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: elasticity
 version: !ruby/object:Gem::Version
-  version: 2.
+  version: '2.4'
   prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-
+date: 2012-09-02 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rest-client
@@ -43,6 +43,22 @@ dependencies:
     - - ! '>='
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: fog
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
@@ -75,6 +91,38 @@ dependencies:
     - - ~>
       - !ruby/object:Gem::Version
         version: 2.11.0
+- !ruby/object:Gem::Dependency
+  name: timecop
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ! '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: fakefs
+  requirement: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '0.4'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: '0.4'
 description: Streamlined, Programmatic access to Amazon's Elastic Map Reduce service,
   driven by the Sharethrough team's requirements for belting out EMR jobs.
 email:
@@ -94,9 +142,11 @@ files:
 - elasticity.gemspec
 - lib/elasticity.rb
 - lib/elasticity/aws_request.rb
+- lib/elasticity/bootstrap_action.rb
 - lib/elasticity/custom_jar_step.rb
 - lib/elasticity/emr.rb
 - lib/elasticity/hadoop_bootstrap_action.rb
+- lib/elasticity/hadoop_file_bootstrap_action.rb
 - lib/elasticity/hive_step.rb
 - lib/elasticity/instance_group.rb
 - lib/elasticity/job_flow.rb
@@ -106,11 +156,14 @@ files:
 - lib/elasticity/pig_step.rb
 - lib/elasticity/streaming_step.rb
 - lib/elasticity/support/conditional_raise.rb
+- lib/elasticity/sync_to_s3.rb
 - lib/elasticity/version.rb
 - spec/lib/elasticity/aws_request_spec.rb
+- spec/lib/elasticity/bootstrap_action_spec.rb
 - spec/lib/elasticity/custom_jar_step_spec.rb
 - spec/lib/elasticity/emr_spec.rb
 - spec/lib/elasticity/hadoop_bootstrap_action_spec.rb
+- spec/lib/elasticity/hadoop_file_bootstrap_action_spec.rb
 - spec/lib/elasticity/hive_step_spec.rb
 - spec/lib/elasticity/instance_group_spec.rb
 - spec/lib/elasticity/job_flow_integration_spec.rb
@@ -121,6 +174,7 @@ files:
 - spec/lib/elasticity/pig_step_spec.rb
 - spec/lib/elasticity/streaming_step_spec.rb
 - spec/lib/elasticity/support/conditional_raise_spec.rb
+- spec/lib/elasticity/sync_to_s3_spec.rb
 - spec/spec_helper.rb
 - spec/support/be_a_hash_including_matcher.rb
 homepage: http://www.github.com/rslifka/elasticity
@@ -137,7 +191,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
       - 0
-      hash:
+      hash: -2552375759855937320
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
@@ -146,7 +200,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
       - 0
-      hash:
+      hash: -2552375759855937320
 requirements: []
 rubyforge_project:
 rubygems_version: 1.8.24
@@ -155,9 +209,11 @@ specification_version: 3
 summary: Streamlined, programmatic access to Amazon's Elastic Map Reduce service.
 test_files:
 - spec/lib/elasticity/aws_request_spec.rb
+- spec/lib/elasticity/bootstrap_action_spec.rb
 - spec/lib/elasticity/custom_jar_step_spec.rb
 - spec/lib/elasticity/emr_spec.rb
 - spec/lib/elasticity/hadoop_bootstrap_action_spec.rb
+- spec/lib/elasticity/hadoop_file_bootstrap_action_spec.rb
 - spec/lib/elasticity/hive_step_spec.rb
 - spec/lib/elasticity/instance_group_spec.rb
 - spec/lib/elasticity/job_flow_integration_spec.rb
@@ -168,5 +224,6 @@ test_files:
 - spec/lib/elasticity/pig_step_spec.rb
 - spec/lib/elasticity/streaming_step_spec.rb
 - spec/lib/elasticity/support/conditional_raise_spec.rb
+- spec/lib/elasticity/sync_to_s3_spec.rb
 - spec/spec_helper.rb
 - spec/support/be_a_hash_including_matcher.rb
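The three dependency entries added to the metadata (fog at runtime, timecop and fakefs for development) are what RubyGems serializes from gemspec declarations along the lines below. This is a sketch for orientation only: the actual +4 -1 change to elasticity.gemspec is not shown in this excerpt, and the name and version fields are simply copied from the metadata.

```ruby
require 'rubygems'

# Sketch of gemspec declarations that would produce the fog, timecop and
# fakefs entries in the metadata diff; not the gem's actual gemspec.
spec = Gem::Specification.new do |s|
  s.name    = 'elasticity'
  s.version = '2.4'

  s.add_dependency('fog')                           # runtime, no version constraint
  s.add_development_dependency('timecop')           # development, no version constraint
  s.add_development_dependency('fakefs', '~> 0.4')  # development, pessimistic 0.4 constraint
end

# Print each dependency as "name type requirement".
spec.dependencies.each do |dep|
  puts "#{dep.name} #{dep.type} #{dep.requirement}"
end
```

A dependency declared without a version string gets RubyGems' default requirement of `>= 0`, which is why the fog and timecop entries above serialize as `! '>='` with `version: '0'`.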