aws-swf 0.1.2

data/Gemfile ADDED
@@ -0,0 +1,5 @@
+ source "https://rubygems.org"
+
+ gem 'aws-sdk'
+ gem 'rake'
+ gem 'rspec', group: :test
@@ -0,0 +1,28 @@
+ GEM
+   remote: https://rubygems.org/
+   specs:
+     aws-sdk (1.13.0)
+       json (~> 1.4)
+       nokogiri (< 1.6.0)
+       uuidtools (~> 2.1)
+     diff-lcs (1.2.4)
+     json (1.8.0)
+     nokogiri (1.5.10)
+     rake (10.1.0)
+     rspec (2.14.1)
+       rspec-core (~> 2.14.0)
+       rspec-expectations (~> 2.14.0)
+       rspec-mocks (~> 2.14.0)
+     rspec-core (2.14.3)
+     rspec-expectations (2.14.0)
+       diff-lcs (>= 1.1.3, < 2.0)
+     rspec-mocks (2.14.1)
+     uuidtools (2.1.4)
+
+ PLATFORMS
+   ruby
+
+ DEPENDENCIES
+   aws-sdk
+   rake
+   rspec
data/LICENSE ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2013 Vijay Ramesh <vijay@change.org>, Tim James <tjames@change.org>
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
@@ -0,0 +1,254 @@
+ aws-swf
+ ==========
+
+ aws-swf is our light framework for developing distributed applications in Ruby to run on [AWS Simple Workflow](http://aws.amazon.com/swf/).
+
+ At [change.org](http://www.change.org) we use aws-swf to drive parallelized and distributed processing for machine-learning driven email targeting. SWF provides the plumbing from the socket up to our application, enabling us to focus on innovating in terms of our data science, performance, UX, etc. without having to worry about the complexities of message passing and asynchronous task scheduling across a decentralized system. If a worker dies, a task throws an exception, or somebody spills a cup of coffee on a rack of servers at AWS, configurable timeouts at different levels enable our workflow to be notified of the problem and decide how to handle it. Additionally, by focusing on parallelizable algorithms we can scale the number of computational nodes as demand and data sizes dictate - nothing changes in the application code, we simply spin up more workers to handle the additional load.
+
+ While we use aws-swf on EC2, any resource - including your laptop - can be a task runner. This makes integration testing a breeze - you can unit test the core functionality of your activity task handlers, test the flow control of your decision task handlers, and then test end-to-end against fixtures (using a test domain on SWF) from your development box. It is also handy for R&D, if you want to iterate more quickly on a 24-core metal machine (running 24 activity workers, taking advantage of local filesystem speeds), before moving to EC2 and spreading those 24 workers across 12 dual-core instances.
+
+ For the purposes of this tutorial, we are going to leave dynamic resource allocation and bootstrapping off the table, and just focus on building an application that can be run locally. You can follow along with the example in [sample-app](sample-app/).
+
+ ## Amazon Simple Workflow
+ SWF allows you to define activities (units of work to be performed) and workflows (decision/flow-control logic that schedules activities based on dependencies, handles failures, etc). You register both under an SWF domain, and can then poll that domain for a given task list from any resource (EC2, metal, or other cloud). AWS serves as a centralized place to poll for your distributed deciders and workers - handling message coordination and ensuring decision tasks are processed sequentially. Your decider workflows schedule (often massively parallel) activities, wait for success/failure, and then act accordingly. New deciders and activity workers can be brought up and down on demand, and with a small amount of care in handling timeouts and failures, your distributed application can be made robust and resilient to intermittent failures and network connectivity issues, as well as easily adaptable to different data scales and time constraints. Look through the [Introduction to Amazon Simple Workflow Service](http://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dg-intro-to-swf.html) docs for more information.
+
+ ## App Structure
+
+ An aws-swf application has three basic components:
+
+ ### Workflows
+ These define your decision task handling. A workflow is responsible for starting activities/child-workflows and handling success/failure.
+
+ ### Activities
+ An activity is where your aws-swf application does actual units of work. Your workflow will initiate activities, passing on input data. Each activity reports success or failure back to the workflow.
+
+ ### Runner
+ Your application includes a Boot module that creates a Runner instance. This is what sets a resource up to poll SWF for decisions and activities.
+
+ ## SampleApp
+
+ ### [SampleApp::Boot](sample-app/lib/boot.rb)
+ Extends [SWF::Boot](lib/swf/boot.rb), loads settings from the environment (or a chef data bag, or S3, or locally on the worker node, etc), and defines `swf_runner`, which calls your Runner, passing any settings.
+
+ ```ruby
+ module SampleApp::Boot
+
+   extend SWF::Boot
+   extend self
+
+   def swf_runner
+     SampleApp::Runner.new(settings)
+   end
+
+   def settings
+     {
+       swf_domain: ENV["SWF_DOMAIN"],
+       s3_bucket: ENV["S3_BUCKET"],
+       s3_path: ENV["S3_PATH"],
+       local_data_dir: ENV["LOCAL_DATA_DIR"]
+     }
+   end
+ end
+ ```
+
+ ### [SampleApp::Runner](sample-app/lib/runner.rb)
+ A subclass of [SWF::Runner](lib/swf/runner.rb); it lets you set up any global settings you want accessible to all workers. Your runner must define `domain_name` and `task_list_name` (probably as methods that parse settings):
+
+ ```ruby
+ def domain_name
+   settings[:swf_domain]
+ end
+
+ def task_list_name
+   [ settings[:s3_bucket], settings[:s3_path] ].join(":")
+ end
+ ```
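
To make the naming concrete, here is a minimal standalone sketch of how those two methods turn a settings hash into a domain name and a task list name. `ExampleRunner` is a hypothetical stand-in (it does not subclass `SWF::Runner` or talk to SWF), and the settings values are made up.

```ruby
# Hypothetical stand-in for a runner; it only mimics the two naming methods.
class ExampleRunner
  attr_reader :settings

  def initialize(settings)
    @settings = settings
  end

  def domain_name
    settings[:swf_domain]
  end

  # Workers started with the same bucket and path share one task list name.
  def task_list_name
    [settings[:s3_bucket], settings[:s3_path]].join(":")
  end
end

runner = ExampleRunner.new(
  swf_domain: "some_domain",
  s3_bucket:  "some_bucket",
  s3_path:    "some_path"
)
puts runner.domain_name     # prints "some_domain"
puts runner.task_list_name  # prints "some_bucket:some_path"
```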
+
+ You can also redefine `be_worker` or `be_decider` to add before and after hooks:
+
+ ```ruby
+ def be_worker
+   # we want this to be done before any activity handler
+   # reports to SWF it is ready to pick up an activity task
+   build_data_index
+   super
+ end
+
+ def build_data_index
+   # fetch data from s3, build a binary index, etc
+   # make sure to wrap in a mutex so multiple workers
+   # on the same resource don't override one-another
+   ...
+ end
+ ```
+
+ ### [SampleApp::SampleWorkflow](sample-app/lib/sample_workflow.rb)
+ A workflow extends [SWF::Workflow](lib/workflows.rb). It should also define a `self.workflow_type` method that calls `effect_workflow_type` to register the module. This is where you can set default timeouts for the workflow type (see the [aws-sdk docs](http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/SimpleWorkflow/WorkflowType.html) for all available parameters). Note that if you change one of these defaults, you must increment `WORKFLOW_VERSION`:
+
+ ```ruby
+ def self.workflow_type
+   effect_workflow_type(WORKFLOW_TYPE, WORKFLOW_VERSION,
+     default_child_policy: :request_cancel,
+     default_task_start_to_close_timeout: 3600,
+     default_execution_start_to_close_timeout: 3600,
+   )
+ end
+ ```
+
+
+ The workflow module should also have a `DecisionTaskHandler` inner-class that registers and defines `handle`. This method will be called as new events occur.
+
+ ```ruby
+ class DecisionTaskHandler < SWF::DecisionTaskHandler
+   register(WORKFLOW_TYPE, WORKFLOW_VERSION)
+
+   def handle
+     new_events.each {|e| ... }
+   end
+ end
+ ```
+
+ #### Event handling
+ Your workflow does sequential event handling across a distributed network of deciders - scheduling activities, acting on success/failure, creating child workflows, etc. For a full list of history events, [see the docs](http://docs.aws.amazon.com/sdkfornet/latest/apidocs/html/T_Amazon_SimpleWorkflow_Model_HistoryEvent.htm).
+
+ ##### Simple workflow - single activity
+ ```ruby
+ def handle
+   new_events.each {|event|
+     case event.event_type
+     when 'WorkflowExecutionStarted'
+       schedule_sample_activity
+     when 'ActivityTaskCompleted'
+       decision_task.complete_workflow_execution
+     when 'ActivityTaskFailed'
+       decision_task.fail_workflow_execution
+     end
+   }
+ end
+
+ def schedule_sample_activity
+   decision_task.schedule_activity_task(SampleActivity.activity_type_sample_activity(runner),
+     input: workflow_input.merge({decision_param: 'decision'}).to_json,
+     task_list: workflow_task_list
+   )
+ end
+ ```
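
Activity input travels as a JSON string: the decider calls `to_json` on a hash, and `SWF::ActivityTaskHandler#activity_task_input` reads it back with `JSON.parse`. The sketch below uses only the standard `json` library, and the hash contents are illustrative assumptions. It shows one consequence worth keeping in mind: symbol keys come back as string keys.

```ruby
require 'json'

# Illustrative workflow input; the real contents are up to your application.
workflow_input = { input_param: "some input" }

# What the decider sends as the activity input (a JSON string).
activity_input = workflow_input.merge({ decision_param: 'decision' }).to_json

# What an activity handler sees after parsing that string.
parsed = JSON.parse(activity_input)

puts parsed["input_param"]            # prints "some input"
puts parsed["decision_param"]         # prints "decision"
puts parsed[:decision_param].inspect  # prints "nil" (symbol keys do not survive)
```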
+
+ ##### Child workflows
+ There is a one-to-one correspondence between a workflow module and a workflow type on SWF. However, an application may have multiple child workflows that a parent workflow initiates and handles. A child workflow is just a normal workflow that signals to the parent workflow when execution is complete/failed.
+
+ ```ruby
+ def handle
+   child_workflow_failed = false
+   scheduled_child_workflows = []
+   completed_child_workflows = []
+   new_events.each {|event|
+     case event.event_type
+     when 'WorkflowExecutionStarted'
+       scheduled_child_workflows = schedule_child_workflows
+     when 'ChildWorkflowExecutionFailed'
+       child_workflow_failed = true
+     when 'ChildWorkflowExecutionCompleted'
+       completed_child_workflows << event.attributes.workflow_execution
+     end
+   }
+
+   if child_workflow_failed
+     decision_task.fail_workflow_execution
+   elsif (scheduled_child_workflows - completed_child_workflows).empty?
+     decision_task.complete_workflow_execution
+   end
+ end
+
+ def schedule_child_workflows
+   10.times.map {|i|
+     decision_task.start_child_workflow_execution(
+       AnotherWorkflow.workflow_type,
+       input: another_input_hash(i).to_json,
+       task_list: decision_task.workflow_execution.task_list,
+       tag_list: another_tag_array(i),
+     )
+   }
+ end
+ ```
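
The bookkeeping in `handle` can be exercised without SWF. The sketch below is a hypothetical pure-Ruby reduction, not part of aws-swf: events are plain hashes rather than aws-sdk history events, child executions are identified by made-up string ids, and `decide` returns a symbol instead of calling `decision_task`. It also assumes it is handed every event seen so far along with the full list of scheduled children, not only the newest events.

```ruby
# Decide what the parent should do from a list of simplified events.
# Each event is a hash of the form { type: String, id: String }.
def decide(events, scheduled_ids)
  completed_ids = []
  events.each do |event|
    case event[:type]
    when 'ChildWorkflowExecutionFailed'
      return :fail # any failed child fails the parent
    when 'ChildWorkflowExecutionCompleted'
      completed_ids << event[:id]
    end
  end
  # Complete only once every scheduled child has reported completion.
  (scheduled_ids - completed_ids).empty? ? :complete : :wait
end

scheduled = %w(child-0 child-1)
one_done  = [{ type: 'ChildWorkflowExecutionCompleted', id: 'child-0' }]
both_done = one_done + [{ type: 'ChildWorkflowExecutionCompleted', id: 'child-1' }]
one_fail  = one_done + [{ type: 'ChildWorkflowExecutionFailed', id: 'child-1' }]

p decide(one_done, scheduled)   # prints :wait
p decide(both_done, scheduled)  # prints :complete
p decide(one_fail, scheduled)   # prints :fail
```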
+
+ ##### Multiple activities
+ TODO
+
+ ### [SampleApp::SampleActivity](sample-app/lib/sample_activity.rb)
+ An activity module can handle multiple activity types. For each it must define an `activity_type_<activity_name>` class method that receives a runner and calls `runner.effect_activity_type`. This is where you can set activity-specific timeouts (again, [see the docs](http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/SimpleWorkflow/ActivityType.html)).
+
+ ```ruby
+ def self.activity_type_sample_activity(runner)
+   runner.effect_activity_type('sample_activity', '1',
+     default_task_heartbeat_timeout: 3600,
+     default_task_schedule_to_start_timeout: 3600,
+     default_task_schedule_to_close_timeout: 7200,
+     default_task_start_to_close_timeout: 3600
+   )
+ end
+ ```
+
+ Your activity module should also have an `ActivityTaskHandler` inner-class that registers and defines `handle_<activity_name>` methods to handle activity tasks as they are scheduled by decision tasks.
+
+ ```ruby
+ class ActivityTaskHandler < SWF::ActivityTaskHandler
+   register
+
+   def handle_sample_activity
+     ...
+   end
+ end
+ ```
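
The `handle_<activity_name>` convention is resolved by name: `SWF::ActivityTaskHandler.handler_method_name` builds the method name from the task's activity type, and `call_handle` sends it. The sketch below reproduces just that dispatch in isolation. `FakeActivityType`, `FakeTask` and `ExampleHandler` are hypothetical stand-ins for the aws-sdk task objects and for a real handler class.

```ruby
# Hypothetical stand-ins for the aws-sdk activity task objects.
FakeActivityType = Struct.new(:name)
FakeTask = Struct.new(:activity_type)

class ExampleHandler
  # Same naming rule as SWF::ActivityTaskHandler.handler_method_name.
  def self.handler_method_name(task)
    "handle_#{task.activity_type.name}".to_sym
  end

  def initialize(task)
    @task = task
  end

  # Dispatch to the handle_* method that matches the task's activity type.
  def call_handle
    send self.class.handler_method_name(@task)
  end

  def handle_sample_activity
    "handled sample_activity"
  end
end

task = FakeTask.new(FakeActivityType.new('sample_activity'))
puts ExampleHandler.handler_method_name(task).inspect  # prints ":handle_sample_activity"
puts ExampleHandler.new(task).call_handle              # prints "handled sample_activity"
```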
+
+ ## Running your application
+
+ ### Launching Workers
+ Launching workers for workflow and activity tasks is as simple as calling `SampleApp::Boot.startup(num_deciders, num_workers, wait_for_children, &at_rescue)`. However, when automating resource bootstrapping you might want a small launcher script like [sample-app/bin/swf_run.rb](sample-app/bin/swf_run.rb):
+
+ ```ruby
+ #!/usr/bin/env ruby
+
+ require './lib/boot'
+
+ def run!
+   startup_hash = ARGV.inject(Hash.new(0)) {|h,i| h[i.to_sym] += 1; h }
+   SampleApp::Boot.startup(startup_hash[:d], startup_hash[:w], true)
+ end
+
+ run!
+ ```
+
+ which you can then call via init/upstart/monit/etc:
+
+ ```shell
+ $ SWF_DOMAIN=some_domain S3_BUCKET=some_bucket S3_PATH=some_path LOCAL_DATA_DIR=/tmp ruby ./sample-app/bin/swf_run.rb d d w w w
+ ```
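
The launcher counts how many times each letter appears on the command line, so `d d w w w` asks for two deciders and three workers. A minimal sketch of just that counting step follows; `count_roles` is a hypothetical helper name, and the argument list is passed in explicitly instead of being read from `ARGV`.

```ruby
# Count how many times each role letter appears, as the launcher script does.
# Hash.new(0) makes missing keys default to 0.
def count_roles(args)
  args.inject(Hash.new(0)) { |h, i| h[i.to_sym] += 1; h }
end

counts = count_roles(%w(d d w w w))
puts counts[:d]  # prints 2
puts counts[:w]  # prints 3
puts counts[:x]  # prints 0
```

Because unknown letters simply count as zero, a typo on the command line starts fewer processes than intended rather than raising an error.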
+
+ TODO
+ - demonstrate starting workers on multiple physical resources
+ - demonstrate automating launching EC2 resources, using tags to bootstrap
+ - demonstrate rescue logging to S3
+
+
+ ### Starting a Workflow
+
+ You start a workflow by calling the `start` method on your workflow module, passing input and configuration options (see [the docs](http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/SimpleWorkflow/WorkflowType.html#start_execution-instance_method) for configuration specifics):
+
+ ```ruby
+ SWF.domain_name = "some_domain"
+ SampleWorkflow.start(
+   { input_param: "some input" },
+   task_list: "some_task_list",
+   execution_start_to_close_timeout: 3600,
+ )
+ ```
+
+ The workflow will be submitted to SWF; assuming you have started a decision task handler on that domain and task list, the `WorkflowExecutionStarted` event will be picked up by `SampleWorkflow::DecisionTaskHandler#handle`.
+
+ See [the integration spec](sample-app/spec/integration/sample_workflow_spec.rb) for an end-to-end example.
+
+
+
+ ## Shameless Plug
+ This project was supported in very large part by change.org. And we are hiring! If you want to come work with us and help empower people to Change the world while working on amazing technology, [check out our jobs page](http://www.change.org/hiring).
@@ -0,0 +1,11 @@
+ #!/usr/bin/env ruby
+
+ require './lib/boot'
+
+ def run!
+   startup_hash = ARGV.inject(Hash.new(0)) {|h,i| h[i.to_sym] += 1; h }
+   SWF::Boot.startup(startup_hash[:d], startup_hash[:w], true)
+ end
+
+ run!
+
@@ -0,0 +1,5 @@
+ require 'swf'
+ require 'workflows'
+ require 'swf/boot'
+ require 'swf/decision_task_handler'
+ require 'swf/activity_task_handler'
@@ -0,0 +1,54 @@
+ require 'aws-sdk'
+
+ module SWF
+
+   class UnknownSWFDomain < StandardError; end
+   class UndefinedDomainName < StandardError; end
+   class UndefinedTaskList < StandardError; end
+
+   extend self
+
+   def swf
+     @swf ||= AWS::SimpleWorkflow.new
+   end
+
+   def domain_name
+     raise UndefinedDomainName, "domain name not defined" unless @domain_name
+     @domain_name
+   end
+
+   # in the runner context, where domain_name comes from ENV settings we call
+   # FeatureMatrix::SWF.domain_name = FeatureMatrix::Settings.swf_domain
+   def domain_name=(d)
+     @domain_name = d
+   end
+
+   SLOT_TIME = 1
+
+   def domain_exists?(d)
+     collision = 0
+     begin
+       swf.domains[d].exists?
+     rescue => e
+       collision += 1 if collision < 10
+       max_slot_delay = 2**collision - 1
+       sleep(SLOT_TIME * rand(0 .. max_slot_delay))
+       retry
+     end
+   end
+
+   def domain
+     # if we need a new domain, make it in the aws console
+     raise UnknownSWFDomain, "#{domain_name} is not a valid SWF domain" unless domain_exists?(domain_name)
+     swf.domains[domain_name]
+   end
+
+   def task_list=(tl)
+     @task_list = tl
+   end
+
+   def task_list
+     @task_list or raise UndefinedTaskList, "task_list must be defined via SWF.task_list = <task_list>"
+   end
+
+ end
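
`domain_exists?` retries with a randomized, exponentially growing delay whose exponent is capped at 10. The standalone sketch below isolates the delay arithmetic so the bounds are easy to see. `max_slot_delay` and `backoff_delay` are hypothetical helper names, the `slot_time` default of 1 mirrors `SLOT_TIME`, and nothing here sleeps or calls AWS.

```ruby
# Upper bound, in slots, on the random delay after `collision` failed attempts.
# The exponent is capped at 10, so the bound never exceeds 2**10 - 1 = 1023.
def max_slot_delay(collision)
  capped = [collision, 10].min
  2**capped - 1
end

# A random delay in seconds, drawn uniformly from 0..max_slot_delay(collision).
def backoff_delay(collision, slot_time = 1)
  slot_time * rand(0..max_slot_delay(collision))
end

(1..4).each do |n|
  puts "after #{n} failure(s): wait up to #{max_slot_delay(n)}s"
end
# prints bounds of 1s, 3s, 7s and 15s for 1 to 4 failures
```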
@@ -0,0 +1,51 @@
+ require 'swf/task_handler'
+ require 'set'
+
+ module SWF
+
+   # subclass must call .register and define a #handle_<activity_name>
+   # method for each activity type it handles
+   class ActivityTaskHandler
+     extend TaskHandler
+
+     @@handler_classes = Set.new
+
+     attr_reader :runner, :activity_task
+     def initialize(runner, task)
+       @runner = runner
+       @activity_task = task
+     end
+
+     def call_handle
+       send self.class.handler_method_name(activity_task)
+     end
+
+     def activity_task_input
+       JSON.parse(activity_task.input)
+     end
+
+     # Register statically self (subclass) to handle activities
+     def self.register
+       @@handler_classes << self
+     end
+
+     def self.fail!(task, args={})
+       task.fail!(args)
+     end
+
+     def self.find_handler_class(task)
+       @@handler_classes.find {|x| x.instance_methods.include? handler_method_name task }
+       # TODO: detect when two classes define the same named handle_* method ?!?!
+     end
+
+     def self.configuration_help_message
+       "Each activity task handler running on this task list in this domain must provide a handler class with a handle_* function for this activity_type's name.\n" +
+       "I only have these classes: #{@@handler_classes.inspect}"
+     end
+
+     def self.handler_method_name(task)
+       "handle_#{task.activity_type.name}".to_sym
+     end
+
+   end
+
+ end
@@ -0,0 +1,104 @@
+ #!/usr/bin/env ruby
+ require 'json'
+ require 'swf/runner'
+
+ module SWF; end
+
+ module SWF::Boot
+
+   class DeciderStartupFailure < StandardError; end
+   class WorkerStartupFailure < StandardError; end
+
+   extend self
+
+   def startup(deciders, workers, wait_for_children = false, &at_rescue)
+     child_pids = deciders.to_i.times.map {
+       Process.fork {
+         Process.daemon(true) unless wait_for_children
+         rescued = false
+         begin
+           swf_runner.be_decider
+         rescue => e
+           error = {
+             error: e.inspect,
+             backtrace: e.backtrace
+           }
+           if rescued
+             begin
+               raise SWF::Boot::DeciderStartupFailure, JSON.pretty_unparse(error)
+             rescue SWF::Boot::DeciderStartupFailure => rescued_e
+               if at_rescue
+                 at_rescue.call(rescued_e.to_s)
+               else
+                 raise rescued_e
+               end
+             end
+           else
+             rescued = true
+             retry
+           end
+         end
+       }
+     }
+
+     child_pids += workers.to_i.times.map {
+       Process.fork {
+         Process.daemon(true) unless wait_for_children
+         rescued = false
+         begin
+           swf_runner.be_worker
+         rescue => e
+           error = {
+             error: e.inspect,
+             backtrace: e.backtrace
+           }
+           if rescued
+             begin
+               raise SWF::Boot::WorkerStartupFailure, JSON.pretty_unparse(error)
+             rescue SWF::Boot::WorkerStartupFailure => rescued_e
+               if at_rescue
+                 at_rescue.call(rescued_e.to_s)
+               else
+                 raise rescued_e
+               end
+             end
+           else
+             rescued = true
+             retry
+           end
+         end
+       }
+     }
+
+     puts "Forked #{deciders} deciders and #{workers} workers..."
+
+     if wait_for_children
+       %w(TERM INT).each {|signal| Signal.trap(signal) { terminate_children(child_pids) } }
+       puts "Waiting on them..."
+       child_pids.each {|pid| Process.wait(pid) }
+     else
+       child_pids.each {|pid| Process.detach(pid) }
+     end
+
+     child_pids
+
+   end
+
+   def terminate_children(child_pids)
+     child_pids.each {|pid|
+       puts "Terminating #{pid}"
+       Process.kill("TERM", pid)
+     }
+   end
+
+   def swf_runner
+     # define this in your usage
+     SWF::Runner.new(settings[:domain_name], settings[:task_list_name])
+   end
+
+   def settings
+     # override this in your usage to pull settings from, e.g., ENV or EC2 tags
+     {domain_name: 'domain', task_list_name: 'task_list_name'}
+   end
+
+ end
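
Both fork blocks in `startup` follow the same pattern: run the decider or worker, retry once on the first error, and on the second error either hand a message to the `at_rescue` block or re-raise. The sketch below is a simplified single-process illustration of that pattern only. `run_with_one_retry` is a hypothetical helper, it does not fork, and it passes the plain error message to the callback instead of the JSON-wrapped `DeciderStartupFailure` / `WorkerStartupFailure` used above.

```ruby
# Run the block; retry once on the first error. On the second error,
# report it to at_rescue if one was given, otherwise re-raise.
def run_with_one_retry(at_rescue = nil)
  rescued = false
  begin
    yield
  rescue => e
    if rescued
      raise e unless at_rescue
      at_rescue.call(e.message)
    else
      rescued = true
      retry
    end
  end
end

attempts = 0
reported = []
run_with_one_retry(->(msg) { reported << msg }) do
  attempts += 1
  raise "boom #{attempts}"
end

puts attempts          # prints 2
puts reported.inspect  # prints ["boom 2"]
```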