gush 0.4.1 → 1.0.0
- checksums.yaml +4 -4
- data/.gitignore +1 -1
- data/.travis.yml +2 -2
- data/CHANGELOG.md +33 -1
- data/README.md +195 -80
- data/gush.gemspec +6 -5
- data/lib/gush.rb +0 -14
- data/lib/gush/cli.rb +3 -18
- data/lib/gush/cli/overview.rb +1 -1
- data/lib/gush/client.rb +8 -32
- data/lib/gush/configuration.rb +4 -6
- data/lib/gush/graph.rb +2 -1
- data/lib/gush/job.rb +12 -16
- data/lib/gush/worker.rb +21 -49
- data/lib/gush/workflow.rb +9 -5
- data/spec/{Gushfile.rb → Gushfile} +0 -0
- data/spec/features/integration_spec.rb +62 -23
- data/spec/gush/client_spec.rb +1 -1
- data/spec/gush/configuration_spec.rb +0 -3
- data/spec/gush/job_spec.rb +3 -3
- data/spec/gush/worker_spec.rb +33 -41
- data/spec/gush/workflow_spec.rb +4 -2
- data/spec/gush_spec.rb +4 -4
- data/spec/spec_helper.rb +23 -9
- metadata +26 -12
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 5ac3b43abd8eeb7cc6403a0be5064b0dd0ba6f2a
|
4
|
+
data.tar.gz: 8038bf272a8562a632cd67ae9dc8adfdc59e50a6
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 155e88cd3026703aac1a00ebbd02d64380402982eeabd7f384f09aec654f93f48bb934861c6d7208e45eba860c8dadbae0f7e0f19c280cb68ed26941d674dd72
|
7
|
+
data.tar.gz: 5214ae1bc690bf3d84afaf27e3748be2526b63112c56f203e292fb73604606942172e94a7722028f3f809012b84f708978320ce868ca075d6427c391112cb9c3
|
data/.gitignore
CHANGED
data/.travis.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,35 @@
|
|
1
|
-
#
|
1
|
+
# Changelog
|
2
|
+
|
3
|
+
All notable changes to this project will be documented in this file.
|
4
|
+
|
5
|
+
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
|
6
|
+
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
|
7
|
+
|
8
|
+
## 1.0.0 - 2017-10-02
|
9
|
+
|
10
|
+
### Added
|
11
|
+
|
12
|
+
- **BREAKING CHANGE** Gush now uses ActiveJob instead of using Sidekiq directly. This allows programmers to use multiple backends instead of just one, including in-process or even synchronous backends. See http://guides.rubyonrails.org/active_job_basics.html
|
13
|
+
|
14
|
+
### Fixed
|
15
|
+
|
16
|
+
- Fix graph rendering with `gush viz` command. Sometimes it rendered the last job detached from others, because it was using a class name instead of job name as ID.
|
17
|
+
- Fix performance problems with deserializing jobs. This greatly **increased performance** by avoiding redundant calls to Redis storage, and should help a lot with huge workflows spawning thousands of jobs. Previously each job loaded the whole workflow instance when executed.
|
18
|
+
|
19
|
+
### Changed
|
20
|
+
|
21
|
+
- **BREAKING CHANGE** `Gushfile.rb` is now renamed to `Gushfile`
|
22
|
+
- **BREAKING CHANGE** Internal code for reporting status via Redis pub/sub has been removed, since it wasn't used for a long time.
|
23
|
+
- **BREAKING CHANGE** jobs are expected to have a `perform` method instead of `work` like in < 1.0.0 versions.
|
24
|
+
- **BREAKING CHANGE** The `payloads` method available inside jobs now returns an array of hashes instead of a hash. This allows for a more flexible approach to reusing a single job in many situations. Previously payloads were grouped by the predecessor's class name, so you were forced to hardcode that class name in its descendants' code.
|
25
|
+
|
26
|
+
### Removed
|
27
|
+
|
28
|
+
- The `gush workers` command has been removed. It is now up to the developer to start background processes, depending on the chosen ActiveJob adapter.
|
29
|
+
- `environment` was removed since it was no longer needed (it was Sidekiq specific)
|
30
|
+
|
31
|
+
## 0.4.0
|
32
|
+
|
33
|
+
### Removed
|
2
34
|
|
3
35
|
- remove hard dependency on Yajl, so Gush can work with non-MRI Rubies ([#31](https://github.com/chaps-io/gush/pull/31) by [Nick Rakochy](https://github.com/chaps-io/gush/pull/31))
|
data/README.md
CHANGED
@@ -2,38 +2,65 @@
|
|
2
2
|
|
3
3
|
## [![](http://i.imgur.com/ya8Wnyl.png)](https://chaps.io) proudly made by [Chaps](https://chaps.io)
|
4
4
|
|
5
|
-
Gush is a parallel workflow runner using only Redis as
|
5
|
+
Gush is a parallel workflow runner using only Redis as storage and [ActiveJob](http://guides.rubyonrails.org/v4.2/active_job_basics.html#introduction) for scheduling and executing jobs.
|
6
6
|
|
7
7
|
## Theory
|
8
8
|
|
9
|
-
Gush relies on directed acyclic graphs to store dependencies, see [Parallelizing Operations With Dependencies](https://msdn.microsoft.com/en-us/magazine/dd569760.aspx) by Stephen Toub.
|
9
|
+
Gush relies on directed acyclic graphs to store dependencies, see [Parallelizing Operations With Dependencies](https://msdn.microsoft.com/en-us/magazine/dd569760.aspx) by Stephen Toub to learn more about this method.
|
10
|
+
|
11
|
+
## **WARNING - version notice**
|
12
|
+
|
13
|
+
This README is about the `1.0.0` version, which has breaking changes compared to < 1.0.0 versions. [See here for 0.4.1 documentation](https://github.com/chaps-io/gush/blob/349c5aff0332fd14b1cb517115c26d415aa24841/README.md).
|
14
|
+
|
10
15
|
## Installation
|
11
16
|
|
12
|
-
Add
|
17
|
+
### 1. Add `gush` to Gemfile
|
18
|
+
|
19
|
+
```ruby
|
20
|
+
gem 'gush', '~> 1.0.0'
|
21
|
+
```
|
22
|
+
|
23
|
+
### 2. Create `Gushfile`
|
24
|
+
|
25
|
+
When using Gush and its CLI commands you need a `Gushfile` in the root directory.
|
26
|
+
`Gushfile` should require all your workflows and jobs.
|
13
27
|
|
14
|
-
|
28
|
+
#### Ruby on Rails
|
15
29
|
|
16
|
-
|
30
|
+
For RoR it is enough to require the full environment:
|
17
31
|
|
18
|
-
|
32
|
+
```ruby
|
33
|
+
require_relative './config/environment.rb'
|
34
|
+
```
|
19
35
|
|
20
|
-
|
36
|
+
and make sure your jobs and workflows are correctly loaded by adding their directories to autoload_paths, inside `config/application.rb`:
|
21
37
|
|
22
|
-
|
38
|
+
```ruby
|
39
|
+
config.autoload_paths += ["#{Rails.root}/app/jobs", "#{Rails.root}/app/workflows"]
|
40
|
+
```
|
23
41
|
|
24
|
-
|
42
|
+
#### Ruby
|
25
43
|
|
26
|
-
|
44
|
+
Simply require any jobs and workflows manually in `Gushfile`:
|
45
|
+
|
46
|
+
```ruby
|
47
|
+
require_relative 'lib/workflows/example_workflow.rb'
|
48
|
+
require_relative 'lib/jobs/some_job.rb'
|
49
|
+
require_relative 'lib/jobs/some_other_job.rb'
|
50
|
+
```
|
51
|
+
|
52
|
+
|
53
|
+
## Example
|
27
54
|
|
28
55
|
The DSL for defining jobs consists of a single `run` method.
|
29
56
|
Here is a complete example of a workflow you can create:
|
30
57
|
|
31
58
|
```ruby
|
32
|
-
# workflows/sample_workflow.rb
|
59
|
+
# app/workflows/sample_workflow.rb
|
33
60
|
class SampleWorkflow < Gush::Workflow
|
34
61
|
def configure(url_to_fetch_from)
|
35
62
|
run FetchJob1, params: { url: url_to_fetch_from }
|
36
|
-
run FetchJob2, params: {some_flag: true, url: 'http://url.com'}
|
63
|
+
run FetchJob2, params: { some_flag: true, url: 'http://url.com' }
|
37
64
|
|
38
65
|
run PersistJob1, after: FetchJob1
|
39
66
|
run PersistJob2, after: FetchJob2
|
@@ -47,62 +74,82 @@ class SampleWorkflow < Gush::Workflow
|
|
47
74
|
end
|
48
75
|
```
|
49
76
|
|
50
|
-
|
77
|
+
and this is what the graph will look like:
|
51
78
|
|
52
|
-
|
53
|
-
|
54
|
-
|
79
|
+
![SampleWorkflow](https://i.imgur.com/DFh6j51.png)
|
80
|
+
|
81
|
+
|
82
|
+
## Defining workflows
|
83
|
+
|
84
|
+
Let's start with the simplest workflow possible, consisting of a single job:
|
55
85
|
|
56
|
-
|
86
|
+
```ruby
|
87
|
+
class SimpleWorkflow < Gush::Workflow
|
88
|
+
def configure
|
89
|
+
run DownloadJob
|
90
|
+
end
|
91
|
+
end
|
92
|
+
```
|
57
93
|
|
58
|
-
|
94
|
+
Of course having a workflow with only a single job does not make sense, so it's time to define dependencies:
|
59
95
|
|
96
|
+
```ruby
|
97
|
+
class SimpleWorkflow < Gush::Workflow
|
98
|
+
def configure
|
99
|
+
run DownloadJob
|
100
|
+
run SaveJob, after: DownloadJob
|
101
|
+
end
|
102
|
+
end
|
103
|
+
```
|
60
104
|
|
61
|
-
|
105
|
+
We just told Gush to execute `SaveJob` right after `DownloadJob` finishes **successfully**.
|
62
106
|
|
63
|
-
|
107
|
+
But what if your job must have multiple dependencies? That's easy, just provide an array to the `after` attribute:
|
64
108
|
|
65
109
|
```ruby
|
66
|
-
|
67
|
-
class SampleWorkflow < Gush::Workflow
|
110
|
+
class SimpleWorkflow < Gush::Workflow
|
68
111
|
def configure
|
69
|
-
run
|
112
|
+
run FirstDownloadJob
|
113
|
+
run SecondDownloadJob
|
114
|
+
|
115
|
+
run SaveJob, after: [FirstDownloadJob, SecondDownloadJob]
|
70
116
|
end
|
71
117
|
end
|
72
118
|
```
|
73
119
|
|
74
|
-
|
120
|
+
Now `SaveJob` will only execute after both its parents finish without errors.
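Conceptually, this is a topological ordering of the dependency graph. The following is a minimal sketch of that idea using Ruby's standard `tsort` library. It is not Gush's own scheduling code, and the `DEPENDENCIES` table is a hand-written stand-in for what the workflow above declares with `after:`.

```ruby
require 'tsort'

# Hand-written stand-in for the workflow above:
# each job name maps to the jobs it must wait for.
DEPENDENCIES = {
  'FirstDownloadJob'  => [],
  'SecondDownloadJob' => [],
  'SaveJob'           => ['FirstDownloadJob', 'SecondDownloadJob']
}.freeze

# TSort.tsort expects two callables: one yielding every node,
# and one yielding the children (here: prerequisites) of a node.
each_node  = ->(&block) { DEPENDENCIES.each_key(&block) }
each_child = ->(job, &block) { DEPENDENCIES.fetch(job).each(&block) }

# Prerequisites always come before the jobs that depend on them.
EXECUTION_ORDER = TSort.tsort(each_node, each_child)
puts EXECUTION_ORDER.inspect
```

A real runner can execute independent jobs (here the two download jobs) in parallel; the sketch only shows that `SaveJob` can never be ordered before its parents.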
|
121
|
+
|
122
|
+
With this simple syntax you can build any complex workflows you can imagine!
|
75
123
|
|
76
|
-
####
|
124
|
+
#### Alternative way
|
77
125
|
|
78
|
-
|
126
|
+
`run` method also accepts `before:` attribute to define the opposite association. So we can write the same workflow as above, but like this:
|
79
127
|
|
80
128
|
```ruby
|
81
|
-
|
82
|
-
|
83
|
-
|
84
|
-
|
129
|
+
class SimpleWorkflow < Gush::Workflow
|
130
|
+
def configure
|
131
|
+
run FirstDownloadJob, before: SaveJob
|
132
|
+
run SecondDownloadJob, before: SaveJob
|
85
133
|
|
86
|
-
|
134
|
+
run SaveJob
|
87
135
|
end
|
88
136
|
end
|
89
137
|
```
|
90
138
|
|
91
|
-
|
139
|
+
You can use whatever way you find more readable or even both at once :)
|
92
140
|
|
93
|
-
|
141
|
+
### Passing arguments to workflows
|
94
142
|
|
95
143
|
Workflows can accept any primitive arguments in their constructor, which then will be available in your
|
96
144
|
`configure` method.
|
97
145
|
|
98
|
-
|
146
|
+
Let's assume we are writing a book publishing workflow which needs to know where the PDF of the book is and under what ISBN it will be released:
|
99
147
|
|
100
148
|
```ruby
|
101
|
-
# app/workflows/sample_workflow.rb
|
102
149
|
class PublishBookWorkflow < Gush::Workflow
|
103
150
|
def configure(url, isbn)
|
104
151
|
run FetchBook, params: { url: url }
|
105
|
-
run PublishBook, params: { book_isbn: isbn }
|
152
|
+
run PublishBook, params: { book_isbn: isbn }, after: FetchBook
|
106
153
|
end
|
107
154
|
end
|
108
155
|
```
|
@@ -110,57 +157,106 @@ end
|
|
110
157
|
and then create your workflow with those arguments:
|
111
158
|
|
112
159
|
```ruby
|
113
|
-
PublishBookWorkflow.
|
160
|
+
PublishBookWorkflow.create("http://url.com/book.pdf", "978-0470081204")
|
161
|
+
```
|
162
|
+
|
163
|
+
and that's basically it for defining workflows, see below on how to define jobs:
|
164
|
+
|
165
|
+
## Defining jobs
|
166
|
+
|
167
|
+
The simplest job is a class inheriting from `Gush::Job` and responding to the `perform` method, much like any other ActiveJob class.
|
168
|
+
|
169
|
+
```ruby
|
170
|
+
class FetchBook < Gush::Job
|
171
|
+
def perform
|
172
|
+
# do some fetching from remote APIs
|
173
|
+
end
|
174
|
+
end
|
114
175
|
```
|
115
176
|
|
177
|
+
But what about those params we passed in the previous step?
|
116
178
|
|
117
|
-
|
179
|
+
## Passing parameters into jobs
|
118
180
|
|
119
|
-
|
181
|
+
To do that, simply provide a `params:` attribute with a hash of parameters you'd like to have available inside the `perform` method of the job.
|
120
182
|
|
121
|
-
|
183
|
+
So, inside workflow:
|
122
184
|
|
123
185
|
```ruby
|
124
|
-
|
125
|
-
|
186
|
+
(...)
|
187
|
+
run FetchBook, params: {url: "http://url.com/book.pdf"}
|
188
|
+
(...)
|
126
189
|
```
|
127
190
|
|
128
|
-
|
191
|
+
and within the job we can access them like this:
|
129
192
|
|
130
193
|
```ruby
|
131
|
-
|
194
|
+
class FetchBook < Gush::Job
|
195
|
+
def perform
|
196
|
+
# you can access `params` method here, for example:
|
197
|
+
|
198
|
+
params #=> {url: "http://url.com/book.pdf"}
|
199
|
+
end
|
200
|
+
end
|
132
201
|
```
|
133
202
|
|
134
|
-
|
203
|
+
## Executing workflows
|
135
204
|
|
136
|
-
|
205
|
+
Now that we have defined our workflow and its jobs, we can use it:
|
206
|
+
|
207
|
+
### 1. Start background worker process
|
208
|
+
|
209
|
+
**Important**: The command to start background workers depends on the backend you chose for ActiveJob.
|
210
|
+
For example, in case of Sidekiq this would be:
|
137
211
|
|
138
212
|
```
|
139
|
-
bundle exec gush
|
213
|
+
bundle exec sidekiq -q gush
|
140
214
|
```
|
141
215
|
|
142
|
-
|
216
|
+
**[See the backends section of the official ActiveJob documentation](http://guides.rubyonrails.org/v4.2/active_job_basics.html#backends)**
|
217
|
+
|
218
|
+
**Hint**: Gush uses the `gush` queue name by default. Keep that in mind, because some backends (like Sidekiq) will only run jobs from explicitly stated queues.
|
219
|
+
|
220
|
+
|
221
|
+
### 2. Create the workflow instance
|
222
|
+
|
223
|
+
```ruby
|
224
|
+
flow = PublishBookWorkflow.create("http://url.com/book.pdf", "978-0470081204")
|
225
|
+
```
|
226
|
+
|
227
|
+
### 3. Start the workflow
|
143
228
|
|
144
229
|
```ruby
|
145
230
|
flow.start!
|
146
231
|
```
|
147
232
|
|
148
|
-
Now Gush will start processing jobs in background using
|
149
|
-
|
233
|
+
Now Gush will start processing jobs in the background using ActiveJob and your chosen backend.
|
234
|
+
|
235
|
+
### 4. Monitor its progress:
|
236
|
+
|
237
|
+
```ruby
|
238
|
+
flow.reload
|
239
|
+
flow.status
|
240
|
+
#=> :running|:finished|:failed
|
241
|
+
```
|
242
|
+
|
243
|
+
`reload` is needed to see the latest status, since workflows are updated asynchronously.
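If you need to block until a workflow is done, the reload-then-check pattern can be wrapped in a small polling loop. This is only a sketch: `wait_for_completion` is a hypothetical helper, not part of Gush, and `FakeFlow` is a stand-in object so the example runs without Gush or Redis. It assumes that any status other than `:running` means the workflow has stopped.

```ruby
# Hypothetical helper: poll until the workflow leaves the :running state.
# `flow` is anything that responds to `reload` and `status`.
def wait_for_completion(flow, interval: 1, max_attempts: 60)
  max_attempts.times do
    flow.reload
    return flow.status unless flow.status == :running
    sleep interval
  end
  :running # still running after the last attempt
end

# Stand-in object so the sketch runs without Gush or Redis.
class FakeFlow
  attr_reader :status

  def initialize(finish_after)
    @reloads_left = finish_after
    @status = :running
  end

  # Pretends the workflow finishes after a fixed number of reloads.
  def reload
    @reloads_left -= 1
    @status = :finished if @reloads_left <= 0
    self
  end
end

puts wait_for_completion(FakeFlow.new(3), interval: 0)
```

With a real workflow you would pass the `flow` object created earlier and pick an `interval` that does not hammer Redis.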
|
244
|
+
|
245
|
+
## Advanced features
|
150
246
|
|
151
247
|
### Pipelining
|
152
248
|
|
153
|
-
Gush offers a useful
|
249
|
+
Gush offers a useful tool to pass results of a job to its dependencies, so they can act differently.
|
154
250
|
|
155
251
|
**Example:**
|
156
252
|
|
157
253
|
Let's assume you have two jobs, `DownloadVideo`, `EncodeVideo`.
|
158
|
-
The latter needs to know where the first one
|
254
|
+
The latter needs to know where the first one saved the file to be able to open it.
|
159
255
|
|
160
256
|
|
161
257
|
```ruby
|
162
258
|
class DownloadVideo < Gush::Job
|
163
|
-
def
|
259
|
+
def perform
|
164
260
|
downloader = VideoDownloader.fetch("http://youtube.com/?v=someytvideo")
|
165
261
|
|
166
262
|
output(downloader.file_path)
|
@@ -168,44 +264,68 @@ class DownloadVideo < Gush::Job
|
|
168
264
|
end
|
169
265
|
```
|
170
266
|
|
171
|
-
`output` method is
|
267
|
+
The `output` method is used to output data from the job to all dependent jobs.
|
172
268
|
|
173
|
-
Now, since `DownloadVideo` finished and its dependant job `EncodeVideo` started, we can access that payload
|
269
|
+
Now, since `DownloadVideo` finished and its dependent job `EncodeVideo` started, we can access that payload inside it:
|
174
270
|
|
175
271
|
```ruby
|
176
272
|
class EncodeVideo < Gush::Job
|
177
|
-
def
|
178
|
-
video_path = payloads[
|
273
|
+
def perform
|
274
|
+
video_path = payloads.first[:output]
|
179
275
|
end
|
180
276
|
end
|
181
277
|
```
|
182
278
|
|
183
|
-
`payloads` is
|
279
|
+
`payloads` is an array containing outputs from all ancestor jobs. So for our `EncodeVideo` job from above, the array will look like:
|
184
280
|
|
185
|
-
**Note:** `payloads` will only contain outputs of the job's ancestors. So if job `A` depends on `B` and `C`,
|
186
|
-
the `payloads` hash will look like this:
|
187
281
|
|
188
282
|
```ruby
|
189
|
-
|
190
|
-
|
191
|
-
|
192
|
-
|
283
|
+
[
|
284
|
+
{
|
285
|
+
id: "DownloadVideo-41bfb730-b49f-42ac-a808-156327989294", # unique id of the ancestor job
|
286
|
+
class: "DownloadVideo",
|
287
|
+
output: "https://s3.amazonaws.com/somebucket/downloaded-file.mp4" # the payload returned by DownloadVideo job using `output()` method
|
288
|
+
}
|
289
|
+
]
|
193
290
|
```
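When a job has several ancestors, `payloads.first` is no longer enough and you may want to pick entries by class. A minimal sketch, assuming the array shape shown above: `outputs_from` is a hypothetical helper, and the ids, the `FetchSubtitles` class and the paths are made up for illustration.

```ruby
# Sample data shaped like the `payloads` array described above;
# the ids, the FetchSubtitles class and the paths are made up.
SAMPLE_PAYLOADS = [
  { id: 'DownloadVideo-1', class: 'DownloadVideo', output: '/tmp/video.mp4' },
  { id: 'FetchSubtitles-1', class: 'FetchSubtitles', output: '/tmp/subs.srt' }
].freeze

# Hypothetical helper: collect the outputs of every ancestor of a given class.
def outputs_from(payloads, class_name)
  payloads.select { |payload| payload[:class] == class_name }
          .map { |payload| payload[:output] }
end

puts outputs_from(SAMPLE_PAYLOADS, 'DownloadVideo').inspect
```

Returning an array keeps the helper usable when the same job class appears more than once among the ancestors.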
|
194
291
|
|
292
|
+
**Note:** Keep in mind that payloads can only contain data which **can be serialized as JSON**, because that's how Gush stores them internally.
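To see what JSON serialization does to Ruby values, here is a small sketch using only the standard `json` library. It does not call Gush; `ORIGINAL_OUTPUT` is a hypothetical job output, and parsing with `symbolize_names: true` is an assumption made so the keys match the symbol-keyed access used in the examples above.

```ruby
require 'json'

# Hypothetical job output mixing JSON-friendly and Ruby-specific values.
ORIGINAL_OUTPUT = { path: '/tmp/video.mp4', status: :done, sizes: [720, 1080] }.freeze

# Serialize and parse again, roughly what any JSON-backed store would do.
ROUND_TRIPPED = JSON.parse(JSON.generate(ORIGINAL_OUTPUT), symbolize_names: true)

# Strings, numbers and arrays survive; the Symbol value comes back as a String.
puts ROUND_TRIPPED.inspect
```

In practice this means sticking to strings, numbers, booleans, arrays and hashes in `output`, and passing ids or paths rather than rich objects.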
|
293
|
+
|
294
|
+
### Dynamic workflows
|
195
295
|
|
196
|
-
|
296
|
+
There might be a case when you have to construct the workflow dynamically depending on the input.
|
297
|
+
|
298
|
+
As an example, let's write a workflow which accepts an array of users and has to send an email to each one. Additionally after it sends the e-mail to every user, it also has to notify the admin about finishing.
|
197
299
|
|
198
|
-
#### In Ruby:
|
199
300
|
|
200
301
|
```ruby
|
201
|
-
|
202
|
-
|
203
|
-
|
302
|
+
|
303
|
+
class NotifyWorkflow < Gush::Workflow
|
304
|
+
def configure(user_ids)
|
305
|
+
notification_jobs = user_ids.map do |user_id|
|
306
|
+
run NotificationJob, params: {user_id: user_id}
|
307
|
+
end
|
308
|
+
|
309
|
+
run AdminNotificationJob, after: notification_jobs
|
310
|
+
end
|
311
|
+
end
|
204
312
|
```
|
205
313
|
|
206
|
-
`
|
314
|
+
We can achieve that because `run` method returns the id of the created job, which we can use for chaining dependencies.
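The chaining idea can be illustrated with a toy model. This is a sketch only: `ToyWorkflow` is a made-up stand-in, not Gush's implementation, and its id format simply imitates the `ClassName-uuid` ids shown in the payload example.

```ruby
require 'securerandom'

# Toy stand-in for a workflow builder; NOT Gush's implementation.
class ToyWorkflow
  attr_reader :dependencies

  def initialize
    @dependencies = {} # job id => ids of the jobs it waits for
  end

  # Registers a job and returns its unique id, so callers can chain on it.
  def run(job_name, after: [])
    id = "#{job_name}-#{SecureRandom.uuid}"
    @dependencies[id] = Array(after)
    id
  end
end

# Mirrors NotifyWorkflow#configure: one job per user, then one admin job.
def build_notify_workflow(user_ids)
  flow = ToyWorkflow.new
  notification_ids = user_ids.map { |_user_id| flow.run('NotificationJob') }
  admin_id = flow.run('AdminNotificationJob', after: notification_ids)
  [flow, admin_id]
end

flow, admin_id = build_notify_workflow([54, 21, 24, 154, 65])
puts flow.dependencies[admin_id].size
```

Because every call to `run` yields a distinct id, the same job class can appear many times in one workflow and still be referenced unambiguously in `after:`.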
|
207
315
|
|
208
|
-
|
316
|
+
Now, when we create the workflow like this:
|
317
|
+
|
318
|
+
```ruby
|
319
|
+
flow = NotifyWorkflow.create([54, 21, 24, 154, 65]) # 5 user ids as an argument
|
320
|
+
```
|
321
|
+
|
322
|
+
it will generate a workflow with 5 `NotificationJob`s and one `AdminNotificationJob` which will depend on all of them:
|
323
|
+
|
324
|
+
![DynamicWorkflow](https://i.imgur.com/HOI3fjc.png)
|
325
|
+
|
326
|
+
## Command line interface (CLI)
|
327
|
+
|
328
|
+
### Checking status
|
209
329
|
|
210
330
|
- of a specific workflow:
|
211
331
|
|
@@ -219,18 +339,13 @@ flow.status
|
|
219
339
|
bundle exec gush list
|
220
340
|
```
|
221
341
|
|
342
|
+
### Visualizing workflows as an image
|
222
343
|
|
223
|
-
|
224
|
-
|
225
|
-
When using Gush and its CLI commands you need a Gushfile.rb in root directory.
|
226
|
-
Gushfile should require all your Workflows and jobs, for example:
|
344
|
+
This requires that you have imagemagick installed on your computer:
|
227
345
|
|
228
|
-
```ruby
|
229
|
-
require_relative './lib/your_project'
|
230
346
|
|
231
|
-
|
232
|
-
|
233
|
-
end
|
347
|
+
```
|
348
|
+
bundle exec gush viz <NameOfTheWorkflow>
|
234
349
|
```
|
235
350
|
|
236
351
|
## Contributors
|