gush 0.4.1 → 1.0.0
- checksums.yaml +4 -4
- data/.gitignore +1 -1
- data/.travis.yml +2 -2
- data/CHANGELOG.md +33 -1
- data/README.md +195 -80
- data/gush.gemspec +6 -5
- data/lib/gush.rb +0 -14
- data/lib/gush/cli.rb +3 -18
- data/lib/gush/cli/overview.rb +1 -1
- data/lib/gush/client.rb +8 -32
- data/lib/gush/configuration.rb +4 -6
- data/lib/gush/graph.rb +2 -1
- data/lib/gush/job.rb +12 -16
- data/lib/gush/worker.rb +21 -49
- data/lib/gush/workflow.rb +9 -5
- data/spec/{Gushfile.rb → Gushfile} +0 -0
- data/spec/features/integration_spec.rb +62 -23
- data/spec/gush/client_spec.rb +1 -1
- data/spec/gush/configuration_spec.rb +0 -3
- data/spec/gush/job_spec.rb +3 -3
- data/spec/gush/worker_spec.rb +33 -41
- data/spec/gush/workflow_spec.rb +4 -2
- data/spec/gush_spec.rb +4 -4
- data/spec/spec_helper.rb +23 -9
- metadata +26 -12
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 5ac3b43abd8eeb7cc6403a0be5064b0dd0ba6f2a
|
4
|
+
data.tar.gz: 8038bf272a8562a632cd67ae9dc8adfdc59e50a6
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 155e88cd3026703aac1a00ebbd02d64380402982eeabd7f384f09aec654f93f48bb934861c6d7208e45eba860c8dadbae0f7e0f19c280cb68ed26941d674dd72
|
7
|
+
data.tar.gz: 5214ae1bc690bf3d84afaf27e3748be2526b63112c56f203e292fb73604606942172e94a7722028f3f809012b84f708978320ce868ca075d6427c391112cb9c3
|
data/.gitignore
CHANGED
data/.travis.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,35 @@
|
|
1
|
-
#
|
1
|
+
# Changelog
|
2
|
+
|
3
|
+
All notable changes to this project will be documented in this file.
|
4
|
+
|
5
|
+
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
|
6
|
+
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
|
7
|
+
|
8
|
+
## 1.0.0 - 2017-10-02
|
9
|
+
|
10
|
+
### Added
|
11
|
+
|
12
|
+
- **BREAKING CHANGE** Gush now uses ActiveJob instead of using Sidekiq directly. This allows programmers to use multiple backends instead of just one, including in-process or even synchronous backends. See http://guides.rubyonrails.org/active_job_basics.html
|
13
|
+
|
14
|
+
### Fixed
|
15
|
+
|
16
|
+
- Fix graph rendering with `gush viz` command. Sometimes it rendered the last job detached from others, because it was using a class name instead of job name as ID.
|
17
|
+
- Fix performance problems with deserializing jobs. This greatly **increased performance** by avoiding redundant calls to Redis storage, and should help a lot with huge workflows spawning thousands of jobs. Previously each job loaded the whole workflow instance when executed.
|
18
|
+
|
19
|
+
### Changed
|
20
|
+
|
21
|
+
- **BREAKING CHANGE** `Gushfile.rb` is now renamed to `Gushfile`
|
22
|
+
- **BREAKING CHANGE** Internal code for reporting status via Redis pub/sub has been removed, since it wasn't used for a long time.
|
23
|
+
- **BREAKING CHANGE** jobs are expected to have a `perform` method instead of `work` like in < 1.0.0 versions.
|
24
|
+
- **BREAKING CHANGE** the `payloads` method available inside jobs now returns an array of hashes instead of a hash. This allows for a more flexible approach to reusing a single job in many situations. Previously payloads were grouped by predecessor's class name, so you were forced to hardcode that class name in its descendants' code.
|
25
|
+
|
26
|
+
### Removed
|
27
|
+
|
28
|
+
- The `gush workers` command has been removed. It is now up to the developer to start background processes, depending on the chosen ActiveJob adapter.
|
29
|
+
- `environment` was removed since it was no longer needed (it was Sidekiq specific)
|
30
|
+
|
31
|
+
## 0.4.0
|
32
|
+
|
33
|
+
### Removed
|
2
34
|
|
3
35
|
- remove hard dependency on Yajl, so Gush can work with non-MRI Rubies ([#31](https://github.com/chaps-io/gush/pull/31) by [Nick Rakochy](https://github.com/chaps-io/gush/pull/31))
|
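Of the breaking changes listed in the changelog above, the new `payloads` shape is the one most likely to require edits to existing job code. The following is a minimal sketch of working with the new shape, assuming the symbol keys `:id`, `:class` and `:output` shown in the 1.0.0 README. `outputs_from` is a hypothetical helper, not part of Gush, and the sample ids and paths are made up.

```ruby
# Sample data in the 1.0.0 shape: one hash per ancestor job.
# The ids and outputs here are invented for illustration.
payloads = [
  { id: "FetchJob1-aaa", class: "FetchJob1", output: "/tmp/a.json" },
  { id: "FetchJob2-bbb", class: "FetchJob2", output: "/tmp/b.json" }
]

# Hypothetical helper (not part of Gush): collect the outputs of all
# ancestors of a given class, for code that used to look payloads up
# by the predecessor's class name.
def outputs_from(payloads, class_name)
  payloads.select { |payload| payload[:class] == class_name }
          .map { |payload| payload[:output] }
end

fetched = outputs_from(payloads, "FetchJob1")
```

The helper returns an array because the same job class may appear more than once among a job's ancestors, which is exactly what the array-based format makes possible.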
data/README.md
CHANGED
@@ -2,38 +2,65 @@
|
|
2
2
|
|
3
3
|
## [](https://chaps.io) proudly made by [Chaps](https://chaps.io)
|
4
4
|
|
5
|
-
Gush is a parallel workflow runner using only Redis as
|
5
|
+
Gush is a parallel workflow runner using only Redis as storage and [ActiveJob](http://guides.rubyonrails.org/v4.2/active_job_basics.html#introduction) for scheduling and executing jobs.
|
6
6
|
|
7
7
|
## Theory
|
8
8
|
|
9
|
-
Gush relies on directed acyclic graphs to store dependencies, see [Parallelizing Operations With Dependencies](https://msdn.microsoft.com/en-us/magazine/dd569760.aspx) by Stephen Toub.
|
9
|
+
Gush relies on directed acyclic graphs to store dependencies, see [Parallelizing Operations With Dependencies](https://msdn.microsoft.com/en-us/magazine/dd569760.aspx) by Stephen Toub to learn more about this method.
|
10
|
+
|
11
|
+
## **WARNING - version notice**
|
12
|
+
|
13
|
+
This README is about the `1.0.0` version, which has breaking changes compared to < 1.0.0 versions. [See here for 0.4.1 documentation](https://github.com/chaps-io/gush/blob/349c5aff0332fd14b1cb517115c26d415aa24841/README.md).
|
14
|
+
|
10
15
|
## Installation
|
11
16
|
|
12
|
-
Add
|
17
|
+
### 1. Add `gush` to Gemfile
|
18
|
+
|
19
|
+
```ruby
|
20
|
+
gem 'gush', '~> 1.0.0'
|
21
|
+
```
|
22
|
+
|
23
|
+
### 2. Create `Gushfile`
|
24
|
+
|
25
|
+
When using Gush and its CLI commands you need a `Gushfile` in the root directory.
|
26
|
+
`Gushfile` should require all your workflows and jobs.
|
13
27
|
|
14
|
-
|
28
|
+
#### Ruby on Rails
|
15
29
|
|
16
|
-
|
30
|
+
For RoR it is enough to require the full environment:
|
17
31
|
|
18
|
-
|
32
|
+
```ruby
|
33
|
+
require_relative './config/environment.rb'
|
34
|
+
```
|
19
35
|
|
20
|
-
|
36
|
+
and make sure your jobs and workflows are correctly loaded by adding their directories to autoload_paths, inside `config/application.rb`:
|
21
37
|
|
22
|
-
|
38
|
+
```ruby
|
39
|
+
config.autoload_paths += ["#{Rails.root}/app/jobs", "#{Rails.root}/app/workflows"]
|
40
|
+
```
|
23
41
|
|
24
|
-
|
42
|
+
#### Ruby
|
25
43
|
|
26
|
-
|
44
|
+
Simply require any jobs and workflows manually in `Gushfile`:
|
45
|
+
|
46
|
+
```ruby
|
47
|
+
require_relative 'lib/workflows/example_workflow.rb'
|
48
|
+
require_relative 'lib/jobs/some_job.rb'
|
49
|
+
require_relative 'lib/jobs/some_other_job.rb'
|
50
|
+
```
|
51
|
+
|
52
|
+
|
53
|
+
## Example
|
27
54
|
|
28
55
|
The DSL for defining jobs consists of a single `run` method.
|
29
56
|
Here is a complete example of a workflow you can create:
|
30
57
|
|
31
58
|
```ruby
|
32
|
-
# workflows/sample_workflow.rb
|
59
|
+
# app/workflows/sample_workflow.rb
|
33
60
|
class SampleWorkflow < Gush::Workflow
|
34
61
|
def configure(url_to_fetch_from)
|
35
62
|
run FetchJob1, params: { url: url_to_fetch_from }
|
36
|
-
run FetchJob2, params: {some_flag: true, url: 'http://url.com'}
|
63
|
+
run FetchJob2, params: { some_flag: true, url: 'http://url.com' }
|
37
64
|
|
38
65
|
run PersistJob1, after: FetchJob1
|
39
66
|
run PersistJob2, after: FetchJob2
|
@@ -47,62 +74,82 @@ class SampleWorkflow < Gush::Workflow
|
|
47
74
|
end
|
48
75
|
```
|
49
76
|
|
50
|
-
|
77
|
+
and this is what the graph will look like:
|
51
78
|
|
52
|
-
|
53
|
-
|
54
|
-
|
79
|
+

|
80
|
+
|
81
|
+
|
82
|
+
## Defining workflows
|
83
|
+
|
84
|
+
Let's start with the simplest workflow possible, consisting of a single job:
|
55
85
|
|
56
|
-
|
86
|
+
```ruby
|
87
|
+
class SimpleWorkflow < Gush::Workflow
|
88
|
+
def configure
|
89
|
+
run DownloadJob
|
90
|
+
end
|
91
|
+
end
|
92
|
+
```
|
57
93
|
|
58
|
-
|
94
|
+
Of course having a workflow with only a single job does not make sense, so it's time to define dependencies:
|
59
95
|
|
96
|
+
```ruby
|
97
|
+
class SimpleWorkflow < Gush::Workflow
|
98
|
+
def configure
|
99
|
+
run DownloadJob
|
100
|
+
run SaveJob, after: DownloadJob
|
101
|
+
end
|
102
|
+
end
|
103
|
+
```
|
60
104
|
|
61
|
-
|
105
|
+
We just told Gush to execute `SaveJob` right after `DownloadJob` finishes **successfully**.
|
62
106
|
|
63
|
-
|
107
|
+
But what if your job must have multiple dependencies? That's easy, just provide an array to the `after` attribute:
|
64
108
|
|
65
109
|
```ruby
|
66
|
-
|
67
|
-
class SampleWorkflow < Gush::Workflow
|
110
|
+
class SimpleWorkflow < Gush::Workflow
|
68
111
|
def configure
|
69
|
-
run
|
112
|
+
run FirstDownloadJob
|
113
|
+
run SecondDownloadJob
|
114
|
+
|
115
|
+
run SaveJob, after: [FirstDownloadJob, SecondDownloadJob]
|
70
116
|
end
|
71
117
|
end
|
72
118
|
```
|
73
119
|
|
74
|
-
|
120
|
+
Now `SaveJob` will only execute after both its parents finish without errors.
|
121
|
+
|
122
|
+
With this simple syntax you can build any complex workflows you can imagine!
|
75
123
|
|
76
|
-
####
|
124
|
+
#### Alternative way
|
77
125
|
|
78
|
-
|
126
|
+
The `run` method also accepts a `before:` attribute to define the opposite association. So we can write the same workflow as above, but like this:
|
79
127
|
|
80
128
|
```ruby
|
81
|
-
|
82
|
-
|
83
|
-
|
84
|
-
|
129
|
+
class SimpleWorkflow < Gush::Workflow
|
130
|
+
def configure
|
131
|
+
run FirstDownloadJob, before: SaveJob
|
132
|
+
run SecondDownloadJob, before: SaveJob
|
85
133
|
|
86
|
-
|
134
|
+
run SaveJob
|
87
135
|
end
|
88
136
|
end
|
89
137
|
```
|
90
138
|
|
91
|
-
|
139
|
+
You can use whatever way you find more readable or even both at once :)
|
92
140
|
|
93
|
-
|
141
|
+
### Passing arguments to workflows
|
94
142
|
|
95
143
|
Workflows can accept any primitive arguments in their constructor, which then will be available in your
|
96
144
|
`configure` method.
|
97
145
|
|
98
|
-
|
146
|
+
Let's assume we are writing a book publishing workflow which needs to know where the PDF of the book is and under what ISBN it will be released:
|
99
147
|
|
100
148
|
```ruby
|
101
|
-
# app/workflows/sample_workflow.rb
|
102
149
|
class PublishBookWorkflow < Gush::Workflow
|
103
150
|
def configure(url, isbn)
|
104
151
|
run FetchBook, params: { url: url }
|
105
|
-
run PublishBook, params: { book_isbn: isbn }
|
152
|
+
run PublishBook, params: { book_isbn: isbn }, after: FetchBook
|
106
153
|
end
|
107
154
|
end
|
108
155
|
```
|
@@ -110,57 +157,106 @@ end
|
|
110
157
|
and then create your workflow with those arguments:
|
111
158
|
|
112
159
|
```ruby
|
113
|
-
PublishBookWorkflow.
|
160
|
+
PublishBookWorkflow.create("http://url.com/book.pdf", "978-0470081204")
|
161
|
+
```
|
162
|
+
|
163
|
+
and that's basically it for defining workflows. See below for how to define jobs:
|
164
|
+
|
165
|
+
## Defining jobs
|
166
|
+
|
167
|
+
The simplest job is a class inheriting from `Gush::Job` and responding to the `perform` method, much like any other ActiveJob class.
|
168
|
+
|
169
|
+
```ruby
|
170
|
+
class FetchBook < Gush::Job
|
171
|
+
def perform
|
172
|
+
# do some fetching from remote APIs
|
173
|
+
end
|
174
|
+
end
|
114
175
|
```
|
115
176
|
|
177
|
+
But what about those params we passed in the previous step?
|
116
178
|
|
117
|
-
|
179
|
+
## Passing parameters into jobs
|
118
180
|
|
119
|
-
|
181
|
+
To do that, simply provide a `params:` attribute with a hash of parameters you'd like to have available inside the `perform` method of the job.
|
120
182
|
|
121
|
-
|
183
|
+
So, inside workflow:
|
122
184
|
|
123
185
|
```ruby
|
124
|
-
|
125
|
-
|
186
|
+
(...)
|
187
|
+
run FetchBook, params: {url: "http://url.com/book.pdf"}
|
188
|
+
(...)
|
126
189
|
```
|
127
190
|
|
128
|
-
|
191
|
+
and within the job we can access them like this:
|
129
192
|
|
130
193
|
```ruby
|
131
|
-
|
194
|
+
class FetchBook < Gush::Job
|
195
|
+
def perform
|
196
|
+
# you can access `params` method here, for example:
|
197
|
+
|
198
|
+
params #=> {url: "http://url.com/book.pdf"}
|
199
|
+
end
|
200
|
+
end
|
132
201
|
```
|
133
202
|
|
134
|
-
|
203
|
+
## Executing workflows
|
135
204
|
|
136
|
-
|
205
|
+
Now that we have defined our workflow and its jobs, we can use it:
|
206
|
+
|
207
|
+
### 1. Start background worker process
|
208
|
+
|
209
|
+
**Important**: The command to start background workers depends on the backend you chose for ActiveJob.
|
210
|
+
For example, in case of Sidekiq this would be:
|
137
211
|
|
138
212
|
```
|
139
|
-
bundle exec gush
|
213
|
+
bundle exec sidekiq -q gush
|
140
214
|
```
|
141
215
|
|
142
|
-
|
216
|
+
**[See the backends section of the official ActiveJob documentation](http://guides.rubyonrails.org/v4.2/active_job_basics.html#backends)**
|
217
|
+
|
218
|
+
**Hint**: Gush uses the `gush` queue name by default. Keep that in mind, because some backends (like Sidekiq) will only run jobs from explicitly stated queues.
|
219
|
+
|
220
|
+
|
221
|
+
### 2. Create the workflow instance
|
222
|
+
|
223
|
+
```ruby
|
224
|
+
flow = PublishBookWorkflow.create("http://url.com/book.pdf", "978-0470081204")
|
225
|
+
```
|
226
|
+
|
227
|
+
### 3. Start the workflow
|
143
228
|
|
144
229
|
```ruby
|
145
230
|
flow.start!
|
146
231
|
```
|
147
232
|
|
148
|
-
Now Gush will start processing jobs in background using
|
149
|
-
|
233
|
+
Now Gush will start processing jobs in the background using ActiveJob and your chosen backend.
|
234
|
+
|
235
|
+
### 4. Monitor its progress
|
236
|
+
|
237
|
+
```ruby
|
238
|
+
flow.reload
|
239
|
+
flow.status
|
240
|
+
#=> :running|:finished|:failed
|
241
|
+
```
|
242
|
+
|
243
|
+
`reload` is needed to see the latest status, since workflows are updated asynchronously.
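
One way to wait for a workflow is to poll until it leaves the `:running` state. This is only a sketch: `wait_for` and `FakeFlow` are hypothetical names, and `FakeFlow` stands in for a real workflow object so the snippet runs without Redis. It relies only on the `reload` and `status` calls shown above.

```ruby
# Hypothetical helper (not part of Gush): poll until the workflow stops running.
# `flow` only needs to respond to `reload` and `status`.
def wait_for(flow, interval: 1, max_checks: 60)
  max_checks.times do
    flow.reload
    return flow.status unless flow.status == :running
    sleep interval
  end
  :running # still running after max_checks polls
end

# Stand-in for a real workflow, so the sketch is runnable on its own.
class FakeFlow
  def initialize(statuses)
    @statuses = statuses
    @current = statuses.first
  end

  # Each reload advances to the next scripted status, if there is one.
  def reload
    @current = @statuses.shift || @current
    self
  end

  def status
    @current
  end
end

final = wait_for(FakeFlow.new([:running, :running, :finished]), interval: 0)
```

Capping the number of polls keeps a stuck workflow from blocking the caller forever; the caller can then decide whether to keep waiting or give up.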
|
244
|
+
|
245
|
+
## Advanced features
|
150
246
|
|
151
247
|
### Pipelining
|
152
248
|
|
153
|
-
Gush offers a useful
|
249
|
+
Gush offers a useful tool for passing the results of a job to the jobs that depend on it, so they can act on them.
|
154
250
|
|
155
251
|
**Example:**
|
156
252
|
|
157
253
|
Let's assume you have two jobs, `DownloadVideo`, `EncodeVideo`.
|
158
|
-
The latter needs to know where the first one
|
254
|
+
The latter needs to know where the first one saved the file to be able to open it.
|
159
255
|
|
160
256
|
|
161
257
|
```ruby
|
162
258
|
class DownloadVideo < Gush::Job
|
163
|
-
def
|
259
|
+
def perform
|
164
260
|
downloader = VideoDownloader.fetch("http://youtube.com/?v=someytvideo")
|
165
261
|
|
166
262
|
output(downloader.file_path)
|
@@ -168,44 +264,68 @@ class DownloadVideo < Gush::Job
|
|
168
264
|
end
|
169
265
|
```
|
170
266
|
|
171
|
-
`output` method is
|
267
|
+
The `output` method is used to output data from the job to all dependent jobs.
|
172
268
|
|
173
|
-
Now, since `DownloadVideo` finished and its dependant job `EncodeVideo` started, we can access that payload
|
269
|
+
Now, since `DownloadVideo` finished and its dependent job `EncodeVideo` started, we can access that payload inside it:
|
174
270
|
|
175
271
|
```ruby
|
176
272
|
class EncodeVideo < Gush::Job
|
177
|
-
def
|
178
|
-
video_path = payloads[
|
273
|
+
def perform
|
274
|
+
video_path = payloads.first[:output]
|
179
275
|
end
|
180
276
|
end
|
181
277
|
```
|
182
278
|
|
183
|
-
`payloads` is
|
279
|
+
`payloads` is an array containing outputs from all ancestor jobs. So for our `EncodeVideo` job from above, the array will look like:
|
184
280
|
|
185
|
-
**Note:** `payloads` will only contain outputs of the job's ancestors. So if job `A` depends on `B` and `C`,
|
186
|
-
the `payloads` hash will look like this:
|
187
281
|
|
188
282
|
```ruby
|
189
|
-
|
190
|
-
|
191
|
-
|
192
|
-
|
283
|
+
[
|
284
|
+
{
|
285
|
+
id: "DownloadVideo-41bfb730-b49f-42ac-a808-156327989294", # unique id of the ancestor job
|
286
|
+
class: "DownloadVideo",
|
287
|
+
output: "https://s3.amazonaws.com/somebucket/downloaded-file.mp4" # the payload returned by DownloadVideo job using `output()` method
|
288
|
+
}
|
289
|
+
]
|
193
290
|
```
|
194
291
|
|
292
|
+
**Note:** Keep in mind that payloads can only contain data which **can be serialized as JSON**, because that's how Gush stores them internally.
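
A quick way to see what that constraint means in practice is to round-trip a value through Ruby's standard `json` library. This is a sketch of general JSON behavior, not of Gush's internal serializer, whose exact options are not shown here; the payload values are illustrative.

```ruby
require 'json'

# A payload a job might pass to `output` (values are illustrative).
payload = { path: "/tmp/video.mp4", state: :downloaded, size: 1024 }

# What survives a plain JSON round trip: symbol keys and symbol values
# come back as strings, while numbers and strings are unchanged.
restored = JSON.parse(JSON.generate(payload))
```

In practice this suggests sticking to plain strings, numbers, arrays and hashes in outputs, and not relying on symbols or custom objects coming back unchanged.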
|
293
|
+
|
294
|
+
### Dynamic workflows
|
195
295
|
|
196
|
-
|
296
|
+
There might be a case when you have to construct the workflow dynamically depending on the input.
|
297
|
+
|
298
|
+
As an example, let's write a workflow which accepts an array of users and has to send an email to each one. Additionally, after it sends the email to every user, it also has to notify the admin that it has finished.
|
197
299
|
|
198
|
-
#### In Ruby:
|
199
300
|
|
200
301
|
```ruby
|
201
|
-
|
202
|
-
|
203
|
-
|
302
|
+
|
303
|
+
class NotifyWorkflow < Gush::Workflow
|
304
|
+
def configure(user_ids)
|
305
|
+
notification_jobs = user_ids.map do |user_id|
|
306
|
+
run NotificationJob, params: {user_id: user_id}
|
307
|
+
end
|
308
|
+
|
309
|
+
run AdminNotificationJob, after: notification_jobs
|
310
|
+
end
|
311
|
+
end
|
204
312
|
```
|
205
313
|
|
206
|
-
`
|
314
|
+
We can achieve that because the `run` method returns the id of the created job, which we can use for chaining dependencies.
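
To see why returning an id makes this chaining possible, here is a toy model of the idea. `ToyWorkflow` is a hypothetical stand-in written for this illustration and does not reflect Gush's actual implementation; it only records jobs and the ids they wait for.

```ruby
require 'securerandom'

# Toy stand-in for a workflow (NOT Gush's implementation).
class ToyWorkflow
  attr_reader :dependencies

  def initialize
    @dependencies = {} # job id => array of ids it must wait for
  end

  # Registers a job and returns its unique id, so callers can pass
  # that id (or an array of ids) to a later `after:`.
  def run(job_name, after: [])
    id = "#{job_name}-#{SecureRandom.uuid}"
    @dependencies[id] = Array(after)
    id
  end
end

flow = ToyWorkflow.new
notification_ids = [54, 21, 24].map { |_user_id| flow.run("NotificationJob") }
admin_id = flow.run("AdminNotificationJob", after: notification_ids)
```

Because every call produces a distinct id, several jobs of the same class can each be referenced individually, which a class name alone could not do.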
|
207
315
|
|
208
|
-
|
316
|
+
Now, when we create the workflow like this:
|
317
|
+
|
318
|
+
```ruby
|
319
|
+
flow = NotifyWorkflow.create([54, 21, 24, 154, 65]) # 5 user ids as an argument
|
320
|
+
```
|
321
|
+
|
322
|
+
it will generate a workflow with 5 `NotificationJob`s and one `AdminNotificationJob` which will depend on all of them:
|
323
|
+
|
324
|
+

|
325
|
+
|
326
|
+
## Command line interface (CLI)
|
327
|
+
|
328
|
+
### Checking status
|
209
329
|
|
210
330
|
- of a specific workflow:
|
211
331
|
|
@@ -219,18 +339,13 @@ flow.status
|
|
219
339
|
bundle exec gush list
|
220
340
|
```
|
221
341
|
|
342
|
+
### Visualizing workflows as an image
|
222
343
|
|
223
|
-
|
224
|
-
|
225
|
-
When using Gush and its CLI commands you need a Gushfile.rb in root directory.
|
226
|
-
Gushfile should require all your Workflows and jobs, for example:
|
344
|
+
This requires that you have ImageMagick installed on your computer:
|
227
345
|
|
228
|
-
```ruby
|
229
|
-
require_relative './lib/your_project'
|
230
346
|
|
231
|
-
|
232
|
-
|
233
|
-
end
|
347
|
+
```
|
348
|
+
bundle exec gush viz <NameOfTheWorkflow>
|
234
349
|
```
|
235
350
|
|
236
351
|
## Contributors
|