gush 0.4.1 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: d6bc6665af4376d837c94695ebcd5d2076abd0a5
4
- data.tar.gz: 2aaf48dc13f289a1a61d3d1afbd231dcc511cb16
3
+ metadata.gz: 5ac3b43abd8eeb7cc6403a0be5064b0dd0ba6f2a
4
+ data.tar.gz: 8038bf272a8562a632cd67ae9dc8adfdc59e50a6
5
5
  SHA512:
6
- metadata.gz: 3ae3dafb868d3c48ee2468f0fbf88d94dcc25dcbb83827ce5c855832118dfb256db5ee7c05d8eee21930f46b04c20e775aa5057a0739212417f11d03d9666865
7
- data.tar.gz: dcc89596cf05c6d5f96e6b4ec54879a774a6c1e476acd51ae159872494f180840f92757f58d5b1801117c6b1d3864a41132a60e9a3563ce92a780ce2570a0d6d
6
+ metadata.gz: 155e88cd3026703aac1a00ebbd02d64380402982eeabd7f384f09aec654f93f48bb934861c6d7208e45eba860c8dadbae0f7e0f19c280cb68ed26941d674dd72
7
+ data.tar.gz: 5214ae1bc690bf3d84afaf27e3748be2526b63112c56f203e292fb73604606942172e94a7722028f3f809012b84f708978320ce868ca075d6427c391112cb9c3
data/.gitignore CHANGED
@@ -17,5 +17,5 @@ test/version_tmp
17
17
  workflows/
18
18
  tmp
19
19
  test.rb
20
- /Gushfile.rb
20
+ /Gushfile
21
21
  dump.rdb
data/.travis.yml CHANGED
@@ -1,9 +1,9 @@
1
1
  language: ruby
2
2
  script: "bundle exec rspec"
3
3
  rvm:
4
- - 2.0.0
5
- - 2.1.6
6
4
  - 2.2.2
5
+ - 2.3.4
6
+ - 2.4.1
7
7
  services:
8
8
  - redis-server
9
9
  email:
data/CHANGELOG.md CHANGED
@@ -1,3 +1,35 @@
1
- # 0.4
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
6
+ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
7
+
8
+ ## 1.0.0 - 2017-10-02
9
+
10
+ ### Added
11
+
12
+ - **BREAKING CHANGE** Gush now uses ActiveJob instead of using Sidekiq directly. This lets programmers choose among multiple backends, including in-process or even synchronous ones, instead of just one. See http://guides.rubyonrails.org/active_job_basics.html
13
+
14
+ ### Fixed
15
+
16
+ - Fix graph rendering with `gush viz` command. Sometimes it rendered the last job detached from others, because it was using a class name instead of job name as ID.
17
+ - Fix performance problems with unserializing jobs. This greatly **increased performance** by avoiding redundant calls to Redis storage. Should help a lot with huge workflows spawning thousands of jobs. Previously each job loaded whole workflow instance when executed.
18
+
19
+ ### Changed
20
+
21
+ - **BREAKING CHANGE** `Gushfile.rb` is now renamed to `Gushfile`
22
+ - **BREAKING CHANGE** Internal code for reporting status via Redis pub/sub has been removed, since it wasn't used for a long time.
23
+ - **BREAKING CHANGE** jobs are expected to have a `perform` method instead of `work` like in < 1.0.0 versions.
24
+ - **BREAKING CHANGE** `payloads` method available inside jobs is now an array of hashes instead of a hash. This allows for a more flexible approach to reusing a single job in many situations. Previously payloads were grouped by the predecessor's class name, so you were forced to hardcode that class name in its descendants' code.
25
+
26
+ ### Removed
27
+
28
+ - `gush workers` command is now removed. This is now up to the developer to start background processes depending on chosen ActiveJob adapter.
29
+ - `environment` was removed since it was no longer needed (it was Sidekiq specific)
30
+
31
+ ## 0.4.0
32
+
33
+ ### Removed
2
34
 
3
35
  - remove hard dependency on Yajl, so Gush can work with non-MRI Rubies ([#31](https://github.com/chaps-io/gush/pull/31) by [Nick Rakochy](https://github.com/chaps-io/gush/pull/31))
data/README.md CHANGED
@@ -2,38 +2,65 @@
2
2
 
3
3
  ## [![](http://i.imgur.com/ya8Wnyl.png)](https://chaps.io) proudly made by [Chaps](https://chaps.io)
4
4
 
5
- Gush is a parallel workflow runner using only Redis as its message broker and Sidekiq for workers.
5
+ Gush is a parallel workflow runner using only Redis as storage and [ActiveJob](http://guides.rubyonrails.org/v4.2/active_job_basics.html#introduction) for scheduling and executing jobs.
6
6
 
7
7
  ## Theory
8
8
 
9
- Gush relies on directed acyclic graphs to store dependencies, see [Parallelizing Operations With Dependencies](https://msdn.microsoft.com/en-us/magazine/dd569760.aspx) by Stephen Toub.
9
+ Gush relies on directed acyclic graphs to store dependencies, see [Parallelizing Operations With Dependencies](https://msdn.microsoft.com/en-us/magazine/dd569760.aspx) by Stephen Toub to learn more about this method.
10
+
11
+ ## **WARNING - version notice**
12
+
13
+ This README is about the `1.0.0` version, which has breaking changes compared to < 1.0.0 versions. [See here for 0.4.1 documentation](https://github.com/chaps-io/gush/blob/349c5aff0332fd14b1cb517115c26d415aa24841/README.md).
14
+
10
15
  ## Installation
11
16
 
12
- Add this line to your application's Gemfile:
17
+ ### 1. Add `gush` to Gemfile
18
+
19
+ ```ruby
20
+ gem 'gush', '~> 1.0.0'
21
+ ```
22
+
23
+ ### 2. Create `Gushfile`
24
+
25
+ When using Gush and its CLI commands you need a `Gushfile` in the root directory.
26
+ `Gushfile` should require all your workflows and jobs.
13
27
 
14
- gem 'gush'
28
+ #### Ruby on Rails
15
29
 
16
- And then execute:
30
+ For RoR it is enough to require the full environment:
17
31
 
18
- $ bundle
32
+ ```ruby
33
+ require_relative './config/environment.rb'
34
+ ```
19
35
 
20
- Or install it yourself as:
36
+ and make sure your jobs and workflows are correctly loaded by adding their directories to autoload_paths, inside `config/application.rb`:
21
37
 
22
- $ gem install gush
38
+ ```ruby
39
+ config.autoload_paths += ["#{Rails.root}/app/jobs", "#{Rails.root}/app/workflows"]
40
+ ```
23
41
 
24
- ## Usage
42
+ #### Ruby
25
43
 
26
- ### Defining workflows
44
+ Simply require any jobs and workflows manually in `Gushfile`:
45
+
46
+ ```ruby
47
+ require_relative 'lib/workflows/example_workflow.rb'
48
+ require_relative 'lib/jobs/some_job.rb'
49
+ require_relative 'lib/jobs/some_other_job.rb'
50
+ ```
51
+
52
+
53
+ ## Example
27
54
 
28
55
  The DSL for defining jobs consists of a single `run` method.
29
56
  Here is a complete example of a workflow you can create:
30
57
 
31
58
  ```ruby
32
- # workflows/sample_workflow.rb
59
+ # app/workflows/sample_workflow.rb
33
60
  class SampleWorkflow < Gush::Workflow
34
61
  def configure(url_to_fetch_from)
35
62
  run FetchJob1, params: { url: url_to_fetch_from }
36
- run FetchJob2, params: {some_flag: true, url: 'http://url.com'}
63
+ run FetchJob2, params: { some_flag: true, url: 'http://url.com' }
37
64
 
38
65
  run PersistJob1, after: FetchJob1
39
66
  run PersistJob2, after: FetchJob2
@@ -47,62 +74,82 @@ class SampleWorkflow < Gush::Workflow
47
74
  end
48
75
  ```
49
76
 
50
- **Hint:** For debugging purposes you can vizualize the graph using `viz` command:
77
+ and this is what the graph will look like:
51
78
 
52
- ```
53
- bundle exec gush viz SampleWorkflow
54
- ```
79
+ ![SampleWorkflow](https://i.imgur.com/DFh6j51.png)
80
+
81
+
82
+ ## Defining workflows
83
+
84
+ Let's start with the simplest workflow possible, consisting of a single job:
55
85
 
56
- For the Workflow above, the graph will look like this:
86
+ ```ruby
87
+ class SimpleWorkflow < Gush::Workflow
88
+ def configure
89
+ run DownloadJob
90
+ end
91
+ end
92
+ ```
57
93
 
58
- ![SampleWorkflow](http://i.imgur.com/SmeRRVT.png)
94
+ Of course having a workflow with only a single job does not make sense, so it's time to define dependencies:
59
95
 
96
+ ```ruby
97
+ class SimpleWorkflow < Gush::Workflow
98
+ def configure
99
+ run DownloadJob
100
+ run SaveJob, after: DownloadJob
101
+ end
102
+ end
103
+ ```
60
104
 
61
- #### Passing parameters to jobs
105
+ We just told Gush to execute `SaveJob` right after `DownloadJob` finishes **successfully**.
62
106
 
63
- You can pass any primitive arguments into jobs while defining your workflow:
107
+ But what if your job must have multiple dependencies? That's easy, just provide an array to the `after` attribute:
64
108
 
65
109
  ```ruby
66
- # app/workflows/sample_workflow.rb
67
- class SampleWorkflow < Gush::Workflow
110
+ class SimpleWorkflow < Gush::Workflow
68
111
  def configure
69
- run FetchJob1, params: { url: "http://some.com/url" }
112
+ run FirstDownloadJob
113
+ run SecondDownloadJob
114
+
115
+ run SaveJob, after: [FirstDownloadJob, SecondDownloadJob]
70
116
  end
71
117
  end
72
118
  ```
73
119
 
74
- See below to learn how to access those params inside your job.
120
+ Now `SaveJob` will only execute after both its parents finish without errors.
121
+
122
+ With this simple syntax you can build any complex workflows you can imagine!
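To make the `after:` semantics concrete, here is a minimal sketch of how dependency declarations can be turned into a valid execution order. It does not use Gush at all: `DependencyGraph` is a hypothetical stand-in written for this illustration, and the ordering comes from Ruby's standard `TSort` module, not from Gush's own scheduler (which also runs independent jobs in parallel).

```ruby
require 'tsort'

# Hypothetical stand-in for a workflow's dependency graph; not Gush's internals.
class DependencyGraph
  include TSort

  def initialize
    # Maps each job to the list of jobs it must wait for.
    @parents = Hash.new { |hash, key| hash[key] = [] }
  end

  # Mirrors the shape of `run SomeJob, after: [...]` from the README.
  def run(job, after: [])
    @parents[job].concat(Array(after))
    job
  end

  # Returns the jobs with every parent listed before its dependents.
  def execution_order
    tsort
  end

  # TSort hook: enumerate every known job.
  def tsort_each_node(&block)
    @parents.keys.each(&block)
  end

  # TSort hook: enumerate the jobs that must finish before `job`.
  def tsort_each_child(job, &block)
    @parents.fetch(job, []).each(&block)
  end
end

graph = DependencyGraph.new
graph.run(:FirstDownloadJob)
graph.run(:SecondDownloadJob)
graph.run(:SaveJob, after: [:FirstDownloadJob, :SecondDownloadJob])

order = graph.execution_order
puts order.inspect
```

In this sketch `SaveJob` always comes after both download jobs. A circular declaration would make `tsort` raise `TSort::Cyclic`, which is one way to see why the dependency graph has to stay acyclic.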
75
123
 
76
- #### Defining jobs
124
+ #### Alternative way
77
125
 
78
- Jobs are classes inheriting from `Gush::Job`:
126
+ `run` method also accepts `before:` attribute to define the opposite association. So we can write the same workflow as above, but like this:
79
127
 
80
128
  ```ruby
81
- # app/jobs/fetch_job.rb
82
- class FetchJob < Gush::Job
83
- def work
84
- # do some fetching from remote APIs
129
+ class SimpleWorkflow < Gush::Workflow
130
+ def configure
131
+ run FirstDownloadJob, before: SaveJob
132
+ run SecondDownloadJob, before: SaveJob
85
133
 
86
- params #=> {url: "http://some.com/url"}
134
+ run SaveJob
87
135
  end
88
136
  end
89
137
  ```
90
138
 
91
- `params` method is a hash containing your (optional) parameters passed to `run` method in the workflow.
139
+ You can use whatever way you find more readable or even both at once :)
92
140
 
93
- #### Passing arguments to workflows
141
+ ### Passing arguments to workflows
94
142
 
95
143
  Workflows can accept any primitive arguments in their constructor, which then will be available in your
96
144
  `configure` method.
97
145
 
98
- Here's an example of a workflow responsible for publishing a book:
146
+ Let's assume we are writing a book publishing workflow which needs to know where the PDF of the book is and under what ISBN it will be released:
99
147
 
100
148
  ```ruby
101
- # app/workflows/sample_workflow.rb
102
149
  class PublishBookWorkflow < Gush::Workflow
103
150
  def configure(url, isbn)
104
151
  run FetchBook, params: { url: url }
105
- run PublishBook, params: { book_isbn: isbn }
152
+ run PublishBook, params: { book_isbn: isbn }, after: FetchBook
106
153
  end
107
154
  end
108
155
  ```
@@ -110,57 +157,106 @@ end
110
157
  and then create your workflow with those arguments:
111
158
 
112
159
  ```ruby
113
- PublishBookWorkflow.new("http://url.com/book.pdf", "978-0470081204")
160
+ PublishBookWorkflow.create("http://url.com/book.pdf", "978-0470081204")
161
+ ```
162
+
163
+ and that's basically it for defining workflows, see below on how to define jobs:
164
+
165
+ ## Defining jobs
166
+
167
+ The simplest job is a class inheriting from `Gush::Job` and responding to `perform` method. Much like any other ActiveJob class.
168
+
169
+ ```ruby
170
+ class FetchBook < Gush::Job
171
+ def perform
172
+ # do some fetching from remote APIs
173
+ end
174
+ end
114
175
  ```
115
176
 
177
+ But what about those params we passed in the previous step?
116
178
 
117
- ### Running workflows
179
+ ## Passing parameters into jobs
118
180
 
119
- Now that we have defined our workflow we can use it:
181
+ To do that, simply provide a `params:` attribute with a hash of parameters you'd like to have available inside the `perform` method of the job.
120
182
 
121
- #### 1. Initialize and save it
183
+ So, inside workflow:
122
184
 
123
185
  ```ruby
124
- flow = SampleWorkflow.new(optional, arguments)
125
- flow.save # saves workflow and its jobs to Redis
186
+ (...)
187
+ run FetchBook, params: {url: "http://url.com/book.pdf"}
188
+ (...)
126
189
  ```
127
190
 
128
- **or:** you can also use a shortcut:
191
+ and within the job we can access them like this:
129
192
 
130
193
  ```ruby
131
- flow = SampleWorkflow.create(optional, arguments)
194
+ class FetchBook < Gush::Job
195
+ def perform
196
+ # you can access `params` method here, for example:
197
+
198
+ params #=> {url: "http://url.com/book.pdf"}
199
+ end
200
+ end
132
201
  ```
133
202
 
134
- #### 2. Start workflow
203
+ ## Executing workflows
135
204
 
136
- First you need to start Sidekiq workers:
205
+ Now that we have defined our workflow and its jobs, we can use it:
206
+
207
+ ### 1. Start background worker process
208
+
209
+ **Important**: The command to start background workers depends on the backend you chose for ActiveJob.
210
+ For example, in case of Sidekiq this would be:
137
211
 
138
212
  ```
139
- bundle exec gush workers
213
+ bundle exec sidekiq -q gush
140
214
  ```
141
215
 
142
- and then start your workflow:
216
+ **[See the backends section of the official ActiveJob documentation](http://guides.rubyonrails.org/v4.2/active_job_basics.html#backends)**
217
+
218
+ **Hint**: gush uses `gush` queue name by default. Keep that in mind, because some backends (like Sidekiq) will only run jobs from explicitly stated queues.
219
+
220
+
221
+ ### 2. Create the workflow instance
222
+
223
+ ```ruby
224
+ flow = PublishBookWorkflow.create("http://url.com/book.pdf", "978-0470081204")
225
+ ```
226
+
227
+ ### 3. Start the workflow
143
228
 
144
229
  ```ruby
145
230
  flow.start!
146
231
  ```
147
232
 
148
- Now Gush will start processing jobs in background using Sidekiq
149
- in the order defined in `configure` method inside Workflow.
233
+ Now Gush will start processing jobs in the background using ActiveJob and your chosen backend.
234
+
235
+ ### 4. Monitor its progress:
236
+
237
+ ```ruby
238
+ flow.reload
239
+ flow.status
240
+ #=> :running|:finished|:failed
241
+ ```
242
+
243
+ `reload` is needed to see the latest status, since workflows are updated asynchronously.
244
+
245
+ ## Advanced features
150
246
 
151
247
  ### Pipelining
152
248
 
153
- Gush offers a useful feature which lets you pass results of a job to its dependencies, so they can act accordingly.
249
+ Gush offers a useful tool to pass results of a job to the jobs that depend on it, so they can act accordingly.
154
250
 
155
251
  **Example:**
156
252
 
157
253
  Let's assume you have two jobs, `DownloadVideo`, `EncodeVideo`.
158
- The latter needs to know where the first one downloaded the file to be able to open it.
254
+ The latter needs to know where the first one saved the file to be able to open it.
159
255
 
160
256
 
161
257
  ```ruby
162
258
  class DownloadVideo < Gush::Job
163
- def work
259
+ def perform
164
260
  downloader = VideoDownloader.fetch("http://youtube.com/?v=someytvideo")
165
261
 
166
262
  output(downloader.file_path)
@@ -168,44 +264,68 @@ class DownloadVideo < Gush::Job
168
264
  end
169
265
  ```
170
266
 
171
- `output` method is Gush's way of saying: "I want to pass this down to my descendants".
267
+ The `output` method is used to output data from the job to all dependent jobs.
172
268
 
173
- Now, since `DownloadVideo` finished and its dependant job `EncodeVideo` started, we can access that payload down the (pipe)line:
269
+ Now, since `DownloadVideo` finished and its dependent job `EncodeVideo` started, we can access that payload inside it:
174
270
 
175
271
  ```ruby
176
272
  class EncodeVideo < Gush::Job
177
- def work
178
- video_path = payloads["DownloadVideo"]
273
+ def perform
274
+ video_path = payloads.first[:output]
179
275
  end
180
276
  end
181
277
  ```
182
278
 
183
- `payloads` is a hash containing outputs from all parent jobs, where job class names are the keys.
279
+ `payloads` is an array containing outputs from all ancestor jobs. So for our `EncodeVideo` job from above, the array will look like:
184
280
 
185
- **Note:** `payloads` will only contain outputs of the job's ancestors. So if job `A` depends on `B` and `C`,
186
- the `payloads` hash will look like this:
187
281
 
188
282
  ```ruby
189
- {
190
- "B" => (...),
191
- "C" => (...)
192
- }
283
+ [
284
+ {
285
+ id: "DownloadVideo-41bfb730-b49f-42ac-a808-156327989294", # unique id of the ancestor job
286
+ class: "DownloadVideo",
287
+ output: "https://s3.amazonaws.com/somebucket/downloaded-file.mp4" #the payload returned by DownloadVideo job using `output()` method
288
+ }
289
+ ]
193
290
  ```
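When a job has several parents, `payloads.first` is no longer enough to pick the right output. The sketch below works on a plain array shaped like the example above (symbol keys `:id`, `:class`, `:output`) and selects entries by class name. The `DownloadSubtitles` entry and the file paths are made up for illustration, and the code runs without Gush.

```ruby
# Sample data in the shape shown in the README; values are hypothetical.
payloads = [
  { id: "DownloadVideo-1", class: "DownloadVideo", output: "/tmp/video.mp4" },
  { id: "DownloadSubtitles-1", class: "DownloadSubtitles", output: "/tmp/subs.srt" }
]

# Pick the output of one specific parent by its class name.
video_payload = payloads.find { |payload| payload[:class] == "DownloadVideo" }
video_path = video_payload && video_payload[:output]

# Or collect all outputs grouped by parent class, which also covers
# the case where the same job class appears more than once.
outputs_by_class = payloads
  .group_by { |payload| payload[:class] }
  .transform_values { |items| items.map { |item| item[:output] } }

puts video_path
puts outputs_by_class.inspect
```

Grouping returns an array per class name, so a job reused several times in one workflow does not overwrite its siblings' outputs.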
194
291
 
292
+ **Note:** Keep in mind that payloads can only contain data which **can be serialized as JSON**, because that's how Gush stores them internally.
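A practical consequence of JSON storage is that some Ruby values do not survive a round trip unchanged. The sketch below uses only the standard `json` library with a made-up output hash. Gush's exact encoder options are not shown in this diff, so treat the key handling here as an assumption about plain JSON rather than a description of Gush itself.

```ruby
require 'json'

# Hypothetical job output with symbol keys and a symbol value.
output = { path: "/tmp/video.mp4", size: 1024, tags: [:raw] }

# Encode and decode the way a JSON-backed store would.
restored = JSON.parse(JSON.generate(output))

# Symbols come back as strings, both as keys and as values.
puts restored.inspect
puts restored == output
```

With plain `JSON.parse`, symbol keys and symbol values come back as strings, so a job that reads the data should not rely on symbols inside its parent's output unless it converts them itself.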
293
+
294
+ ### Dynamic workflows
195
295
 
196
- ### Checking status:
296
+ There might be a case when you have to construct the workflow dynamically depending on the input.
297
+
298
+ As an example, let's write a workflow which accepts an array of users and has to send an email to each one. Additionally, after it has sent the email to every user, it also has to notify the admin that it has finished.
197
299
 
198
- #### In Ruby:
199
300
 
200
301
  ```ruby
201
- flow.reload
202
- flow.status
203
- #=> :running|:finished|:failed
302
+
303
+ class NotifyWorkflow < Gush::Workflow
304
+ def configure(user_ids)
305
+ notification_jobs = user_ids.map do |user_id|
306
+ run NotificationJob, params: {user_id: user_id}
307
+ end
308
+
309
+ run AdminNotificationJob, after: notification_jobs
310
+ end
311
+ end
204
312
  ```
205
313
 
206
- `reload` is needed to see the latest status, since workflows are updated asynchronously.
314
+ We can achieve that because `run` method returns the id of the created job, which we can use for chaining dependencies.
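The fan-in pattern can be tried without Gush or Redis. In the sketch below, `RecordingWorkflow` is a hypothetical class that imitates only one behaviour described above: `run` returns an id for the created job. The id format (`ClassName-uuid`) follows the example id shown in the pipelining section, but everything else is an assumption made for illustration.

```ruby
require 'securerandom'

# Hypothetical recorder that imitates only the "run returns an id" behaviour.
class RecordingWorkflow
  attr_reader :dependencies, :job_params

  def initialize
    @dependencies = {} # job id => ids it waits for
    @job_params = {}   # job id => params hash
  end

  def run(job_name, params: {}, after: [])
    id = "#{job_name}-#{SecureRandom.uuid}"
    @dependencies[id] = Array(after)
    @job_params[id] = params
    id
  end
end

flow = RecordingWorkflow.new

# One notification job per user; `map` collects the returned ids.
notification_ids = [54, 21, 24].map do |user_id|
  flow.run("NotificationJob", params: { user_id: user_id })
end

# The admin job waits for every notification job.
admin_id = flow.run("AdminNotificationJob", after: notification_ids)

puts flow.dependencies[admin_id].size
```

Because each call returns a distinct id, the same job class can appear many times in one workflow and still be referenced unambiguously in `after:`.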
207
315
 
208
- #### Via CLI:
316
+ Now, when we create the workflow like this:
317
+
318
+ ```ruby
319
+ flow = NotifyWorkflow.create([54, 21, 24, 154, 65]) # 5 user ids as an argument
320
+ ```
321
+
322
+ it will generate a workflow with 5 `NotificationJob`s and one `AdminNotificationJob` which will depend on all of them:
323
+
324
+ ![DynamicWorkflow](https://i.imgur.com/HOI3fjc.png)
325
+
326
+ ## Command line interface (CLI)
327
+
328
+ ### Checking status
209
329
 
210
330
  - of a specific workflow:
211
331
 
@@ -219,18 +339,13 @@ flow.status
219
339
  bundle exec gush list
220
340
  ```
221
341
 
342
+ ### Visualizing workflows as image
222
343
 
223
- ### Requiring workflows inside your projects
224
-
225
- When using Gush and its CLI commands you need a Gushfile.rb in root directory.
226
- Gushfile should require all your Workflows and jobs, for example:
344
+ This requires that you have imagemagick installed on your computer:
227
345
 
228
- ```ruby
229
- require_relative './lib/your_project'
230
346
 
231
- Dir[Rails.root.join("app/workflows/**/*.rb")].each do |file|
232
- require file
233
- end
347
+ ```
348
+ bundle exec gush viz <NameOfTheWorkflow>
234
349
  ```
235
350
 
236
351
  ## Contributors