gush 0.4.1 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: d6bc6665af4376d837c94695ebcd5d2076abd0a5
4
- data.tar.gz: 2aaf48dc13f289a1a61d3d1afbd231dcc511cb16
3
+ metadata.gz: 5ac3b43abd8eeb7cc6403a0be5064b0dd0ba6f2a
4
+ data.tar.gz: 8038bf272a8562a632cd67ae9dc8adfdc59e50a6
5
5
  SHA512:
6
- metadata.gz: 3ae3dafb868d3c48ee2468f0fbf88d94dcc25dcbb83827ce5c855832118dfb256db5ee7c05d8eee21930f46b04c20e775aa5057a0739212417f11d03d9666865
7
- data.tar.gz: dcc89596cf05c6d5f96e6b4ec54879a774a6c1e476acd51ae159872494f180840f92757f58d5b1801117c6b1d3864a41132a60e9a3563ce92a780ce2570a0d6d
6
+ metadata.gz: 155e88cd3026703aac1a00ebbd02d64380402982eeabd7f384f09aec654f93f48bb934861c6d7208e45eba860c8dadbae0f7e0f19c280cb68ed26941d674dd72
7
+ data.tar.gz: 5214ae1bc690bf3d84afaf27e3748be2526b63112c56f203e292fb73604606942172e94a7722028f3f809012b84f708978320ce868ca075d6427c391112cb9c3
data/.gitignore CHANGED
@@ -17,5 +17,5 @@ test/version_tmp
17
17
  workflows/
18
18
  tmp
19
19
  test.rb
20
- /Gushfile.rb
20
+ /Gushfile
21
21
  dump.rdb
data/.travis.yml CHANGED
@@ -1,9 +1,9 @@
1
1
  language: ruby
2
2
  script: "bundle exec rspec"
3
3
  rvm:
4
- - 2.0.0
5
- - 2.1.6
6
4
  - 2.2.2
5
+ - 2.3.4
6
+ - 2.4.1
7
7
  services:
8
8
  - redis-server
9
9
  email:
data/CHANGELOG.md CHANGED
@@ -1,3 +1,35 @@
1
- # 0.4
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
6
+ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
7
+
8
+ ## 1.0.0 - 2017-10-02
9
+
10
+ ### Added
11
+
12
+ - **BREAKING CHANGE** Gush now uses ActiveJob instead of using Sidekiq directly. This allows programmers to use multiple backends instead of just one, including in-process or even synchronous backends. See http://guides.rubyonrails.org/active_job_basics.html
13
+
14
+ ### Fixed
15
+
16
+ - Fix graph rendering with `gush viz` command. Sometimes it rendered the last job detached from others, because it was using a class name instead of job name as ID.
17
+ - Fix performance problems with deserializing jobs. This greatly **increased performance** by avoiding redundant calls to Redis storage. Should help a lot with huge workflows spawning thousands of jobs. Previously each job loaded the whole workflow instance when executed.
18
+
19
+ ### Changed
20
+
21
+ - **BREAKING CHANGE** `Gushfile.rb` is now renamed to `Gushfile`
22
+ - **BREAKING CHANGE** Internal code for reporting status via Redis pub/sub has been removed, since it wasn't used for a long time.
23
+ - **BREAKING CHANGE** jobs are expected to have a `perform` method instead of `work` like in < 1.0.0 versions.
24
+ - **BREAKING CHANGE** The `payloads` method available inside jobs now returns an array of hashes instead of a hash. This allows for a more flexible approach to reusing a single job in many situations. Previously payloads were grouped by the predecessor's class name, so you were forced to hardcode that class name in its descendants' code.
25
+
26
+ ### Removed
27
+
28
+ - The `gush workers` command is now removed. It is now up to the developer to start background processes, depending on the chosen ActiveJob adapter.
29
+ - `environment` was removed since it was no longer needed (it was Sidekiq specific)
30
+
31
+ ## 0.4.0
32
+
33
+ ### Removed
2
34
 
3
35
  - remove hard dependency on Yajl, so Gush can work with non-MRI Rubies ([#31](https://github.com/chaps-io/gush/pull/31) by [Nick Rakochy](https://github.com/chaps-io/gush/pull/31))
data/README.md CHANGED
@@ -2,38 +2,65 @@
2
2
 
3
3
  ## [![](http://i.imgur.com/ya8Wnyl.png)](https://chaps.io) proudly made by [Chaps](https://chaps.io)
4
4
 
5
- Gush is a parallel workflow runner using only Redis as its message broker and Sidekiq for workers.
5
+ Gush is a parallel workflow runner using only Redis as storage and [ActiveJob](http://guides.rubyonrails.org/v4.2/active_job_basics.html#introduction) for scheduling and executing jobs.
6
6
 
7
7
  ## Theory
8
8
 
9
- Gush relies on directed acyclic graphs to store dependencies, see [Parallelizing Operations With Dependencies](https://msdn.microsoft.com/en-us/magazine/dd569760.aspx) by Stephen Toub.
9
+ Gush relies on directed acyclic graphs to store dependencies, see [Parallelizing Operations With Dependencies](https://msdn.microsoft.com/en-us/magazine/dd569760.aspx) by Stephen Toub to learn more about this method.
10
+
11
+ ## **WARNING - version notice**
12
+
13
+ This README is about the `1.0.0` version, which has breaking changes compared to < 1.0.0 versions. [See here for 0.4.1 documentation](https://github.com/chaps-io/gush/blob/349c5aff0332fd14b1cb517115c26d415aa24841/README.md).
14
+
10
15
  ## Installation
11
16
 
12
- Add this line to your application's Gemfile:
17
+ ### 1. Add `gush` to Gemfile
18
+
19
+ ```ruby
20
+ gem 'gush', '~> 1.0.0'
21
+ ```
22
+
23
+ ### 2. Create `Gushfile`
24
+
25
+ When using Gush and its CLI commands you need a `Gushfile` in the root directory.
26
+ `Gushfile` should require all your workflows and jobs.
13
27
 
14
- gem 'gush'
28
+ #### Ruby on Rails
15
29
 
16
- And then execute:
30
+ For RoR it is enough to require the full environment:
17
31
 
18
- $ bundle
32
+ ```ruby
33
+ require_relative './config/environment.rb'
34
+ ```
19
35
 
20
- Or install it yourself as:
36
+ and make sure your jobs and workflows are correctly loaded by adding their directories to autoload_paths, inside `config/application.rb`:
21
37
 
22
- $ gem install gush
38
+ ```ruby
39
+ config.autoload_paths += ["#{Rails.root}/app/jobs", "#{Rails.root}/app/workflows"]
40
+ ```
23
41
 
24
- ## Usage
42
+ #### Ruby
25
43
 
26
- ### Defining workflows
44
+ Simply require any jobs and workflows manually in `Gushfile`:
45
+
46
+ ```ruby
47
+ require_relative 'lib/workflows/example_workflow.rb'
48
+ require_relative 'lib/jobs/some_job.rb'
49
+ require_relative 'lib/jobs/some_other_job.rb'
50
+ ```
51
+
52
+
53
+ ## Example
27
54
 
28
55
  The DSL for defining jobs consists of a single `run` method.
29
56
  Here is a complete example of a workflow you can create:
30
57
 
31
58
  ```ruby
32
- # workflows/sample_workflow.rb
59
+ # app/workflows/sample_workflow.rb
33
60
  class SampleWorkflow < Gush::Workflow
34
61
  def configure(url_to_fetch_from)
35
62
  run FetchJob1, params: { url: url_to_fetch_from }
36
- run FetchJob2, params: {some_flag: true, url: 'http://url.com'}
63
+ run FetchJob2, params: { some_flag: true, url: 'http://url.com' }
37
64
 
38
65
  run PersistJob1, after: FetchJob1
39
66
  run PersistJob2, after: FetchJob2
@@ -47,62 +74,82 @@ class SampleWorkflow < Gush::Workflow
47
74
  end
48
75
  ```
49
76
 
50
- **Hint:** For debugging purposes you can vizualize the graph using `viz` command:
77
+ and this is what the graph will look like:
51
78
 
52
- ```
53
- bundle exec gush viz SampleWorkflow
54
- ```
79
+ ![SampleWorkflow](https://i.imgur.com/DFh6j51.png)
80
+
81
+
82
+ ## Defining workflows
83
+
84
+ Let's start with the simplest workflow possible, consisting of a single job:
55
85
 
56
- For the Workflow above, the graph will look like this:
86
+ ```ruby
87
+ class SimpleWorkflow < Gush::Workflow
88
+ def configure
89
+ run DownloadJob
90
+ end
91
+ end
92
+ ```
57
93
 
58
- ![SampleWorkflow](http://i.imgur.com/SmeRRVT.png)
94
+ Of course having a workflow with only a single job does not make sense, so it's time to define dependencies:
59
95
 
96
+ ```ruby
97
+ class SimpleWorkflow < Gush::Workflow
98
+ def configure
99
+ run DownloadJob
100
+ run SaveJob, after: DownloadJob
101
+ end
102
+ end
103
+ ```
60
104
 
61
- #### Passing parameters to jobs
105
+ We just told Gush to execute `SaveJob` right after `DownloadJob` finishes **successfully**.
62
106
 
63
- You can pass any primitive arguments into jobs while defining your workflow:
107
+ But what if your job must have multiple dependencies? That's easy, just provide an array to the `after` attribute:
64
108
 
65
109
  ```ruby
66
- # app/workflows/sample_workflow.rb
67
- class SampleWorkflow < Gush::Workflow
110
+ class SimpleWorkflow < Gush::Workflow
68
111
  def configure
69
- run FetchJob1, params: { url: "http://some.com/url" }
112
+ run FirstDownloadJob
113
+ run SecondDownloadJob
114
+
115
+ run SaveJob, after: [FirstDownloadJob, SecondDownloadJob]
70
116
  end
71
117
  end
72
118
  ```
73
119
 
74
- See below to learn how to access those params inside your job.
120
+ Now `SaveJob` will only execute after both its parents finish without errors.
121
+
122
+ With this simple syntax you can build any complex workflows you can imagine!
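
The ordering guarantee described above can be illustrated without Gush at all. The following is a minimal sketch using Ruby's standard `tsort` library; `DependencyGraph` is a hypothetical stand-in written for this example (it is not Gush's implementation), and jobs are represented by plain symbols rather than job classes.

```ruby
require 'tsort'

# Hypothetical stand-in for a workflow's dependency graph; not part of Gush.
class DependencyGraph
  include TSort

  def initialize
    @parents = {}
  end

  # Mirrors `run Job, after: [...]`: records which jobs must finish first.
  def run(job, after: [])
    @parents[job] = Array(after)
    job
  end

  # TSort callbacks: iterate over all jobs and over each job's prerequisites.
  def tsort_each_node(&block)
    @parents.each_key(&block)
  end

  def tsort_each_child(job, &block)
    @parents.fetch(job).each(&block)
  end
end

graph = DependencyGraph.new
graph.run :FirstDownloadJob
graph.run :SecondDownloadJob
graph.run :SaveJob, after: [:FirstDownloadJob, :SecondDownloadJob]

# Prerequisites always come before the jobs that depend on them.
order = graph.tsort
```

Here `order` lists both download jobs before `SaveJob`. A definition containing a cycle would raise `TSort::Cyclic`, which is why the dependency graph has to stay acyclic.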
75
123
 
76
- #### Defining jobs
124
+ #### Alternative way
77
125
 
78
- Jobs are classes inheriting from `Gush::Job`:
126
+ `run` method also accepts `before:` attribute to define the opposite association. So we can write the same workflow as above, but like this:
79
127
 
80
128
  ```ruby
81
- # app/jobs/fetch_job.rb
82
- class FetchJob < Gush::Job
83
- def work
84
- # do some fetching from remote APIs
129
+ class SimpleWorkflow < Gush::Workflow
130
+ def configure
131
+ run FirstDownloadJob, before: SaveJob
132
+ run SecondDownloadJob, before: SaveJob
85
133
 
86
- params #=> {url: "http://some.com/url"}
134
+ run SaveJob
87
135
  end
88
136
  end
89
137
  ```
90
138
 
91
- `params` method is a hash containing your (optional) parameters passed to `run` method in the workflow.
139
+ You can use whatever way you find more readable or even both at once :)
92
140
 
93
- #### Passing arguments to workflows
141
+ ### Passing arguments to workflows
94
142
 
95
143
  Workflows can accept any primitive arguments in their constructor, which then will be available in your
96
144
  `configure` method.
97
145
 
98
- Here's an example of a workflow responsible for publishing a book:
146
+ Let's assume we are writing a book publishing workflow which needs to know where the PDF of the book is and under what ISBN it will be released:
99
147
 
100
148
  ```ruby
101
- # app/workflows/sample_workflow.rb
102
149
  class PublishBookWorkflow < Gush::Workflow
103
150
  def configure(url, isbn)
104
151
  run FetchBook, params: { url: url }
105
- run PublishBook, params: { book_isbn: isbn }
152
+ run PublishBook, params: { book_isbn: isbn }, after: FetchBook
106
153
  end
107
154
  end
108
155
  ```
@@ -110,57 +157,106 @@ end
110
157
  and then create your workflow with those arguments:
111
158
 
112
159
  ```ruby
113
- PublishBookWorkflow.new("http://url.com/book.pdf", "978-0470081204")
160
+ PublishBookWorkflow.create("http://url.com/book.pdf", "978-0470081204")
161
+ ```
162
+
163
+ and that's basically it for defining workflows, see below on how to define jobs:
164
+
165
+ ## Defining jobs
166
+
167
+ The simplest job is a class inheriting from `Gush::Job` and responding to the `perform` method, much like any other ActiveJob class.
168
+
169
+ ```ruby
170
+ class FetchBook < Gush::Job
171
+ def perform
172
+ # do some fetching from remote APIs
173
+ end
174
+ end
114
175
  ```
115
176
 
177
+ But what about those params we passed in the previous step?
116
178
 
117
- ### Running workflows
179
+ ## Passing parameters into jobs
118
180
 
119
- Now that we have defined our workflow we can use it:
181
+ To do that, simply provide a `params:` attribute with a hash of parameters you'd like to have available inside the `perform` method of the job.
120
182
 
121
- #### 1. Initialize and save it
183
+ So, inside workflow:
122
184
 
123
185
  ```ruby
124
- flow = SampleWorkflow.new(optional, arguments)
125
- flow.save # saves workflow and its jobs to Redis
186
+ (...)
187
+ run FetchBook, params: {url: "http://url.com/book.pdf"}
188
+ (...)
126
189
  ```
127
190
 
128
- **or:** you can also use a shortcut:
191
+ and within the job we can access them like this:
129
192
 
130
193
  ```ruby
131
- flow = SampleWorkflow.create(optional, arguments)
194
+ class FetchBook < Gush::Job
195
+ def perform
196
+ # you can access `params` method here, for example:
197
+
198
+ params #=> {url: "http://url.com/book.pdf"}
199
+ end
200
+ end
132
201
  ```
133
202
 
134
- #### 2. Start workflow
203
+ ## Executing workflows
135
204
 
136
- First you need to start Sidekiq workers:
205
+ Now that we have defined our workflow and its jobs, we can use it:
206
+
207
+ ### 1. Start background worker process
208
+
209
+ **Important**: The command to start background workers depends on the backend you chose for ActiveJob.
210
+ For example, in case of Sidekiq this would be:
137
211
 
138
212
  ```
139
- bundle exec gush workers
213
+ bundle exec sidekiq -q gush
140
214
  ```
141
215
 
142
- and then start your workflow:
216
+ **[See the backends section of the official ActiveJob documentation](http://guides.rubyonrails.org/v4.2/active_job_basics.html#backends)**
217
+
218
+ **Hint**: gush uses `gush` queue name by default. Keep that in mind, because some backends (like Sidekiq) will only run jobs from explicitly stated queues.
219
+
220
+
221
+ ### 2. Create the workflow instance
222
+
223
+ ```ruby
224
+ flow = PublishBookWorkflow.create("http://url.com/book.pdf", "978-0470081204")
225
+ ```
226
+
227
+ ### 3. Start the workflow
143
228
 
144
229
  ```ruby
145
230
  flow.start!
146
231
  ```
147
232
 
148
- Now Gush will start processing jobs in background using Sidekiq
149
- in the order defined in `configure` method inside Workflow.
233
+ Now Gush will start processing jobs in the background using ActiveJob and your chosen backend.
234
+
235
+ ### 4. Monitor its progress:
236
+
237
+ ```ruby
238
+ flow.reload
239
+ flow.status
240
+ #=> :running|:finished|:failed
241
+ ```
242
+
243
+ `reload` is needed to see the latest status, since workflows are updated asynchronously.
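
If a script needs to block until the workflow is done, the `reload` and `status` calls can be wrapped in a small polling loop. This is only a sketch: `wait_for_workflow` and its `timeout:` and `interval:` parameters are hypothetical names introduced here, not part of Gush, and the helper assumes that any status other than `:running` means there is nothing left to wait for.

```ruby
# Hypothetical helper, not part of Gush: polls until the workflow leaves
# the :running state or the timeout expires. It relies only on the `reload`
# and `status` methods of the object it is given.
def wait_for_workflow(flow, timeout: 60, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    flow.reload # refresh state, since workflows are updated asynchronously
    status = flow.status
    return status unless status == :running

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "workflow still running after #{timeout}s"
    end

    sleep interval
  end
end
```

A call such as `wait_for_workflow(flow, timeout: 300, interval: 5)` would then return `:finished` or `:failed`, or raise if the workflow is still running after five minutes.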
244
+
245
+ ## Advanced features
150
246
 
151
247
  ### Pipelining
152
248
 
153
- Gush offers a useful feature which lets you pass results of a job to its dependencies, so they can act accordingly.
249
+ Gush offers a useful tool to pass the results of a job to the jobs that depend on it, so they can act on those results.
154
250
 
155
251
  **Example:**
156
252
 
157
253
  Let's assume you have two jobs, `DownloadVideo`, `EncodeVideo`.
158
- The latter needs to know where the first one downloaded the file to be able to open it.
254
+ The latter needs to know where the first one saved the file to be able to open it.
159
255
 
160
256
 
161
257
  ```ruby
162
258
  class DownloadVideo < Gush::Job
163
- def work
259
+ def perform
164
260
  downloader = VideoDownloader.fetch("http://youtube.com/?v=someytvideo")
165
261
 
166
262
  output(downloader.file_path)
@@ -168,44 +264,68 @@ class DownloadVideo < Gush::Job
168
264
  end
169
265
  ```
170
266
 
171
- `output` method is Gush's way of saying: "I want to pass this down to my descendants".
267
+ The `output` method is used to output data from the job to all dependent jobs.
172
268
 
173
- Now, since `DownloadVideo` finished and its dependant job `EncodeVideo` started, we can access that payload down the (pipe)line:
269
+ Once `DownloadVideo` has finished and its dependent job `EncodeVideo` has started, we can access that payload inside it:
174
270
 
175
271
  ```ruby
176
272
  class EncodeVideo < Gush::Job
177
- def work
178
- video_path = payloads["DownloadVideo"]
273
+ def perform
274
+ video_path = payloads.first[:output]
179
275
  end
180
276
  end
181
277
  ```
182
278
 
183
- `payloads` is a hash containing outputs from all parent jobs, where job class names are the keys.
279
+ `payloads` is an array containing outputs from all ancestor jobs. So for our `EncodeVideo` job from above, the array will look like this:
184
280
 
185
- **Note:** `payloads` will only contain outputs of the job's ancestors. So if job `A` depends on `B` and `C`,
186
- the `payloads` hash will look like this:
187
281
 
188
282
  ```ruby
189
- {
190
- "B" => (...),
191
- "C" => (...)
192
- }
283
+ [
284
+ {
285
+ id: "DownloadVideo-41bfb730-b49f-42ac-a808-156327989294", # unique id of the ancestor job
286
+ class: "DownloadVideo",
287
+ output: "https://s3.amazonaws.com/somebucket/downloaded-file.mp4" # the payload returned by DownloadVideo job using `output()` method
288
+ }
289
+ ]
193
290
  ```
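
When a job has several parents, `payloads.first` is no longer enough and the entries have to be selected by their `class` key. The sketch below shows one way to do that on plain data in the shape shown above. The `outputs_from` helper, the `DownloadSubtitles` job and the sample ids and paths are all made up for illustration and are not part of Gush.

```ruby
# Sample data in the shape shown above; ids and outputs are made up.
payloads = [
  { id: 'DownloadVideo-1', class: 'DownloadVideo', output: '/tmp/video.mp4' },
  { id: 'DownloadSubtitles-1', class: 'DownloadSubtitles', output: '/tmp/subs.srt' }
]

# Hypothetical helper, not part of Gush: collects the outputs of every
# ancestor job of the given class, in the order they appear.
def outputs_from(payloads, job_class)
  payloads.select { |payload| payload[:class] == job_class.to_s }
          .map { |payload| payload[:output] }
end

video_path = outputs_from(payloads, 'DownloadVideo').first
```

Returning an array keeps the helper usable when the same job class appears more than once among the ancestors, which is the situation the array-based `payloads` format was introduced for.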
194
291
 
292
+ **Note:** Keep in mind that payloads can only contain data which **can be serialized as JSON**, because that's how Gush stores them internally.
293
+
294
+ ### Dynamic workflows
195
295
 
196
- ### Checking status:
296
+ There might be a case when you have to construct the workflow dynamically depending on the input.
297
+
298
+ As an example, let's write a workflow which accepts an array of users and has to send an email to each one. Additionally, after it has sent the email to every user, it also has to notify the admin about finishing.
197
299
 
198
- #### In Ruby:
199
300
 
200
301
  ```ruby
201
- flow.reload
202
- flow.status
203
- #=> :running|:finished|:failed
302
+
303
+ class NotifyWorkflow < Gush::Workflow
304
+ def configure(user_ids)
305
+ notification_jobs = user_ids.map do |user_id|
306
+ run NotificationJob, params: {user_id: user_id}
307
+ end
308
+
309
+ run AdminNotificationJob, after: notification_jobs
310
+ end
311
+ end
204
312
  ```
205
313
 
206
- `reload` is needed to see the latest status, since workflows are updated asynchronously.
314
+ This works because the `run` method returns the id of the created job, which we can use for chaining dependencies.
207
315
 
208
- #### Via CLI:
316
+ Now, when we create the workflow like this:
317
+
318
+ ```ruby
319
+ flow = NotifyWorkflow.create([54, 21, 24, 154, 65]) # 5 user ids as an argument
320
+ ```
321
+
322
+ it will generate a workflow with 5 `NotificationJob`s and one `AdminNotificationJob` which will depend on all of them:
323
+
324
+ ![DynamicWorkflow](https://i.imgur.com/HOI3fjc.png)
325
+
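
To see why returning an id from `run` matters here, consider the following toy model. `ToyWorkflow` is a hypothetical stand-in written for this example, not Gush's implementation. It only assumes that ids follow the `ClassName-uuid` shape shown in the payload example, and it takes class names as strings to stay self-contained.

```ruby
require 'securerandom'

# Toy stand-in, not Gush itself: `run` returns a unique id per job, so the
# same class can appear many times and still be referenced individually.
class ToyWorkflow
  attr_reader :jobs

  def initialize
    @jobs = {}
  end

  def run(klass, params: {}, after: [])
    id = "#{klass}-#{SecureRandom.uuid}"
    @jobs[id] = { class: klass.to_s, params: params, after: Array(after) }
    id
  end
end

flow = ToyWorkflow.new
notification_ids = [54, 21, 24].map do |user_id|
  flow.run 'NotificationJob', params: { user_id: user_id }
end
admin_id = flow.run 'AdminNotificationJob', after: notification_ids
```

Because every `NotificationJob` gets its own id, the admin job can depend on each of them individually. Referring to the class alone would not distinguish one notification from another.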
326
+ ## Command line interface (CLI)
327
+
328
+ ### Checking status
209
329
 
210
330
  - of a specific workflow:
211
331
 
@@ -219,18 +339,13 @@ flow.status
219
339
  bundle exec gush list
220
340
  ```
221
341
 
342
+ ### Visualizing workflows as an image
222
343
 
223
- ### Requiring workflows inside your projects
224
-
225
- When using Gush and its CLI commands you need a Gushfile.rb in root directory.
226
- Gushfile should require all your Workflows and jobs, for example:
344
+ This requires that you have imagemagick installed on your computer:
227
345
 
228
- ```ruby
229
- require_relative './lib/your_project'
230
346
 
231
- Dir[Rails.root.join("app/workflows/**/*.rb")].each do |file|
232
- require file
233
- end
347
+ ```
348
+ bundle exec gush viz <NameOfTheWorkflow>
234
349
  ```
235
350
 
236
351
  ## Contributors