RubyGems - gush - Versions diffs - 0.4.1 → 1.0.0 - Mend

gush 0.4.1 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25)

checksums.yaml +4 -4
data/.gitignore +1 -1
data/.travis.yml +2 -2
data/CHANGELOG.md +33 -1
data/README.md +195 -80
data/gush.gemspec +6 -5
data/lib/gush.rb +0 -14
data/lib/gush/cli.rb +3 -18
data/lib/gush/cli/overview.rb +1 -1
data/lib/gush/client.rb +8 -32
data/lib/gush/configuration.rb +4 -6
data/lib/gush/graph.rb +2 -1
data/lib/gush/job.rb +12 -16
data/lib/gush/worker.rb +21 -49
data/lib/gush/workflow.rb +9 -5
data/spec/{Gushfile.rb → Gushfile} +0 -0
data/spec/features/integration_spec.rb +62 -23
data/spec/gush/client_spec.rb +1 -1
data/spec/gush/configuration_spec.rb +0 -3
data/spec/gush/job_spec.rb +3 -3
data/spec/gush/worker_spec.rb +33 -41
data/spec/gush/workflow_spec.rb +4 -2
data/spec/gush_spec.rb +4 -4
data/spec/spec_helper.rb +23 -9
metadata +26 -12

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: d6bc6665af4376d837c94695ebcd5d2076abd0a5
-  data.tar.gz: 2aaf48dc13f289a1a61d3d1afbd231dcc511cb16
+  metadata.gz: 5ac3b43abd8eeb7cc6403a0be5064b0dd0ba6f2a
+  data.tar.gz: 8038bf272a8562a632cd67ae9dc8adfdc59e50a6
 SHA512:
-  metadata.gz: 3ae3dafb868d3c48ee2468f0fbf88d94dcc25dcbb83827ce5c855832118dfb256db5ee7c05d8eee21930f46b04c20e775aa5057a0739212417f11d03d9666865
-  data.tar.gz: dcc89596cf05c6d5f96e6b4ec54879a774a6c1e476acd51ae159872494f180840f92757f58d5b1801117c6b1d3864a41132a60e9a3563ce92a780ce2570a0d6d
+  metadata.gz: 155e88cd3026703aac1a00ebbd02d64380402982eeabd7f384f09aec654f93f48bb934861c6d7208e45eba860c8dadbae0f7e0f19c280cb68ed26941d674dd72
+  data.tar.gz: 5214ae1bc690bf3d84afaf27e3748be2526b63112c56f203e292fb73604606942172e94a7722028f3f809012b84f708978320ce868ca075d6427c391112cb9c3

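`checksums.yaml` records SHA1 and SHA512 digests of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). Below is a minimal sketch of how those digests could be recomputed with Ruby's standard `digest` library. It assumes the archives have already been extracted from the gem, which is a plain tar file (e.g. `tar -xf gush-1.0.0.gem`). The helper names `gem_digests` and `digest_matches?` are hypothetical and not part of RubyGems.

```ruby
require 'digest/sha1'
require 'digest/sha2' # provides Digest::SHA512

# Hypothetical helper: compute the SHA1 and SHA512 hex digests that
# checksums.yaml lists for one extracted archive (e.g. "metadata.gz").
def gem_digests(path)
  {
    'SHA1'   => Digest::SHA1.file(path).hexdigest,
    'SHA512' => Digest::SHA512.file(path).hexdigest
  }
end

# Hypothetical helper: compare a recomputed digest with the value copied
# from checksums.yaml for the same file and algorithm.
def digest_matches?(path, algorithm, expected)
  gem_digests(path).fetch(algorithm) == expected
end
```

For example, `digest_matches?('data.tar.gz', 'SHA512', expected)` with `expected` set to the `SHA512` / `data.tar.gz` value from the new `checksums.yaml` should return `true` for an unmodified 1.0.0 package.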
data/.gitignore CHANGED

@@ -17,5 +17,5 @@ test/version_tmp
 workflows/
 tmp
 test.rb
-/Gushfile.rb
+/Gushfile
 dump.rdb

data/.travis.yml CHANGED

@@ -1,9 +1,9 @@
 language: ruby
 script: "bundle exec rspec"
 rvm:
-  - 2.0.0
-  - 2.1.6
   - 2.2.2
+  - 2.3.4
+  - 2.4.1
 services:
   - redis-server
 email:

data/CHANGELOG.md CHANGED

@@ -1,3 +1,35 @@
-# 0.4
+# Changelog
+All notable changes to this project will be documented in this file.
+The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
+and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
+## 1.0.0 - 2017-10-02
+### Added
+- **BREAKING CHANGE** Gush now uses ActiveJob instead of using Sidekiq directly. This lets programmers choose from multiple backends instead of just one, including in-process or even synchronous backends. See http://guides.rubyonrails.org/active_job_basics.html
+### Fixed
+- Fix graph rendering with `gush viz` command. Sometimes it rendered the last job detached from others, because it was using a class name instead of job name as ID.
+- Fix performance problems with deserializing jobs. This greatly **increased performance** by avoiding redundant calls to Redis storage, which should help a lot with huge workflows spawning thousands of jobs. Previously each job loaded the whole workflow instance when executed.
+### Changed
+- **BREAKING CHANGE** `Gushfile.rb` is now renamed to `Gushfile`
+- **BREAKING CHANGE** Internal code for reporting status via Redis pub/sub has been removed, since it wasn't used for a long time.
+- **BREAKING CHANGE** jobs are expected to have a `perform` method instead of `work` as in versions < 1.0.0.
+- **BREAKING CHANGE** the `payloads` method available inside jobs now returns an array of hashes instead of a hash. This allows for a more flexible approach to reusing a single job in many situations. Previously payloads were grouped by the predecessor's class name, so you were forced to hardcode that class name in its descendants' code.
+### Removed
+- The `gush workers` command has been removed. It is now up to the developer to start background processes depending on the chosen ActiveJob adapter.
+- `environment` was removed since it was no longer needed (it was Sidekiq-specific).
+## 0.4.0
+### Removed
 - remove hard dependency on Yajl, so Gush can work with non-MRI Rubies ([#31](https://github.com/chaps-io/gush/pull/31) by [Nick Rakochy](https://github.com/chaps-io/gush/pull/31))
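The `payloads` breaking change means that code which looked up a parent's output by class name (`payloads["DownloadVideo"]` in 0.4.x) now has to search an array of hashes. Below is a minimal sketch of a hypothetical helper, `outputs_for` (not part of Gush), that recovers a per-class lookup from the 1.0.0 shape. It assumes each entry carries the `:id`, `:class` and `:output` keys shown in the updated README. The ids and the `FetchSubtitles` class are made up for illustration.

```ruby
# Hypothetical helper, not part of Gush: collect the outputs of every
# ancestor job of the given class from a 1.0.0-style payloads array.
def outputs_for(payloads, klass_name)
  payloads
    .select { |payload| payload[:class] == klass_name }
    .map { |payload| payload[:output] }
end

# Example payloads in the shape documented for gush 1.0.0
# (ids shortened; FetchSubtitles is a hypothetical job class).
payloads = [
  { id: 'DownloadVideo-1', class: 'DownloadVideo', output: '/tmp/a.mp4' },
  { id: 'DownloadVideo-2', class: 'DownloadVideo', output: '/tmp/b.mp4' },
  { id: 'FetchSubtitles-1', class: 'FetchSubtitles', output: '/tmp/a.srt' }
]

outputs_for(payloads, 'DownloadVideo') # => ["/tmp/a.mp4", "/tmp/b.mp4"]
```

The helper returns an array rather than a single value because, with the new format, the same job class can appear more than once among a job's parents. This is presumably the flexibility the changelog entry refers to. Inside a 1.0.0 job's `perform`, the equivalent of the old lookup would then be something like `outputs_for(payloads, 'DownloadVideo').first`.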

data/README.md CHANGED

@@ -2,38 +2,65 @@
 ## [![](http://i.imgur.com/ya8Wnyl.png)](https://chaps.io) proudly made by [Chaps](https://chaps.io)
-Gush is a parallel workflow runner using only Redis as its message broker and Sidekiq for workers.
+Gush is a parallel workflow runner using only Redis as storage and [ActiveJob](http://guides.rubyonrails.org/v4.2/active_job_basics.html#introduction) for scheduling and executing jobs.
 ## Theory
-Gush relies on directed acyclic graphs to store dependencies, see [Parallelizing Operations With Dependencies](https://msdn.microsoft.com/en-us/magazine/dd569760.aspx) by Stephen Toub.
+Gush relies on directed acyclic graphs to store dependencies, see [Parallelizing Operations With Dependencies](https://msdn.microsoft.com/en-us/magazine/dd569760.aspx) by Stephen Toub to learn more about this method.
+## **WARNING - version notice**
+This README is about the `1.0.0` version, which has breaking changes compared to < 1.0.0 versions. [See here for 0.4.1 documentation](https://github.com/chaps-io/gush/blob/349c5aff0332fd14b1cb517115c26d415aa24841/README.md).
 ## Installation
-Add this line to your application's Gemfile:
+### 1. Add `gush` to Gemfile
+```ruby
+gem 'gush', '~> 1.0.0'
+```
+### 2. Create `Gushfile`
+When using Gush and its CLI commands you need a `Gushfile` in the root directory.
+`Gushfile` should require all your workflows and jobs.
-    gem 'gush'
+#### Ruby on Rails
-And then execute:
+For RoR it is enough to require the full environment:
-    $ bundle
+```ruby
+require_relative './config/environment.rb'
+```
-Or install it yourself as:
+and make sure your jobs and workflows are correctly loaded by adding their directories to `autoload_paths` inside `config/application.rb`:
-    $ gem install gush
+```ruby
+config.autoload_paths += ["#{Rails.root}/app/jobs", "#{Rails.root}/app/workflows"]
+```
-## Usage
+#### Ruby
-### Defining workflows
+Simply require any jobs and workflows manually in `Gushfile`:
+```ruby
+require_relative 'lib/workflows/example_workflow.rb'
+require_relative 'lib/jobs/some_job.rb'
+require_relative 'lib/jobs/some_other_job.rb'
+```
+## Example
 The DSL for defining jobs consists of a single `run` method.
 Here is a complete example of a workflow you can create:
 ```ruby
-# workflows/sample_workflow.rb
+# app/workflows/sample_workflow.rb
 class SampleWorkflow < Gush::Workflow
   def configure(url_to_fetch_from)
     run FetchJob1, params: { url: url_to_fetch_from }
-    run FetchJob2, params: {some_flag: true, url: 'http://url.com'}
+    run FetchJob2, params: { some_flag: true, url: 'http://url.com' }
     run PersistJob1, after: FetchJob1
     run PersistJob2, after: FetchJob2
@@ -47,62 +74,82 @@ class SampleWorkflow < Gush::Workflow
 end
 ```
-**Hint:** For debugging purposes you can vizualize the graph using `viz` command:
+and this is what the graph will look like:
-```
-bundle exec gush viz SampleWorkflow
-```
+![SampleWorkflow](https://i.imgur.com/DFh6j51.png)
+## Defining workflows
+Let's start with the simplest workflow possible, consisting of a single job:
-For the Workflow above, the graph will look like this:
+```ruby
+class SimpleWorkflow < Gush::Workflow
+  def configure
+    run DownloadJob
+  end
+end
+```
-![SampleWorkflow](http://i.imgur.com/SmeRRVT.png)
+Of course having a workflow with only a single job does not make sense, so it's time to define dependencies:
+```ruby
+class SimpleWorkflow < Gush::Workflow
+  def configure
+    run DownloadJob
+    run SaveJob, after: DownloadJob
+  end
+end
+```
-#### Passing parameters to jobs
+We just told Gush to execute `SaveJob` right after `DownloadJob` finishes **successfully**.
-You can pass any primitive arguments into jobs while defining your workflow:
+But what if your job must have multiple dependencies? That's easy, just provide an array to the `after` attribute:
 ```ruby
-# app/workflows/sample_workflow.rb
-class SampleWorkflow < Gush::Workflow
+class SimpleWorkflow < Gush::Workflow
   def configure
-    run FetchJob1, params: { url: "http://some.com/url" }
+    run FirstDownloadJob
+    run SecondDownloadJob
+    run SaveJob, after: [FirstDownloadJob, SecondDownloadJob]
   end
 end
 ```
-See below to learn how to access those params inside your job.
+Now `SaveJob` will only execute after both its parents finish without errors.
+With this simple syntax you can build any complex workflows you can imagine!
-#### Defining jobs
+#### Alternative way
-Jobs are classes inheriting from `Gush::Job`:
+The `run` method also accepts a `before:` attribute to define the opposite association, so we can write the same workflow as above like this:
 ```ruby
-# app/jobs/fetch_job.rb
-class FetchJob < Gush::Job
-  def work
-    # do some fetching from remote APIs
+class SimpleWorkflow < Gush::Workflow
+  def configure
+    run FirstDownloadJob, before: SaveJob
+    run SecondDownloadJob, before: SaveJob
-    params #=> {url: "http://some.com/url"}
+    run SaveJob
   end
 end
 ```
-`params` method is a hash containing your (optional) parameters passed to `run` method in the workflow.
+You can use whatever way you find more readable or even both at once :)
-#### Passing arguments to workflows
+### Passing arguments to workflows
 Workflows can accept any primitive arguments in their constructor, which then will be available in your
 `configure` method.
-Here's an example of a workflow responsible for publishing a book:
+Let's assume we are writing a book publishing workflow which needs to know where the PDF of the book is and under what ISBN it will be released:
 ```ruby
-# app/workflows/sample_workflow.rb
 class PublishBookWorkflow < Gush::Workflow
   def configure(url, isbn)
     run FetchBook, params: { url: url }
-    run PublishBook, params: { book_isbn: isbn }
+    run PublishBook, params: { book_isbn: isbn }, after: FetchBook
   end
 end
 ```
@@ -110,57 +157,106 @@ end
 and then create your workflow with those arguments:
 ```ruby
-PublishBookWorkflow.new("http://url.com/book.pdf", "978-0470081204")
+PublishBookWorkflow.create("http://url.com/book.pdf", "978-0470081204")
+```
+and that's basically it for defining workflows. See below for how to define jobs:
+## Defining jobs
+The simplest job is a class inheriting from `Gush::Job` and responding to the `perform` method, much like any other ActiveJob class.
+```ruby
+class FetchBook < Gush::Job
+  def perform
+    # do some fetching from remote APIs
+  end
+end
 ```
+But what about those params we passed in the previous step?
-### Running workflows
+## Passing parameters into jobs
-Now that we have defined our workflow we can use it:
+To do that, simply provide a `params:` attribute with a hash of parameters you'd like to have available inside the `perform` method of the job.
-#### 1. Initialize and save it
+So, inside workflow:
 ```ruby
-flow = SampleWorkflow.new(optional, arguments)
-flow.save # saves workflow and its jobs to Redis
+(...)
+run FetchBook, params: {url: "http://url.com/book.pdf"}
+(...)
 ```
-**or:** you can also use a shortcut:
+and within the job we can access them like this:
 ```ruby
-flow = SampleWorkflow.create(optional, arguments)
+class FetchBook < Gush::Job
+  def perform
+    # you can access `params` method here, for example:
+    params #=> {url: "http://url.com/book.pdf"}
+  end
+end
 ```
-#### 2. Start workflow
+## Executing workflows
-First you need to start Sidekiq workers:
+Now that we have defined our workflow and its jobs, we can use it:
+### 1. Start background worker process
+**Important**: The command to start background workers depends on the backend you chose for ActiveJob.
+For example, in case of Sidekiq this would be:
 ```
-bundle exec gush workers
+bundle exec sidekiq -q gush
 ```
-and then start your workflow:
+**[Click here to see backends section in official ActiveJob documentation about configuring backends](http://guides.rubyonrails.org/v4.2/active_job_basics.html#backends)**
+**Hint**: Gush uses the `gush` queue name by default. Keep that in mind, because some backends (like Sidekiq) will only run jobs from explicitly stated queues.
+### 2. Create the workflow instance
+```ruby
+flow = PublishBookWorkflow.create("http://url.com/book.pdf", "978-0470081204")
+```
+### 3. Start the workflow
 ```ruby
 flow.start!
 ```
-Now Gush will start processing jobs in background using Sidekiq
-in the order defined in `configure` method inside Workflow.
+Now Gush will start processing jobs in the background using ActiveJob and your chosen backend.
+### 4. Monitor its progress
+```ruby
+flow.reload
+flow.status
+#=> :running|:finished|:failed
+```
+`reload` is needed to see the latest status, since workflows are updated asynchronously.
+## Advanced features
 ### Pipelining
-Gush offers a useful feature which lets you pass results of a job to its dependencies, so they can act accordingly.
+Gush offers a useful tool to pass the results of a job to the jobs that depend on it, so they can act accordingly.
 **Example:**
 Let's assume you have two jobs, `DownloadVideo`, `EncodeVideo`.
-The latter needs to know where the first one downloaded the file to be able to open it.
+The latter needs to know where the first one saved the file to be able to open it.
 ```ruby
 class DownloadVideo < Gush::Job
-  def work
+  def perform
     downloader = VideoDownloader.fetch("http://youtube.com/?v=someytvideo")
     output(downloader.file_path)
@@ -168,44 +264,68 @@ class DownloadVideo < Gush::Job
 end
 ```
-`output` method is Gush's way of saying: "I want to pass this down to my descendants".
+The `output` method is used to pass data from the job to all dependent jobs.
-Now, since `DownloadVideo` finished and its dependant job `EncodeVideo` started, we can access that payload down the (pipe)line:
+Now, since `DownloadVideo` finished and its dependent job `EncodeVideo` started, we can access that payload inside it:
 ```ruby
 class EncodeVideo < Gush::Job
-  def work
-    video_path = payloads["DownloadVideo"]
+  def perform
+    video_path = payloads.first[:output]
   end
 end
 ```
-`payloads` is a hash containing outputs from all parent jobs, where job class names are the keys.
+`payloads` is an array containing outputs from all ancestor jobs. So for our `EncodeVideo` job from above, the array will look like this:
-**Note:** `payloads` will only contain outputs of the job's ancestors. So if job `A` depends on `B` and `C`,
-the `payloads` hash will look like this:
 ```ruby
-{
-  "B" => (...),
-  "C" => (...)
-}
+[
+  {
+    id: "DownloadVideo-41bfb730-b49f-42ac-a808-156327989294", # unique id of the ancestor job
+    class: "DownloadVideo",
+    output: "https://s3.amazonaws.com/somebucket/downloaded-file.mp4" # the payload returned by the DownloadVideo job via the `output()` method
+  }
+]
 ```
+**Note:** Keep in mind that payloads can only contain data which **can be serialized as JSON**, because that's how Gush stores them internally.
+### Dynamic workflows
-### Checking status:
+There might be a case when you have to construct the workflow dynamically depending on the input.
+As an example, let's write a workflow which accepts an array of user IDs and has to send an email to each user. Additionally, after it sends the email to every user, it also has to notify the admin that it has finished.
-#### In Ruby:
 ```ruby
-flow.reload
-flow.status
-#=> :running|:finished|:failed
+class NotifyWorkflow < Gush::Workflow
+  def configure(user_ids)
+    notification_jobs = user_ids.map do |user_id|
+      run NotificationJob, params: {user_id: user_id}
+    end
+    run AdminNotificationJob, after: notification_jobs
+  end
+end
 ```
-`reload` is needed to see the latest status, since workflows are updated asynchronously.
+This works because the `run` method returns the id of the created job, which we can use for chaining dependencies.
-#### Via CLI:
+Now, when we create the workflow like this:
+```ruby
+flow = NotifyWorkflow.create([54, 21, 24, 154, 65]) # 5 user ids as an argument
+```
+it will generate a workflow with 5 `NotificationJob`s and one `AdminNotificationJob` which will depend on all of them:
+![DynamicWorkflow](https://i.imgur.com/HOI3fjc.png)
+## Command line interface (CLI)
+### Checking status
 - of a specific workflow:
@@ -219,18 +339,13 @@ flow.status
   bundle exec gush list
   ```
+### Visualizing workflows as an image
-### Requiring workflows inside your projects
-When using Gush and its CLI commands you need a Gushfile.rb in root directory.
-Gushfile should require all your Workflows and jobs, for example:
+This requires that you have imagemagick installed on your computer:
-```ruby
-require_relative './lib/your_project'
-Dir[Rails.root.join("app/workflows/**/*.rb")].each do |file|
-  require file
-end
+```
+bundle exec gush viz <NameOfTheWorkflow>
 ```
 ## Contributors
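The README's Theory section says Gush stores job dependencies as a directed acyclic graph, and its Dynamic workflows section builds a fan-out/fan-in graph in which one job waits on many others. Below is a minimal, self-contained sketch of that idea using Ruby's standard `tsort` library. It models the `NotifyWorkflow` example with plain job names. It is not Gush's internal implementation: the `JobGraph` class and its `run`/`ready` methods are hypothetical and only mimic the `run Job, after: [...]` DSL.

```ruby
require 'tsort'

# Hypothetical dependency graph: maps each job name to the jobs it runs after.
class JobGraph
  include TSort

  def initialize
    @parents = {}
  end

  # Register a job and its parents, mimicking `run Job, after: [...]`.
  # Returns the job name so it can be used in later `after:` lists.
  def run(name, after: [])
    @parents[name] = Array(after)
    name
  end

  # Jobs that have not finished yet and whose parents have all finished.
  def ready(finished)
    @parents.select { |job, parents| !finished.include?(job) && (parents - finished).empty? }.keys
  end

  # TSort hooks: a job's "children" here are its parents, so #tsort
  # lists every job after all of the jobs it depends on.
  def tsort_each_node(&block)
    @parents.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @parents.fetch(node, []).each(&block)
  end
end

graph = JobGraph.new
notifications = [54, 21, 24].map { |id| graph.run("NotificationJob-#{id}") }
graph.run('AdminNotificationJob', after: notifications)

graph.ready([])  # => ["NotificationJob-54", "NotificationJob-21", "NotificationJob-24"]
graph.tsort.last # => "AdminNotificationJob"
```

`TSort#tsort` raises `TSort::Cyclic` when two or more jobs depend on each other in a loop, which illustrates why such a graph has to stay acyclic for a valid execution order to exist.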