RubyGems - gouda - Versions diffs - 0.1.1 → 0.1.2 - Mend

gouda 0.1.1 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 69ee0eda4e10e8777d9cb1df596472103558ace52ba2fd018de2fcbc28a59ead
-  data.tar.gz: 7d2eb26cafd53af69e52f8a5de8adf8d08678d7618dbccc6d50c500e49780c92
+  metadata.gz: 0fa853c78222eb23897ccb31ed465fc231aa5894641fe0d1991ade90a5e3fc8d
+  data.tar.gz: e9680b441d3fe9c3da7fadf50272c033db712296104870d016b278da2c1d92bd
 SHA512:
-  metadata.gz: 7bb748afdacae3fe76ee49a4b82e45c282d3feeee6fb69cb390b57e9b05db9108d692521454121724fbec81970c297117befd900dde9a687bfc51d8b08a4cd8e
-  data.tar.gz: f14f7bad25ba4d0b2c22ec0dad98747ef1a1e79fa3a5978b987f13fbe340220eddfe6958fa1751083402c7f65a6e3f4129fdbb53bfa0d6e87ecc45692b7d4548
+  metadata.gz: 9a64544cd45d14400ab949a848e0325ee5d5305d648f7f38239279f93e1f8d2d32dac368708317aafbf470b48e16f88a0ffe4bad6890798ef53adea0566da5f6
+  data.tar.gz: e2140d4da50c4afe8edadd51bb3049b60935e4c0273d235dcb1988efa2900362ea81451a58df04ba3f4db58209380ea9a58ccedd42972806bd5be2cd9f19d7d4

data/.github/workflows/ci.yml CHANGED

@@ -15,9 +15,6 @@ jobs:
       matrix:
         ruby:
           - '2.7'
-          - '3.0'
-          - '3.1'
-          - '3.2'
           - '3.3'
     services:
       postgres:

data/CHANGELOG.md CHANGED

@@ -7,3 +7,7 @@
 ## [0.1.1] - 2023-06-10
 - Fix support for older ruby versions until 2.7
+## [0.1.2] - 2023-06-11
+- Updated readme and method renaming in Scheduler

data/README.md CHANGED

@@ -11,7 +11,96 @@ $ bundle install
 $ bin/rails g gouda:install
 ```
-## Usage
+Gouda is built as a lightweight alternative to [good_job](https://github.com/bensheldon/good_job) and was created before [solid_queue](https://github.com/rails/solid_queue/).
+It is _smaller_ than solid_queue though.
+It was designed to enable job processing using `SELECT ... FOR UPDATE SKIP LOCKED` on Postgres so that we could use pg_bouncer in our system setup.
+## Key concepts in Gouda: Workload
+Gouda is built around the concept of a **Workload.** A workload is not the same as an ActiveJob. A workload is a single execution of a task - the task may be an entire ActiveJob, or a retry of an ActiveJob, or a part of a sequence of ActiveJobs initiated using [job-iteration](https://github.com/shopify/job-iteration).
+You can easily have multiple `Workloads` stored in your queue which reference the same job. However, when you are using Gouda it is important to always keep the distinction between the two in mind.
+When an ActiveJob gets first initialised, it receives a randomly-generated ActiveJob ID, which is normally a UUID. This UUID will be reused when a job gets retried, or when job-iteration is in use - but it will exist across multiple Gouda workloads.
+A `Workload` can only be in one of the three states: `enqueued`, `executing` and `finished`. It does not matter whether the workload has raised an exception, or was manually canceled before it started performing, or succeeded - its terminal state is always going to be `finished`, regardless. This is done on purpose: Gouda uses a number of partial indexes in Postgres which allows it to maintain uniqueness, but only among jobs which are either waiting to start or already running. Additionally, _only the transitions between those states_ are guarded by `BEGIN...COMMIT` and it is the selection on those states that is supplemented by `SELECT ... FOR UPDATE SKIP LOCKED`. The only time locks are placed on a particular `gouda_workloads` row is when this update is about to take place (`SELECT` then `UPDATE`). This makes Gouda a good fit for use with pg_bouncer in transaction mode.
+Understanding workload identity is key for making good use of Gouda. For example, an ActiveJob that gets retried can take the following shape in Gouda:
+```
+ ____________________________         _______________________________________________
+| ActiveJob(id="0abc-...34") | ----> |  Workload(id="f67b-...123",state="finished")  |
+ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+ ____________________________         _______________________________________________
+| ActiveJob(id="0abc-...34") | ----> |  Workload(id="5e52-...456",state="finished")  |
+ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+ ____________________________         _______________________________________________
+| ActiveJob(id="0abc-...34") | ----> |  Workload(id="8a41-...789",state="enqueued")  |
+ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+```
+This would happen if, for example, the ActiveJob raises an exception inside `perform` and is configured to `retry_on` after this exception. Same for job-iteration:
+```
+ _______________________________________         _______________________________________________
+| ActiveJob(id="0abc-...34",cursor=nil) | ----> |  Workload(id="f67b-...123",state="finished")  |
+ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+ _______________________________________         _______________________________________________
+| ActiveJob(id="0abc-...34",cursor=123) | ----> |  Workload(id="5e52-...456",state="finished")  |
+ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+ _______________________________________         _______________________________________________
+| ActiveJob(id="0abc-...34",cursor=456) | ----> |  Workload(id="8a41-...789",state="executing") |
+ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+```
+A key thing to remember when reading the Gouda source code is that **workloads and jobs are not the same thing.** A single job **may span multiple workloads.**
+## Key concepts in Gouda: concurrency keys
+Gouda has a few indexes on the `gouda_workloads` table which will:
+* Forbid inserting another `enqueued` workload with the same `enqueue_concurrency_key` value. Uniqueness is on that column only.
+* Forbid a workload from transitioning into `executing` when another workload with the same `execution_concurrency_key` is already running.
+These are compatible with good_job concurrency keys, with one major distinction: we use unique indices and not counters, so these keys can be used
+to **prevent concurrent executions** but not to **limit the load on the system**, and the limit of 1 is always enforced.
+## Key concepts in Gouda: `executing_on`
+A `Workload` is executing on a particular `executing_on` entity - usually a worker thread. That entity gets a pseudorandom ID. The `executing_on` value can be used to see, for example, whether a particular worker thread has hung. If multiple workloads have a far-behind `updated_at` and are all `executing`, this likely means that the worker has crashed or hung. The value can also be used to build a table of currently running workers.
+## Usage tips: bulkify your enqueues
+When possible, Gouda uses `enqueue_all` to `INSERT` as many jobs at once as possible. With modern servers this allows for very rapid insertion of very large
+batches of jobs. It is supplemented by a module which will make all `perform_later` calls buffered and submitted to the queue in bulk:
+```ruby
+Gouda.in_bulk do
+  User.joined_recently.find_each do |user|
+    WelcomeMailer.with(user:).welcome_email.deliver_later
+  end
+end
+```
+If there are multiple ActiveJob adapters configured and you bulk-enqueue a job which uses an adapter different than Gouda, `in_bulk` will try to use `enqueue_all` on that
+adapter as well.
+## Usage tips: co-commit
+Gouda is designed to `COMMIT` the workload together with your business data. It does not need `after_commit` unless you so choose. In fact,
+the main advantage of DB-based job queues such as Gouda is that you can always rely on the fact that the workload will be enqueued only
+once the data it needs to operate on is already available for reading. This is guaranteed to work:
+```ruby
+User.transaction do
+  freshly_joined_user = User.create!(user_params)
+  WelcomeMailer.with(user: freshly_joined_user).welcome_email.deliver_later
+end
+```
+## Web UI
 At the moment the Gouda UI is proprietary, so this gem only provides a "headless" implementation. We expect this to change in the future.
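
The two concurrency-key rules described in the README hunk above can be modelled in plain Ruby. This is a minimal in-memory sketch, not Gouda's code: `WorkloadTable`, `UniquenessViolation` and the method names `insert`, `start` and `finish` are hypothetical stand-ins for the `gouda_workloads` table and its partial unique indexes. It assumes a `nil` key never conflicts (as with NULLs in a Postgres unique index), and it raises on conflict; how Gouda itself surfaces a conflict is not shown in this diff.

```ruby
# Hypothetical in-memory stand-in for rows of the `gouda_workloads` table.
Workload = Struct.new(:id, :state, :enqueue_concurrency_key,
                      :execution_concurrency_key, keyword_init: true)

class WorkloadTable
  class UniquenessViolation < StandardError; end

  def initialize
    @rows = []
  end

  # Models the uniqueness rule over `enqueued` rows: only one enqueued
  # workload may carry a given enqueue_concurrency_key.
  def insert(id:, enqueue_concurrency_key: nil, execution_concurrency_key: nil)
    if enqueue_concurrency_key && taken?("enqueued", :enqueue_concurrency_key, enqueue_concurrency_key)
      raise UniquenessViolation, "already enqueued with key #{enqueue_concurrency_key}"
    end
    row = Workload.new(id: id, state: "enqueued",
                       enqueue_concurrency_key: enqueue_concurrency_key,
                       execution_concurrency_key: execution_concurrency_key)
    @rows << row
    row
  end

  # Models the uniqueness rule over `executing` rows: only one executing
  # workload may carry a given execution_concurrency_key.
  def start(workload)
    key = workload.execution_concurrency_key
    if key && taken?("executing", :execution_concurrency_key, key)
      raise UniquenessViolation, "already executing with key #{key}"
    end
    workload.state = "executing"
  end

  # `finished` is the only terminal state, whatever the outcome was.
  def finish(workload)
    workload.state = "finished"
  end

  private

  def taken?(state, column, value)
    @rows.any? { |w| w.state == state && w[column] == value }
  end
end
```

Because finished rows fall outside both rules, a key becomes available again as soon as the workload holding it leaves the `enqueued` or `executing` state, which is why the limit is always exactly 1 rather than a configurable count.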

data/lib/gouda/railtie.rb CHANGED

@@ -34,8 +34,6 @@ module Gouda
     # The `to_prepare` block which is executed once in production
     # and before each request in development.
     config.to_prepare do
-      Gouda::Scheduler.update_schedule_from_config!
       if defined?(Rails) && Rails.respond_to?(:application)
         config_from_rails = Rails.application.config.try(:gouda)
         if config_from_rails
@@ -52,6 +50,9 @@ module Gouda
         Gouda.config.polling_sleep_interval_seconds = 0.2
         Gouda.config.logger.level = Gouda.config.log_level
       end
+      Gouda::Scheduler.build_scheduler_entries_list!
+      Gouda::Scheduler.upsert_workloads_from_entries_list!
     end
   end
 end

data/lib/gouda/scheduler.rb CHANGED

@@ -53,7 +53,33 @@ module Gouda::Scheduler
     end
   end
-  def self.update_schedule_from_config!(cron_table_hash = nil)
+  # Takes in a Hash formatted with cron entries in the format similar
+  # to good_job, and builds a table of scheduler entries. A scheduler
+  # entry references a particular job class name, the set of arguments to
+  # be passed to the job when performing it, and either the interval
+  # to repeat the job after or a cron pattern. This method does not
+  # insert the actual Workloads into the database but just builds the
+  # table of the entries. That table gets consulted when workloads finish
+  # to determine whether the workload that just ran was scheduled or ad-hoc,
+  # and whether the subsequent workload has to be enqueued.
+  #
+  # If no table is given the method will attempt to read the table from
+  # Rails application config from `[:gouda][:cron]`.
+  #
+  # The table is a Hash of entries, and the keys are the names of the workload
+  # to be enqueued - those keys are also used to ensure scheduled workloads
+  # only get scheduled once.
+  #
+  # @param cron_table_hash[Hash] a hash of the following shape:
+  #     {
+  #       download_invoices_every_minute: {
+  #         cron: "* * * * *",
+  #         class: "DownloadInvoicesJob",
+  #         args: ["immediate"]
+  #       }
+  #     }
+  # @return Array[Entry]
+  def self.build_scheduler_entries_list!(cron_table_hash = nil)
     Gouda.logger.info "Updating scheduled workload entries..."
     if cron_table_hash.blank?
       config_from_rails = Rails.application.config.try(:gouda)
@@ -76,6 +102,12 @@ module Gouda::Scheduler
     end
   end
+  # Once a workload has finished (doesn't matter whether it raised an exception
+  # or completed successfully), it is going to be passed to this method to enqueue
+  # the next scheduled workload
+  #
+  # @param finished_workload[Gouda::Workload]
+  # @return void
   def self.enqueue_next_scheduled_workload_for(finished_workload)
     return unless finished_workload.scheduler_key
@@ -86,11 +118,23 @@ module Gouda::Scheduler
     Gouda.enqueue_jobs_via_their_adapters([timer_entry.build_active_job])
   end
+  # Returns the list of entries of the scheduler which are currently known. Normally the
+  # scheduler will hold the list of entries loaded from the Rails config.
+  #
+  # @return Array[Entry]
   def self.entries
     @cron_table || []
   end
-  def self.update_scheduled_workloads!
+  # Will upsert (`INSERT ... ON CONFLICT UPDATE`) workloads for all entries which are in the scheduler entries
+  # table (the table needs to be read or hydrated first using `build_scheduler_entries_list!`). This is done
+  # in a transaction. Any workloads which have been previously inserted from the scheduled entries, but no
+  # longer have a corresponding scheduler entry, will be deleted from the database. If there already are workloads
+  # with the corresponding scheduler key they will not be touched and will be performed with their previously-defined
+  # arguments.
+  #
+  # @return void
+  def self.upsert_workloads_from_entries_list!
     table_entries = @cron_table || []
     # Remove any cron keyed workloads which no longer match config-wise

data/lib/gouda/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module Gouda
-  VERSION = "0.1.1"
+  VERSION = "0.1.2"
 end

data/lib/gouda.rb CHANGED

@@ -46,7 +46,7 @@ module Gouda
   end
   def self.start
-    Gouda::Scheduler.update_scheduled_workloads!
+    Gouda::Scheduler.upsert_workloads_from_entries_list!
     queue_constraint = if ENV["GOUDA_QUEUES"]
       Gouda.parse_queue_constraint(ENV["GOUDA_QUEUES"])
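
This release renames two `Gouda::Scheduler` methods: `update_schedule_from_config!` becomes `build_scheduler_entries_list!`, and `update_scheduled_workloads!` becomes `upsert_workloads_from_entries_list!`. Code that still calls the 0.1.1 names could bridge the gap with a small shim along the following lines. This is a sketch under assumptions: `LegacySchedulerNames` is a hypothetical module that is not part of the gem, and `SchedulerStandIn` only imitates the two renamed methods so the example runs without Rails or Postgres.

```ruby
# Stand-in for Gouda::Scheduler (assumption: it only records calls, it does
# not build entries or touch a database).
module SchedulerStandIn
  def self.calls
    @calls ||= []
  end

  def self.build_scheduler_entries_list!(cron_table_hash = nil)
    calls << [:build_scheduler_entries_list!, cron_table_hash]
  end

  def self.upsert_workloads_from_entries_list!
    calls << [:upsert_workloads_from_entries_list!]
  end
end

# Hypothetical shim: forwards the 0.1.1 method names to the 0.1.2 names.
module LegacySchedulerNames
  def update_schedule_from_config!(cron_table_hash = nil)
    build_scheduler_entries_list!(cron_table_hash)
  end

  def update_scheduled_workloads!
    upsert_workloads_from_entries_list!
  end
end

# `extend` adds the legacy names as module-level methods next to the new ones.
SchedulerStandIn.extend(LegacySchedulerNames)
```

The same `extend` call on the real module would presumably work, since the renamed methods are defined with `def self.`, but updating call sites to the new names is the simpler long-term fix.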

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: gouda
 version: !ruby/object:Gem::Version
-  version: 0.1.1
+  version: 0.1.2
 platform: ruby
 authors:
 - Sebastian van Hesteren
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-06-10 00:00:00.000000000 Z
+date: 2024-06-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activerecord