gouda 0.1.1 → 0.1.2
- checksums.yaml +4 -4
- data/.github/workflows/ci.yml +0 -3
- data/CHANGELOG.md +4 -0
- data/README.md +90 -1
- data/lib/gouda/railtie.rb +3 -2
- data/lib/gouda/scheduler.rb +46 -2
- data/lib/gouda/version.rb +1 -1
- data/lib/gouda.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0fa853c78222eb23897ccb31ed465fc231aa5894641fe0d1991ade90a5e3fc8d
+  data.tar.gz: e9680b441d3fe9c3da7fadf50272c033db712296104870d016b278da2c1d92bd
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9a64544cd45d14400ab949a848e0325ee5d5305d648f7f38239279f93e1f8d2d32dac368708317aafbf470b48e16f88a0ffe4bad6890798ef53adea0566da5f6
+  data.tar.gz: e2140d4da50c4afe8edadd51bb3049b60935e4c0273d235dcb1988efa2900362ea81451a58df04ba3f4db58209380ea9a58ccedd42972806bd5be2cd9f19d7d4
data/.github/workflows/ci.yml
CHANGED
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -11,7 +11,96 @@ $ bundle install
 $ bin/rails g gouda:install
 ```
 
-
+Gouda is build as a lightweight alternative to [good_job](https://github.com/bensheldon/good_job) and has been created before [solid_queue.](https://github.com/rails/solid_queue/)
+It is _smaller_ than solid_queue though.
+
+It was designed to enable job processing using `SELECT ... FOR UPDATE SKIP LOCKED` on Postgres so that we could use pg_bouncer in our system setup.
+
+
+## Key concepts in Gouda: Workload
+
+Gouda is built around the concept of a **Workload.** A workload is not the same as an ActiveJob. A workload is a single execution of a task - the task may be an entire ActiveJob, or a retry of an ActiveJob, or a part of a sequence of ActiveJobs initiated using [job-iteration](https://github.com/shopify/job-iteration)
+
+You can easily have multiple `Workloads` stored in your queue which reference the same job. However, when you are using Gouda it is important to always keep the distinction between the two in mind.
+
+When an ActiveJob gets first initialised, it receives a randomly-generated ActiveJob ID, which is normally a UUID. This UUID will be reused when a job gets retried, or when job-iteration is in use - but it will exist across multiple Gouda workloads.
+
+A `Workload` can only be in one of the three states: `enqueued`, `executing` and `finished`. It does not matter whether the workload has raised an exception, or was manually canceled before it started performing, or succeeded - its terminal state is always going to be `finished`, regardless. This is done on purpose: Gouda uses a number of partial indexes in Postgres which allows it to maintain uniqueness, but only among jobs which are either waiting to start or already running. Additionally, _only the transitions between those states_ are guarded by `BEGIN...COMMIT` and it is the selection on those states that is supplemented by `SELECT ... FOR UPDATE SKIP LOCKED`. The only time locks are placed on a particular `gouda_workloads` row is when this update is about to take place (`SELECT` then `UPDATE`). This makes Gouda a good fit for use with pg_bouncer in transaction mode.
+
+Understanding workload identity is key for making good use of Gouda. For example, an ActiveJob that gets retried can take the following shape in Gouda:
+
+```
+____________________________ _______________________________________________
+| ActiveJob(id="0abc-...34") | ----> | Workload(id="f67b-...123",state="finished") |
+‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+____________________________ _______________________________________________
+| ActiveJob(id="0abc-...34") | ----> | Workload(id="5e52-...456",state="finished") |
+‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+____________________________ _______________________________________________
+| ActiveJob(id="0abc-...34") | ----> | Workload(id="8a41-...789",state="enqueued") |
+‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+```
+
+This would happen if, for example, the ActiveJob raises an exception inside `perform` and is configured to `retry_on` after this exception. Same for job-iteration:
+
+```
+_______________________________________ _______________________________________________
+| ActiveJob(id="0abc-...34",cursor=nil) | ----> | Workload(id="f67b-...123",state="finished") |
+‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+_______________________________________ _______________________________________________
+| ActiveJob(id="0abc-...34",cursor=123) | ----> | Workload(id="5e52-...456",state="finished") |
+‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+_______________________________________ _______________________________________________
+| ActiveJob(id="0abc-...34",cursor=456) | ----> | Workload(id="8a41-...789",state="executing") |
+‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+```
+
+A key thing to remember when reading the Gouda source code is that **workloads and jobs are not the same thing.** A single job **may span multiple workloads.**
+
+## Key concepts in Gouda: concurrency keys
+
+Gouda has a few indexes on the `gouda_workloads` table which will:
+
+* Forbid inserting another `enqueued` workload with the same `enqueue_concurrency_key` value. Uniqueness is on that column only.
+* Forbid a workload from transition into `executing` when another workload with the same `execution_concurrency_key` is already running.
+
+These are compatible with good_job concurrency keys, with one major distinction: we use unique indices and not counters, so these keys can be used
+to **prevent concurrent executions** but not to **limit the load on the system**, and the limit of 1 is always enforced.
+
+## Key concepts in Gouda: `executing_on`
+
+A `Workload` is executing on a particular `executing_on` entity - usually a worker thread. That entity gets a pseudorandom ID . The `executing_on` value can be used to see, for example, whether a particular worker thread has hung. If multiple jobs have a far-behind `updated_at` and are all `executing`, this likely means that the worker has crashed or hung. The value can also be used to build a table of currently running workers.
+
+## Usage tips: bulkify your enqueues
+
+When possible, Gouda uses `enqueue_all` to `INSERT` as many jobs at once as possible. With modern servers this allows for very rapid insertion of very large
+batches of jobs. It is supplemented by a module which will make all `perform_later` calls buffered and submitted to the queue in bulk:
+
+```ruby
+Gouda.in_bulk do
+  User.joined_recently.find_each do |user|
+    WelcomeMailer.with(user:).welcome_email.deliver_later
+  end
+end
+```
+
+If there are multiple ActiveJob adapters configured and you bulk-enqueue a job which uses an adapter different than Gouda, `in_bulk` will try to use `enqueue_all` on that
+adapter as well.
+
+## Usage tips: co-commit
+
+Gouda is designed to `COMMIT` the workload together with your business data. It does not need `after_commit` unless you so choose. In fact,
+the main advantage of DB-based job queues such as Gouda is that you can always rely on the fact that the workload will be enqueued only
+once the data it needs to operate on is already available for reading. This is guaranteed to work:
+
+```ruby
+User.transaction do
+  freshly_joined_user = User.create!(user_params)
+  WelcomeMailer.with(user: freshly_joined_user).welcome_email.deliver_later
+end
+```
+
+## Web UI
 
 At the moment the Gouda UI is proprietary, so this gem only provides a "headless" implementation. We expect this to change in the future.
 
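Reviewer note on the README additions above: the workload and concurrency-key rules may be easier to follow with a small model. The sketch below is not Gouda's implementation and does not touch a database. `ToyWorkloadTable`, its methods, and the `Conflict` error are hypothetical names invented for illustration. It assumes the rules exactly as the README states them: one ActiveJob ID can span several workloads, `enqueue_concurrency_key` is unique among `enqueued` workloads, `execution_concurrency_key` is unique among `executing` workloads, and `finished` workloads are exempt from both checks.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for the `gouda_workloads` table.
# Gouda enforces these rules with partial unique indexes in Postgres;
# this sketch only mimics the observable behaviour in plain Ruby.
class ToyWorkloadTable
  class Conflict < StandardError; end

  Workload = Struct.new(
    :id, :active_job_id, :state,
    :enqueue_concurrency_key, :execution_concurrency_key,
    keyword_init: true
  )

  attr_reader :rows

  def initialize
    @rows = []
  end

  # Adds a new workload in the `enqueued` state. Raises Conflict when another
  # enqueued workload already holds the same enqueue_concurrency_key.
  def enqueue(active_job_id:, enqueue_concurrency_key: nil, execution_concurrency_key: nil)
    if taken?("enqueued", :enqueue_concurrency_key, enqueue_concurrency_key)
      raise Conflict, "enqueue key #{enqueue_concurrency_key.inspect} is already enqueued"
    end

    workload = Workload.new(
      id: SecureRandom.uuid,
      active_job_id: active_job_id,
      state: "enqueued",
      enqueue_concurrency_key: enqueue_concurrency_key,
      execution_concurrency_key: execution_concurrency_key
    )
    @rows << workload
    workload
  end

  # Moves a workload to `executing`. Raises Conflict when another executing
  # workload already holds the same execution_concurrency_key.
  def start(workload)
    if taken?("executing", :execution_concurrency_key, workload.execution_concurrency_key)
      raise Conflict, "execution key #{workload.execution_concurrency_key.inspect} is already executing"
    end

    workload.state = "executing"
    workload
  end

  # `finished` is the only terminal state, whatever the outcome was.
  def finish(workload)
    workload.state = "finished"
    workload
  end

  private

  # A nil key never conflicts, like NULLs in a Postgres unique index.
  def taken?(state, column, key)
    return false if key.nil?

    @rows.any? { |w| w.state == state && w[column] == key }
  end
end

table = ToyWorkloadTable.new
job_id = SecureRandom.uuid # one ActiveJob ID, reused by the retry below

first_attempt = table.enqueue(active_job_id: job_id, enqueue_concurrency_key: "welcome-42")
table.start(first_attempt)
table.finish(first_attempt)

# The retry is a second workload that references the same job.
table.enqueue(active_job_id: job_id, enqueue_concurrency_key: "welcome-42")
puts table.rows.map(&:state).inspect
```

Because the checks only look at `enqueued` and `executing` rows, a finished workload never blocks a retry with the same key, which is the property the README attributes to the partial indexes.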
data/lib/gouda/railtie.rb
CHANGED
@@ -34,8 +34,6 @@ module Gouda
     # The `to_prepare` block which is executed once in production
     # and before each request in development.
     config.to_prepare do
-      Gouda::Scheduler.update_schedule_from_config!
-
       if defined?(Rails) && Rails.respond_to?(:application)
         config_from_rails = Rails.application.config.try(:gouda)
         if config_from_rails
@@ -52,6 +50,9 @@ module Gouda
         Gouda.config.polling_sleep_interval_seconds = 0.2
         Gouda.config.logger.level = Gouda.config.log_level
       end
+
+      Gouda::Scheduler.build_scheduler_entries_list!
+      Gouda::Scheduler.upsert_workloads_from_entries_list!
     end
   end
 end
data/lib/gouda/scheduler.rb
CHANGED
@@ -53,7 +53,33 @@ module Gouda::Scheduler
     end
   end
 
-
+  # Takes in a Hash formatted with cron entries in the format similar
+  # to good_job, and builds a table of scheduler entries. A scheduler
+  # entry references a particular job class name, the set of arguments to
+  # be passed to the job when performing it, and either the interval
+  # to repeat the job after or a cron pattern. This method does not
+  # insert the actual Workloads into the database but just builds the
+  # table of the entries. That table gets consulted when workloads finish
+  # to determine whether the workload that just ran was scheduled or ad-hoc,
+  # and whether the subsequent workload has to be enqueued.
+  #
+  # If no table is given the method will attempt to read the table from
+  # Rails application config from `[:gouda][:cron]`.
+  #
+  # The table is a Hash of entries, and the keys are the names of the workload
+  # to be enqueued - those keys are also used to ensure scheduled workloads
+  # only get scheduled once.
+  #
+  # @param cron_table_hash[Hash] a hash of the following shape:
+  #   {
+  #     download_invoices_every_minute: {
+  #       cron: "* * * * *",
+  #       class: "DownloadInvoicesJob",
+  #       args: ["immediate"]
+  #     }
+  #   }
+  # @return Array[Entry]
+  def self.build_scheduler_entries_list!(cron_table_hash = nil)
     Gouda.logger.info "Updating scheduled workload entries..."
     if cron_table_hash.blank?
       config_from_rails = Rails.application.config.try(:gouda)
@@ -76,6 +102,12 @@ module Gouda::Scheduler
     end
   end
 
+  # Once a workload has finished (doesn't matter whether it raised an exception
+  # or completed successfully), it is going to be passed to this method to enqueue
+  # the next scheduled workload
+  #
+  # @param finished_workload[Gouda::Workload]
+  # @return void
   def self.enqueue_next_scheduled_workload_for(finished_workload)
     return unless finished_workload.scheduler_key
 
@@ -86,11 +118,23 @@ module Gouda::Scheduler
     Gouda.enqueue_jobs_via_their_adapters([timer_entry.build_active_job])
   end
 
+  # Returns the list of entries of the scheduler which are currently known. Normally the
+  # scheduler will hold the list of entries loaded from the Rails config.
+  #
+  # @return Array[Entry]
   def self.entries
     @cron_table || []
   end
 
-
+  # Will upsert (`INSERT ... ON CONFLICT UPDATE`) workloads for all entries which are in the scheduler entries
+  # table (the table needs to be read or hydrated first using `build_scheduler_entries_list!`). This is done
+  # in a transaction. Any workloads which have been previously inserted from the scheduled entries, but no
+  # longer have a corresponding scheduler entry, will be deleted from the database. If there already are workloads
+  # with the corresponding scheduler key they will not be touched and will be performed with their previously-defined
+  # arguments.
+  #
+  # @return void
+  def self.upsert_workloads_from_entries_list!
     table_entries = @cron_table || []
 
     # Remove any cron keyed workloads which no longer match config-wise
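Reviewer note on the scheduler comments above: the documented cron table shape, and the "scheduled or ad-hoc" lookup, can be illustrated without Rails or a database. This is a sketch only. `ToyEntry`, `build_toy_entries` and `toy_entry_for` are hypothetical names, not Gouda's API, and the sketch assumes for simplicity that a workload's scheduler key equals the entry name. The diff does not show how Gouda actually derives `scheduler_key`.

```ruby
# Hypothetical, simplified stand-in for a scheduler entry. The real
# Gouda::Scheduler::Entry is not shown in this diff.
ToyEntry = Struct.new(:name, :cron, :job_class_name, :args, keyword_init: true)

# Builds entries from a Hash of the shape documented in the comment on
# `build_scheduler_entries_list!`. Nothing is inserted anywhere: like the
# documented method, this only builds the in-memory table.
def build_toy_entries(cron_table_hash)
  cron_table_hash.map do |name, params|
    ToyEntry.new(
      name: name.to_s,
      cron: params.fetch(:cron),
      job_class_name: params.fetch(:class),
      args: params.fetch(:args, [])
    )
  end
end

# Returns the entry a finished workload was scheduled from, or nil when the
# workload was ad-hoc. Assumption: the scheduler key is the entry name.
def toy_entry_for(entries, scheduler_key)
  return nil if scheduler_key.nil?

  entries.find { |entry| entry.name == scheduler_key }
end

cron_table = {
  download_invoices_every_minute: {
    cron: "* * * * *",
    class: "DownloadInvoicesJob",
    args: ["immediate"]
  }
}

entries = build_toy_entries(cron_table)
scheduled = toy_entry_for(entries, "download_invoices_every_minute")
ad_hoc = toy_entry_for(entries, nil)

puts scheduled.job_class_name
puts ad_hoc.inspect
```

Since the table is a Hash, entry names are unique by construction, which matches the comment's remark that the keys are used to ensure a scheduled workload only gets scheduled once.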
data/lib/gouda/version.rb
CHANGED
data/lib/gouda.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: gouda
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.2
 platform: ruby
 authors:
 - Sebastian van Hesteren
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2024-06-
+date: 2024-06-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activerecord