gouda 0.1.1 → 0.1.2
- checksums.yaml +4 -4
- data/.github/workflows/ci.yml +0 -3
- data/CHANGELOG.md +4 -0
- data/README.md +90 -1
- data/lib/gouda/railtie.rb +3 -2
- data/lib/gouda/scheduler.rb +46 -2
- data/lib/gouda/version.rb +1 -1
- data/lib/gouda.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz:
-   data.tar.gz:
+   metadata.gz: 0fa853c78222eb23897ccb31ed465fc231aa5894641fe0d1991ade90a5e3fc8d
+   data.tar.gz: e9680b441d3fe9c3da7fadf50272c033db712296104870d016b278da2c1d92bd
  SHA512:
-   metadata.gz:
-   data.tar.gz:
+   metadata.gz: 9a64544cd45d14400ab949a848e0325ee5d5305d648f7f38239279f93e1f8d2d32dac368708317aafbf470b48e16f88a0ffe4bad6890798ef53adea0566da5f6
+   data.tar.gz: e2140d4da50c4afe8edadd51bb3049b60935e4c0273d235dcb1988efa2900362ea81451a58df04ba3f4db58209380ea9a58ccedd42972806bd5be2cd9f19d7d4
data/.github/workflows/ci.yml
CHANGED
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -11,7 +11,96 @@ $ bundle install
  $ bin/rails g gouda:install
  ```

-
+ Gouda is built as a lightweight alternative to [good_job](https://github.com/bensheldon/good_job) and was created before [solid_queue](https://github.com/rails/solid_queue/).
+ It is _smaller_ than solid_queue though.
+
+ It was designed to enable job processing using `SELECT ... FOR UPDATE SKIP LOCKED` on Postgres so that we could use pg_bouncer in our system setup.
+
+
+ ## Key concepts in Gouda: Workload
+
+ Gouda is built around the concept of a **Workload.** A workload is not the same as an ActiveJob. A workload is a single execution of a task - the task may be an entire ActiveJob, or a retry of an ActiveJob, or a part of a sequence of ActiveJobs initiated using [job-iteration](https://github.com/shopify/job-iteration).
+
+ You can easily have multiple `Workloads` stored in your queue which reference the same job. However, when you are using Gouda it is important to always keep the distinction between the two in mind.
+
+ When an ActiveJob is first initialised, it receives a randomly-generated ActiveJob ID, which is normally a UUID. This UUID will be reused when a job gets retried, or when job-iteration is in use - but it will exist across multiple Gouda workloads.
+
+ A `Workload` can only be in one of three states: `enqueued`, `executing` and `finished`. It does not matter whether the workload has raised an exception, or was manually canceled before it started performing, or succeeded - its terminal state is always going to be `finished`, regardless. This is done on purpose: Gouda uses a number of partial indexes in Postgres which allow it to maintain uniqueness, but only among jobs which are either waiting to start or already running. Additionally, _only the transitions between those states_ are guarded by `BEGIN...COMMIT` and it is the selection on those states that is supplemented by `SELECT ... FOR UPDATE SKIP LOCKED`. The only time locks are placed on a particular `gouda_workloads` row is when this update is about to take place (`SELECT` then `UPDATE`). This makes Gouda a good fit for use with pg_bouncer in transaction mode.
+
+ Understanding workload identity is key for making good use of Gouda. For example, an ActiveJob that gets retried can take the following shape in Gouda:
+
+ ```
+  ____________________________        _______________________________________________
+ | ActiveJob(id="0abc-...34") | ----> | Workload(id="f67b-...123",state="finished") |
+  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+  ____________________________        _______________________________________________
+ | ActiveJob(id="0abc-...34") | ----> | Workload(id="5e52-...456",state="finished") |
+  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+  ____________________________        _______________________________________________
+ | ActiveJob(id="0abc-...34") | ----> | Workload(id="8a41-...789",state="enqueued") |
+  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+ ```
+
+ This would happen if, for example, the ActiveJob raises an exception inside `perform` and is configured to `retry_on` after this exception. Same for job-iteration:
+
+ ```
+  _______________________________________        _______________________________________________
+ | ActiveJob(id="0abc-...34",cursor=nil) | ----> | Workload(id="f67b-...123",state="finished") |
+  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+  _______________________________________        _______________________________________________
+ | ActiveJob(id="0abc-...34",cursor=123) | ----> | Workload(id="5e52-...456",state="finished") |
+  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+  _______________________________________        _______________________________________________
+ | ActiveJob(id="0abc-...34",cursor=456) | ----> | Workload(id="8a41-...789",state="executing") |
+  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
+ ```
+
+ A key thing to remember when reading the Gouda source code is that **workloads and jobs are not the same thing.** A single job **may span multiple workloads.**
+
+ ## Key concepts in Gouda: concurrency keys
+
+ Gouda has a few indexes on the `gouda_workloads` table which will:
+
+ * Forbid inserting another `enqueued` workload with the same `enqueue_concurrency_key` value. Uniqueness is on that column only.
+ * Forbid a workload from transitioning into `executing` when another workload with the same `execution_concurrency_key` is already running.
+
+ These are compatible with good_job concurrency keys, with one major distinction: we use unique indices and not counters, so these keys can be used
+ to **prevent concurrent executions** but not to **limit the load on the system**, and the limit of 1 is always enforced.
+
+ ## Key concepts in Gouda: `executing_on`
+
+ A `Workload` is executing on a particular `executing_on` entity - usually a worker thread. That entity gets a pseudorandom ID. The `executing_on` value can be used to see, for example, whether a particular worker thread has hung. If multiple jobs have a far-behind `updated_at` and are all `executing`, this likely means that the worker has crashed or hung. The value can also be used to build a table of currently running workers.
+
+ ## Usage tips: bulkify your enqueues
+
+ When possible, Gouda uses `enqueue_all` to `INSERT` as many jobs at once as possible. With modern servers this allows for very rapid insertion of very large
+ batches of jobs. It is supplemented by a module which will make all `perform_later` calls buffered and submitted to the queue in bulk:
+
+ ```ruby
+ Gouda.in_bulk do
+   User.joined_recently.find_each do |user|
+     WelcomeMailer.with(user:).welcome_email.deliver_later
+   end
+ end
+ ```
+
+ If there are multiple ActiveJob adapters configured and you bulk-enqueue a job which uses an adapter different than Gouda, `in_bulk` will try to use `enqueue_all` on that
+ adapter as well.
+
+ ## Usage tips: co-commit
+
+ Gouda is designed to `COMMIT` the workload together with your business data. It does not need `after_commit` unless you so choose. In fact,
+ the main advantage of DB-based job queues such as Gouda is that you can always rely on the fact that the workload will be enqueued only
+ once the data it needs to operate on is already available for reading. This is guaranteed to work:
+
+ ```ruby
+ User.transaction do
+   freshly_joined_user = User.create!(user_params)
+   WelcomeMailer.with(user: freshly_joined_user).welcome_email.deliver_later
+ end
+ ```
+
+ ## Web UI

  At the moment the Gouda UI is proprietary, so this gem only provides a "headless" implementation. We expect this to change in the future.

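As a reading aid for the README text added in this hunk, here is a minimal in-memory sketch of the three workload states and the two concurrency-key rules. It does not use Gouda or Postgres: `ToyQueue`, `ToyWorkload`, and their methods are hypothetical names invented for this illustration, and plain Ruby checks stand in for the partial unique indexes the README describes.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for a row in `gouda_workloads`.
ToyWorkload = Struct.new(
  :id, :active_job_id, :state,
  :enqueue_concurrency_key, :execution_concurrency_key,
  keyword_init: true
)

# Hypothetical queue that mimics the uniqueness rules with plain Ruby checks.
class ToyQueue
  attr_reader :workloads

  def initialize
    @workloads = []
  end

  # Returns the new workload, or nil when another `enqueued` workload
  # already holds the same enqueue concurrency key.
  def enqueue(active_job_id:, enqueue_key: nil, execution_key: nil)
    clash = enqueue_key && @workloads.any? do |w|
      w.state == "enqueued" && w.enqueue_concurrency_key == enqueue_key
    end
    return nil if clash

    workload = ToyWorkload.new(
      id: SecureRandom.uuid,
      active_job_id: active_job_id,
      state: "enqueued",
      enqueue_concurrency_key: enqueue_key,
      execution_concurrency_key: execution_key
    )
    @workloads << workload
    workload
  end

  # Moves the first eligible workload to `executing`, skipping any whose
  # execution concurrency key is held by a workload that is already running.
  def checkout
    running_keys = @workloads
      .select { |w| w.state == "executing" }
      .map(&:execution_concurrency_key)
      .compact
    candidate = @workloads.find do |w|
      w.state == "enqueued" && !running_keys.include?(w.execution_concurrency_key)
    end
    candidate&.tap { |w| w.state = "executing" }
  end

  # Success, failure and cancellation all end in the same terminal state.
  def finish(workload)
    workload.state = "finished"
  end
end

queue = ToyQueue.new
queue.enqueue(active_job_id: "0abc", enqueue_key: "report")
queue.enqueue(active_job_id: "0abc", enqueue_key: "report") # rejected, returns nil
queue.finish(queue.checkout)
queue.enqueue(active_job_id: "0abc", enqueue_key: "report") # accepted, same job, new workload
```

After the last call the toy queue holds two workloads that share one ActiveJob ID, one `finished` and one `enqueued`, which mirrors the retry diagram in the README.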
data/lib/gouda/railtie.rb
CHANGED
@@ -34,8 +34,6 @@ module Gouda
      # The `to_prepare` block which is executed once in production
      # and before each request in development.
      config.to_prepare do
-       Gouda::Scheduler.update_schedule_from_config!
-
        if defined?(Rails) && Rails.respond_to?(:application)
          config_from_rails = Rails.application.config.try(:gouda)
          if config_from_rails
@@ -52,6 +50,9 @@ module Gouda
          Gouda.config.polling_sleep_interval_seconds = 0.2
          Gouda.config.logger.level = Gouda.config.log_level
        end
+
+       Gouda::Scheduler.build_scheduler_entries_list!
+       Gouda::Scheduler.upsert_workloads_from_entries_list!
      end
    end
  end
data/lib/gouda/scheduler.rb
CHANGED
@@ -53,7 +53,33 @@ module Gouda::Scheduler
      end
    end

-
+   # Takes in a Hash formatted with cron entries in the format similar
+   # to good_job, and builds a table of scheduler entries. A scheduler
+   # entry references a particular job class name, the set of arguments to
+   # be passed to the job when performing it, and either the interval
+   # to repeat the job after or a cron pattern. This method does not
+   # insert the actual Workloads into the database but just builds the
+   # table of the entries. That table gets consulted when workloads finish
+   # to determine whether the workload that just ran was scheduled or ad-hoc,
+   # and whether the subsequent workload has to be enqueued.
+   #
+   # If no table is given the method will attempt to read the table from
+   # Rails application config from `[:gouda][:cron]`.
+   #
+   # The table is a Hash of entries, and the keys are the names of the workload
+   # to be enqueued - those keys are also used to ensure scheduled workloads
+   # only get scheduled once.
+   #
+   # @param cron_table_hash[Hash] a hash of the following shape:
+   #   {
+   #     download_invoices_every_minute: {
+   #       cron: "* * * * *",
+   #       class: "DownloadInvoicesJob",
+   #       args: ["immediate"]
+   #     }
+   #   }
+   # @return Array[Entry]
+   def self.build_scheduler_entries_list!(cron_table_hash = nil)
      Gouda.logger.info "Updating scheduled workload entries..."
      if cron_table_hash.blank?
        config_from_rails = Rails.application.config.try(:gouda)
@@ -76,6 +102,12 @@ module Gouda::Scheduler
      end
    end

+   # Once a workload has finished (doesn't matter whether it raised an exception
+   # or completed successfully), it is going to be passed to this method to enqueue
+   # the next scheduled workload
+   #
+   # @param finished_workload[Gouda::Workload]
+   # @return void
    def self.enqueue_next_scheduled_workload_for(finished_workload)
      return unless finished_workload.scheduler_key

@@ -86,11 +118,23 @@ module Gouda::Scheduler
      Gouda.enqueue_jobs_via_their_adapters([timer_entry.build_active_job])
    end

+   # Returns the list of entries of the scheduler which are currently known. Normally the
+   # scheduler will hold the list of entries loaded from the Rails config.
+   #
+   # @return Array[Entry]
    def self.entries
      @cron_table || []
    end

-
+   # Will upsert (`INSERT ... ON CONFLICT UPDATE`) workloads for all entries which are in the scheduler entries
+   # table (the table needs to be read or hydrated first using `build_scheduler_entries_list!`). This is done
+   # in a transaction. Any workloads which have been previously inserted from the scheduled entries, but no
+   # longer have a corresponding scheduler entry, will be deleted from the database. If there already are workloads
+   # with the corresponding scheduler key they will not be touched and will be performed with their previously-defined
+   # arguments.
+   #
+   # @return void
+   def self.upsert_workloads_from_entries_list!
      table_entries = @cron_table || []

      # Remove any cron keyed workloads which no longer match config-wise
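The comments added in this file describe a two-step flow: first build an in-memory table of entries from a Hash, then consult that table when a workload finishes to decide whether it was scheduled or ad-hoc. The following is a rough sketch of that flow in plain Ruby, with no database access and no cron parsing. `ToyEntry`, `build_entries`, and `next_entry_for` are hypothetical names and are not Gouda's API; the sketch also assumes, for illustration only, that a workload's scheduler key is simply the entry name.

```ruby
# Hypothetical stand-in for a scheduler entry (not Gouda's Entry class).
ToyEntry = Struct.new(:name, :cron, :job_class_name, :args, keyword_init: true)

# Builds a list of entries from a Hash shaped like the one documented on
# `build_scheduler_entries_list!`. Nothing is persisted here.
def build_entries(cron_table_hash)
  cron_table_hash.map do |name, params|
    ToyEntry.new(
      name: name.to_s,
      cron: params.fetch(:cron),
      job_class_name: params.fetch(:class),
      args: params.fetch(:args, [])
    )
  end
end

# Returns the entry matching the scheduler key of a finished workload,
# or nil when the workload was ad-hoc or its entry no longer exists.
# Assumption: the scheduler key equals the entry name.
def next_entry_for(entries, scheduler_key)
  return nil unless scheduler_key

  entries.find { |entry| entry.name == scheduler_key }
end

entries = build_entries({
  download_invoices_every_minute: {
    cron: "* * * * *",
    class: "DownloadInvoicesJob",
    args: ["immediate"]
  }
})

scheduled = next_entry_for(entries, "download_invoices_every_minute")
ad_hoc = next_entry_for(entries, nil)
```

Here `scheduled` is the single entry built from the Hash, while `ad_hoc` is `nil`, so a caller in this toy model would enqueue a follow-up workload only in the first case.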
data/lib/gouda/version.rb
CHANGED
data/lib/gouda.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: gouda
  version: !ruby/object:Gem::Version
-   version: 0.1.
+   version: 0.1.2
  platform: ruby
  authors:
  - Sebastian van Hesteren
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2024-06-
+ date: 2024-06-11 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: activerecord