cloudtasker 0.10.rc5 → 0.10.1
- checksums.yaml +4 -4
- data/.github/workflows/test.yml +7 -3
- data/.rubocop.yml +7 -1
- data/Appraisals +16 -0
- data/CHANGELOG.md +32 -4
- data/README.md +169 -37
- data/app/controllers/cloudtasker/worker_controller.rb +11 -2
- data/cloudtasker.gemspec +3 -3
- data/docs/UNIQUE_JOBS.md +62 -0
- data/gemfiles/semantic_logger_3.4.gemfile +7 -0
- data/gemfiles/semantic_logger_4.6.gemfile +7 -0
- data/gemfiles/semantic_logger_4.7.0.gemfile +7 -0
- data/gemfiles/semantic_logger_4.7.2.gemfile +7 -0
- data/gemfiles/semantic_logger_4.7.gemfile +7 -0
- data/lib/cloudtasker/backend/google_cloud_task.rb +19 -7
- data/lib/cloudtasker/backend/memory_task.rb +17 -5
- data/lib/cloudtasker/backend/redis_task.rb +2 -1
- data/lib/cloudtasker/batch/middleware/server.rb +1 -1
- data/lib/cloudtasker/config.rb +3 -0
- data/lib/cloudtasker/cron/job.rb +0 -5
- data/lib/cloudtasker/cron/middleware/server.rb +1 -1
- data/lib/cloudtasker/cron/schedule.rb +0 -3
- data/lib/cloudtasker/unique_job.rb +27 -0
- data/lib/cloudtasker/unique_job/job.rb +41 -6
- data/lib/cloudtasker/unique_job/middleware/client.rb +1 -1
- data/lib/cloudtasker/unique_job/middleware/server.rb +1 -1
- data/lib/cloudtasker/version.rb +1 -1
- data/lib/cloudtasker/worker.rb +43 -9
- data/lib/cloudtasker/worker_handler.rb +3 -26
- data/lib/cloudtasker/worker_logger.rb +2 -2
- metadata +39 -6
data/app/controllers/cloudtasker/worker_controller.rb
CHANGED
@@ -51,19 +51,28 @@ module Cloudtasker
|
|
51
51
|
end
|
52
52
|
|
53
53
|
# Return content parsed as JSON and add job retries count
|
54
|
-
JSON.parse(content).merge(job_retries: job_retries)
|
54
|
+
JSON.parse(content).merge(job_retries: job_retries, task_id: task_id)
|
55
55
|
end
|
56
56
|
end
|
57
57
|
|
58
58
|
#
|
59
59
|
# Extract the number of times this task failed at runtime.
|
60
60
|
#
|
61
|
-
# @return [Integer] The number of failures
|
61
|
+
# @return [Integer] The number of failures.
|
62
62
|
#
|
63
63
|
def job_retries
|
64
64
|
request.headers[Cloudtasker::Config::RETRY_HEADER].to_i
|
65
65
|
end
|
66
66
|
|
67
|
+
#
|
68
|
+
# Return the Google Cloud Task ID from headers.
|
69
|
+
#
|
70
|
+
# @return [String] The task ID.
|
71
|
+
#
|
72
|
+
def task_id
|
73
|
+
request.headers[Cloudtasker::Config::TASK_ID_HEADER]
|
74
|
+
end
|
75
|
+
|
67
76
|
#
|
68
77
|
# Authenticate incoming requests using a bearer token
|
69
78
|
#
|
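The controller change above merges two values taken from request headers into the parsed job payload. A minimal sketch of that merge, assuming a plain Hash of headers in place of the Rails request object (the `parse_job_payload` helper name is hypothetical, and symbol keys are used throughout for simplicity):

```ruby
require 'json'

# Hypothetical helper: parse the JSON body and attach the retry count
# and task ID read from the Cloud Tasks headers named in config.rb.
def parse_job_payload(body, headers)
  JSON.parse(body, symbolize_names: true).merge(
    # A missing retry header yields nil, and nil.to_i is 0
    job_retries: headers['X-CloudTasks-TaskRetryCount'].to_i,
    task_id: headers['X-CloudTasks-TaskName']
  )
end

headers = { 'X-CloudTasks-TaskRetryCount' => '2', 'X-CloudTasks-TaskName' => 'task-123' }
payload = parse_job_payload('{"worker":"MyWorker","job_args":[1,2]}', headers)
# payload[:job_retries] is 2 and payload[:task_id] is "task-123"
```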
data/cloudtasker.gemspec
CHANGED
@@ -15,8 +15,6 @@ Gem::Specification.new do |spec|
|
|
15
15
|
spec.homepage = 'https://github.com/keypup-io/cloudtasker'
|
16
16
|
spec.license = 'MIT'
|
17
17
|
|
18
|
-
# spec.metadata["allowed_push_host"] = "TODO: Set to 'http://mygemserver.com'"
|
19
|
-
|
20
18
|
spec.metadata['homepage_uri'] = spec.homepage
|
21
19
|
spec.metadata['source_code_uri'] = 'https://github.com/keypup-io/cloudtasker'
|
22
20
|
spec.metadata['changelog_uri'] = 'https://github.com/keypup-io/cloudtasker/master/tree/CHANGELOG.md'
|
@@ -33,9 +31,10 @@ Gem::Specification.new do |spec|
|
|
33
31
|
spec.add_dependency 'activesupport'
|
34
32
|
spec.add_dependency 'connection_pool'
|
35
33
|
spec.add_dependency 'fugit'
|
36
|
-
spec.add_dependency 'google-cloud-tasks'
|
34
|
+
spec.add_dependency 'google-cloud-tasks', '~> 1.0'
|
37
35
|
spec.add_dependency 'jwt'
|
38
36
|
spec.add_dependency 'redis'
|
37
|
+
spec.add_dependency 'retriable'
|
39
38
|
|
40
39
|
spec.add_development_dependency 'appraisal'
|
41
40
|
spec.add_development_dependency 'bundler', '~> 2.0'
|
@@ -44,6 +43,7 @@ Gem::Specification.new do |spec|
|
|
44
43
|
spec.add_development_dependency 'rspec', '~> 3.0'
|
45
44
|
spec.add_development_dependency 'rubocop', '0.76.0'
|
46
45
|
spec.add_development_dependency 'rubocop-rspec', '1.37.0'
|
46
|
+
spec.add_development_dependency 'semantic_logger'
|
47
47
|
spec.add_development_dependency 'timecop'
|
48
48
|
spec.add_development_dependency 'webmock'
|
49
49
|
|
data/docs/UNIQUE_JOBS.md
CHANGED
@@ -81,6 +81,68 @@ Below is the list of available conflict strategies can be specified through the
|
|
81
81
|
| `raise` | All locks | A `Cloudtasker::UniqueJob::LockError` will be raised when a conflict occurs |
|
82
82
|
| `reschedule` | `while_executing` | The job will be rescheduled 5 seconds later when a conflict occurs |
|
83
83
|
|
84
|
+
## Lock Time To Live (TTL) & deadlocks
|
85
|
+
**Note**: Lock TTL has been introduced in `v0.10.rc6`
|
86
|
+
|
87
|
+
To make jobs unique Cloudtasker sets a lock key - a hash of class name + job arguments - in Redis. Unique crash situations may lead to lock keys not being cleaned up when jobs complete - e.g. Redis crash with rollback from last known state on disk. Situations like these may lead to having a unique job deadlock: jobs with the same class and arguments would stop being processed because they're unable to acquire a lock that will never be cleaned up.
|
88
|
+
|
89
|
+
In order to prevent deadlocks Cloudtasker configures lock keys to automatically expire in Redis after `job schedule time + lock_ttl (default: 10 minutes)`. This forced expiration ensures that deadlocks eventually get cleaned up shortly after the expected run time of a job.
|
90
|
+
|
91
|
+
The `lock_ttl (default: 10 minutes)` duration represents the expected max duration of the job. The default 10 minutes value was chosen because it's twice the default request timeout value in Cloud Run. This usually leaves enough room for queue lag (5 minutes) + job processing (5 minutes).
|
92
|
+
|
93
|
+
Queue lag is certainly the most unpredictable factor here. Job processing time is less of a factor. Jobs running for more than 5 minutes should be split into sub-jobs to limit invocation time over HTTP anyway. Cloudtasker [batch jobs](BATCH_JOBS.md) can help split big jobs into sub-jobs in an atomic way.
|
94
|
+
|
95
|
+
The default lock key expiration of `job schedule time + 10 minutes` may look aggressive but it is a better choice than having real-time jobs stuck for X hours after a crash recovery.
|
96
|
+
|
97
|
+
We **strongly recommend** adapting the `lock_ttl` option either globally or for each worker based on expected queue lag and job duration.
|
98
|
+
|
99
|
+
**Example 1**: Global configuration
|
100
|
+
```ruby
|
101
|
+
# config/initializers/cloudtasker.rb
|
102
|
+
|
103
|
+
# General Cloudtasker configuration
|
104
|
+
Cloudtasker.configure do |config|
|
105
|
+
# ...
|
106
|
+
end
|
107
|
+
|
108
|
+
# Unique job extension configuration
|
109
|
+
Cloudtasker::UniqueJob.configure do |config|
|
110
|
+
config.lock_ttl = 3 * 60 # 3 minutes
|
111
|
+
end
|
112
|
+
```
|
113
|
+
|
114
|
+
**Example 2**: Worker-level - fast
|
115
|
+
```ruby
|
116
|
+
# app/workers/realtime_worker_on_fast_queue.rb
|
117
|
+
|
118
|
+
class RealtimeWorkerOnFastQueue
|
119
|
+
include Cloudtasker::Worker
|
120
|
+
|
121
|
+
# Ensure lock is removed 30 seconds after schedule time
|
122
|
+
cloudtasker_options lock: :until_executing, lock_ttl: 30
|
123
|
+
|
124
|
+
def perform(arg1, arg2)
|
125
|
+
# ...
|
126
|
+
end
|
127
|
+
end
|
128
|
+
```
|
129
|
+
|
130
|
+
**Example 3**: Worker-level - slow
|
131
|
+
```ruby
|
132
|
+
# app/workers/non_critical_worker_on_slow_queue.rb
|
133
|
+
|
134
|
+
class NonCriticalWorkerOnSlowQueue
|
135
|
+
include Cloudtasker::Worker
|
136
|
+
|
137
|
+
# Ensure lock is removed 24 hours after schedule time
|
138
|
+
cloudtasker_options lock: :until_executing, lock_ttl: 3600 * 24
|
139
|
+
|
140
|
+
def perform(arg1, arg2)
|
141
|
+
# ...
|
142
|
+
end
|
143
|
+
end
|
144
|
+
```
|
145
|
+
|
84
146
|
## Configuring unique arguments
|
85
147
|
|
86
148
|
By default Cloudtasker considers all job arguments to evaluate the uniqueness of a job. This behaviour is configurable per worker by defining a `unique_args` method on the worker itself returning the list of args defining uniqueness.
|
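The lock TTL rule documented in the UNIQUE_JOBS.md hunk above (`job schedule time + lock_ttl`) can be checked with a little arithmetic. The sketch below is an illustration only: `lock_key_ttl` is a hypothetical helper, not part of the gem, and it assumes times are handled as epoch seconds.

```ruby
# Hypothetical helper: seconds until the lock key expires, computed as
# (schedule time + lock_ttl) - now. The 600 default mirrors the
# documented 10 minute default.
def lock_key_ttl(now:, time_at: nil, lock_ttl: 600)
  # A job with no schedule time, or one scheduled in the past,
  # is treated as scheduled now.
  scheduled_at = [time_at.to_i, now.to_i].max
  scheduled_at + lock_ttl.to_i - now.to_i
end

now = Time.at(1_600_000_000)

immediate = lock_key_ttl(now: now)                                 # 600
delayed   = lock_key_ttl(now: now, time_at: now + 3600)            # 3600 + 600 = 4200
realtime  = lock_key_ttl(now: now, time_at: now + 60, lock_ttl: 30) # 60 + 30 = 90
```

A job scheduled one hour ahead therefore keeps its lock key for up to 70 minutes with the default setting, which is why the docs suggest tuning `lock_ttl` per worker.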
data/lib/cloudtasker/backend/google_cloud_task.rb
CHANGED
@@ -1,5 +1,8 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
|
+
require 'google/cloud/tasks'
|
4
|
+
require 'retriable'
|
5
|
+
|
3
6
|
module Cloudtasker
|
4
7
|
module Backend
|
5
8
|
# Manage tasks pushed to GCP Cloud Task
|
@@ -113,9 +116,10 @@ module Cloudtasker
|
|
113
116
|
# @return [Cloudtasker::Backend::GoogleCloudTask, nil] The retrieved task.
|
114
117
|
#
|
115
118
|
def self.find(id)
|
116
|
-
resp = client.get_task(id)
|
119
|
+
resp = with_gax_retries { client.get_task(id) }
|
117
120
|
resp ? new(resp) : nil
|
118
|
-
rescue Google::Gax::RetryError
|
121
|
+
rescue Google::Gax::RetryError, Google::Gax::NotFoundError, GRPC::NotFound
|
122
|
+
# The ID does not exist
|
119
123
|
nil
|
120
124
|
end
|
121
125
|
|
@@ -133,10 +137,8 @@ module Cloudtasker
|
|
133
137
|
relative_queue = payload.delete(:queue)
|
134
138
|
|
135
139
|
# Create task
|
136
|
-
resp = client.create_task(queue_path(relative_queue), payload)
|
140
|
+
resp = with_gax_retries { client.create_task(queue_path(relative_queue), payload) }
|
137
141
|
resp ? new(resp) : nil
|
138
|
-
rescue Google::Gax::RetryError
|
139
|
-
nil
|
140
142
|
end
|
141
143
|
|
142
144
|
#
|
@@ -145,11 +147,21 @@ module Cloudtasker
|
|
145
147
|
# @param [String] id The id of the task.
|
146
148
|
#
|
147
149
|
def self.delete(id)
|
148
|
-
client.delete_task(id)
|
149
|
-
rescue Google::Gax::
|
150
|
+
with_gax_retries { client.delete_task(id) }
|
151
|
+
rescue Google::Gax::RetryError, Google::Gax::NotFoundError, GRPC::NotFound, Google::Gax::PermissionDeniedError
|
152
|
+
# The ID does not exist
|
150
153
|
nil
|
151
154
|
end
|
152
155
|
|
156
|
+
#
|
157
|
+
# Helper method encapsulating the retry strategy for GAX calls
|
158
|
+
#
|
159
|
+
def self.with_gax_retries
|
160
|
+
Retriable.retriable(on: [Google::Gax::UnavailableError], tries: 3) do
|
161
|
+
yield
|
162
|
+
end
|
163
|
+
end
|
164
|
+
|
153
165
|
#
|
154
166
|
# Build a new instance of the class.
|
155
167
|
#
|
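The `with_gax_retries` helper in the hunk above retries a block up to three times on a transient error class. The sketch below shows the same idea in plain Ruby without the `retriable` gem. It is a stand-in under stated assumptions: `with_retries` is a hypothetical name, `IOError` substitutes for `Google::Gax::UnavailableError`, and the wait between attempts that a retry library would typically add is omitted.

```ruby
# Hypothetical stand-in for a retry helper: run the block, retrying on
# the listed error classes until `tries` attempts have been made, then
# re-raise the last error.
def with_retries(on: [IOError], tries: 3)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue *on
    retry if attempts < tries
    raise
  end
end

calls = 0
result = with_retries do
  calls += 1
  # Fail twice, then succeed on the third attempt
  raise IOError, 'service unavailable' if calls < 3

  :ok
end
# result is :ok after 3 calls
```

Errors outside the `on` list propagate immediately, which matches how the diff keeps the not-found errors in a separate `rescue` clause on the calling methods.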
data/lib/cloudtasker/backend/memory_task.rb
CHANGED
@@ -1,7 +1,5 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
|
-
require 'cloudtasker/redis_client'
|
4
|
-
|
5
3
|
module Cloudtasker
|
6
4
|
module Backend
|
7
5
|
# Manage local tasks pushed to memory.
|
@@ -10,6 +8,15 @@ module Cloudtasker
|
|
10
8
|
attr_accessor :job_retries
|
11
9
|
attr_reader :id, :http_request, :schedule_time, :queue
|
12
10
|
|
11
|
+
#
|
12
|
+
# Return true if we are in test inline execution mode.
|
13
|
+
#
|
14
|
+
# @return [Boolean] True if inline mode enabled.
|
15
|
+
#
|
16
|
+
def self.inline_mode?
|
17
|
+
defined?(Cloudtasker::Testing) && Cloudtasker::Testing.inline?
|
18
|
+
end
|
19
|
+
|
13
20
|
#
|
14
21
|
# Return the task queue. A worker class name
|
15
22
|
#
|
@@ -59,7 +66,7 @@ module Cloudtasker
|
|
59
66
|
queue << task
|
60
67
|
|
61
68
|
# Execute task immediately if in testing and inline mode enabled
|
62
|
-
task.execute if
|
69
|
+
task.execute if inline_mode?
|
63
70
|
|
64
71
|
task
|
65
72
|
end
|
@@ -153,13 +160,18 @@ module Cloudtasker
|
|
153
160
|
#
|
154
161
|
def execute
|
155
162
|
# Execute worker
|
156
|
-
|
163
|
+
worker_payload = payload.merge(job_retries: job_retries, task_id: id)
|
164
|
+
resp = WorkerHandler.with_worker_handling(worker_payload, &:execute)
|
157
165
|
|
158
166
|
# Delete task
|
159
167
|
self.class.delete(id)
|
160
168
|
resp
|
161
|
-
rescue
|
169
|
+
rescue DeadWorkerError => e
|
170
|
+
self.class.delete(id)
|
171
|
+
raise(e) if self.class.inline_mode?
|
172
|
+
rescue StandardError => e
|
162
173
|
self.job_retries += 1
|
174
|
+
raise(e) if self.class.inline_mode?
|
163
175
|
end
|
164
176
|
|
165
177
|
#
|
data/lib/cloudtasker/backend/redis_task.rb
CHANGED
@@ -247,7 +247,8 @@ module Cloudtasker
|
|
247
247
|
uri = URI(http_request[:url])
|
248
248
|
req = Net::HTTP::Post.new(uri.path, http_request[:headers])
|
249
249
|
|
250
|
-
# Add
|
250
|
+
# Add task headers
|
251
|
+
req[Cloudtasker::Config::TASK_ID_HEADER] = id
|
251
252
|
req[Cloudtasker::Config::RETRY_HEADER] = retries
|
252
253
|
|
253
254
|
# Set job payload
|
data/lib/cloudtasker/config.rb
CHANGED
@@ -25,6 +25,9 @@ module Cloudtasker
|
|
25
25
|
#
|
26
26
|
RETRY_HEADER = 'X-CloudTasks-TaskRetryCount'
|
27
27
|
|
28
|
+
# Cloud Task ID header
|
29
|
+
TASK_ID_HEADER = 'X-CloudTasks-TaskName'
|
30
|
+
|
28
31
|
# Content-Transfer-Encoding header in Cloud Task responses
|
29
32
|
ENCODING_HEADER = 'Content-Transfer-Encoding'
|
30
33
|
|
data/lib/cloudtasker/cron/job.rb
CHANGED
@@ -4,15 +4,10 @@ require 'fugit'
|
|
4
4
|
|
5
5
|
module Cloudtasker
|
6
6
|
module Cron
|
7
|
-
# TODO: handle deletion of cron jobs
|
8
|
-
#
|
9
7
|
# Manage cron jobs
|
10
8
|
class Job
|
11
9
|
attr_reader :worker
|
12
10
|
|
13
|
-
# Key Namespace used for object saved under this class
|
14
|
-
SUB_NAMESPACE = 'job'
|
15
|
-
|
16
11
|
#
|
17
12
|
# Build a new instance of the class
|
18
13
|
#
|
data/lib/cloudtasker/unique_job.rb
CHANGED
@@ -3,3 +3,30 @@
|
|
3
3
|
require_relative 'unique_job/middleware'
|
4
4
|
|
5
5
|
Cloudtasker::UniqueJob::Middleware.configure
|
6
|
+
|
7
|
+
module Cloudtasker
|
8
|
+
# UniqueJob configurator
|
9
|
+
module UniqueJob
|
10
|
+
# The maximum duration a lock can remain in place
|
11
|
+
# after schedule time.
|
12
|
+
DEFAULT_LOCK_TTL = 10 * 60 # 10 minutes
|
13
|
+
|
14
|
+
class << self
|
15
|
+
attr_writer :lock_ttl
|
16
|
+
|
17
|
+
# Configure the middleware
|
18
|
+
def configure
|
19
|
+
yield(self)
|
20
|
+
end
|
21
|
+
|
22
|
+
#
|
23
|
+
# Return the max TTL for locks
|
24
|
+
#
|
25
|
+
# @return [Integer] The lock TTL.
|
26
|
+
#
|
27
|
+
def lock_ttl
|
28
|
+
@lock_ttl || DEFAULT_LOCK_TTL
|
29
|
+
end
|
30
|
+
end
|
31
|
+
end
|
32
|
+
end
|
data/lib/cloudtasker/unique_job/job.rb
CHANGED
@@ -5,21 +5,19 @@ module Cloudtasker
|
|
5
5
|
# Wrapper class for Cloudtasker::Worker delegating to lock
|
6
6
|
# and conflict strategies
|
7
7
|
class Job
|
8
|
-
attr_reader :worker
|
8
|
+
attr_reader :worker, :call_opts
|
9
9
|
|
10
10
|
# The default lock strategy to use. Defaults to "no lock".
|
11
11
|
DEFAULT_LOCK = UniqueJob::Lock::NoOp
|
12
12
|
|
13
|
-
# Key Namespace used for object saved under this class
|
14
|
-
SUB_NAMESPACE = 'job'
|
15
|
-
|
16
13
|
#
|
17
14
|
# Build a new instance of the class.
|
18
15
|
#
|
19
16
|
# @param [Cloudtasker::Worker] worker The worker at hand
|
20
17
|
#
|
21
|
-
def initialize(worker)
|
18
|
+
def initialize(worker, **kwargs)
|
22
19
|
@worker = worker
|
20
|
+
@call_opts = kwargs
|
23
21
|
end
|
24
22
|
|
25
23
|
#
|
@@ -31,6 +29,43 @@ module Cloudtasker
|
|
31
29
|
worker.class.cloudtasker_options_hash
|
32
30
|
end
|
33
31
|
|
32
|
+
#
|
33
|
+
# Return the Time To Live (TTL) that should be set in Redis for
|
34
|
+
# the lock key. Having a TTL on lock keys ensures that jobs
|
35
|
+
# do not end up stuck due to a dead lock situation.
|
36
|
+
#
|
37
|
+
# The TTL is calculated using schedule time + expected
|
38
|
+
# max job duration.
|
39
|
+
#
|
40
|
+
# The expected max job duration is set to 10 minutes by default.
|
41
|
+
# This value was chosen because it's twice the default request timeout
|
42
|
+
# value in Cloud Run. This leaves enough room for queue lag (5 minutes)
|
43
|
+
# + job processing (5 minutes).
|
44
|
+
#
|
45
|
+
# Queue lag is certainly the most unpredictable factor here.
|
46
|
+
# Job processing time is less of a factor. Jobs running for more than 5 minutes
|
47
|
+
# should be split into sub-jobs to limit invocation time over HTTP. Cloudtasker batch
|
48
|
+
# jobs can help achieve that if you need to make one big job split into sub-jobs "atomic".
|
49
|
+
#
|
50
|
+
# The default lock key expiration of "time_at + 10 minutes" may look aggressive but it
|
51
|
+
# is still a better choice than potentially having real-time jobs stuck for X hours.
|
52
|
+
#
|
53
|
+
# The expected max job duration can be configured via the `lock_ttl`
|
54
|
+
# option on the job itself.
|
55
|
+
#
|
56
|
+
# @return [Integer] The TTL in seconds
|
57
|
+
#
|
58
|
+
def lock_ttl
|
59
|
+
now = Time.now.to_i
|
60
|
+
|
61
|
+
# Get scheduled at and lock duration
|
62
|
+
scheduled_at = [call_opts[:time_at].to_i, now].compact.max
|
63
|
+
lock_duration = (options[:lock_ttl] || Cloudtasker::UniqueJob.lock_ttl).to_i
|
64
|
+
|
65
|
+
# Return TTL
|
66
|
+
scheduled_at + lock_duration - now
|
67
|
+
end
|
68
|
+
|
34
69
|
#
|
35
70
|
# Return the instantiated lock.
|
36
71
|
#
|
@@ -121,7 +156,7 @@ module Cloudtasker
|
|
121
156
|
raise(LockError, locked_id) if locked_id && locked_id != id
|
122
157
|
|
123
158
|
# Take job lock if the lock is currently free
|
124
|
-
redis.set(unique_gid, id) unless locked_id
|
159
|
+
redis.set(unique_gid, id, ex: lock_ttl) unless locked_id
|
125
160
|
end
|
126
161
|
end
|
127
162
|
|
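The last hunk sets the lock key with an expiry (`ex: lock_ttl`) so that a stale lock cannot block jobs forever. The sketch below illustrates that effect with an in-memory Hash standing in for Redis. Everything here is hypothetical (`store_get`, `acquire_lock`, integer timestamps) and only approximates how an expiring key behaves.

```ruby
# In-memory stand-in for Redis: key => [value, expires_at].
# Returns nil when the key is missing or has expired.
def store_get(store, key, now)
  value, expires_at = store[key]
  return nil if value.nil? || now >= expires_at

  value
end

# Try to take the lock for job_id. Returns false on conflict with
# another job, true when the lock is held by (or granted to) job_id.
def acquire_lock(store, key, job_id, ttl, now)
  locked_id = store_get(store, key, now)
  return false if locked_id && locked_id != job_id

  # Take the lock with an expiry if it is currently free
  store[key] = [job_id, now + ttl] unless locked_id
  true
end

store = {}
first   = acquire_lock(store, 'lock-key', 'job-1', 600, 0)   # true: lock was free
blocked = acquire_lock(store, 'lock-key', 'job-2', 600, 300) # false: job-1 holds it
later   = acquire_lock(store, 'lock-key', 'job-2', 600, 600) # true: lock expired
```

Without the expiry, the second job would stay blocked indefinitely if the first job's lock was never cleaned up.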