cloudtasker 0.10.rc5 → 0.10.1

@@ -51,19 +51,28 @@ module Cloudtasker
  end

  # Return content parsed as JSON and add job retries count
- JSON.parse(content).merge(job_retries: job_retries)
+ JSON.parse(content).merge(job_retries: job_retries, task_id: task_id)
  end
  end

  #
  # Extract the number of times this task failed at runtime.
  #
- # @return [Integer] The number of failures
+ # @return [Integer] The number of failures.
  #
  def job_retries
  request.headers[Cloudtasker::Config::RETRY_HEADER].to_i
  end

+ #
+ # Return the Google Cloud Task ID from headers.
+ #
+ # @return [String] The task ID.
+ #
+ def task_id
+ request.headers[Cloudtasker::Config::TASK_ID_HEADER]
+ end
+
  #
  # Authenticate incoming requests using a bearer token
  #
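The hunk above merges the Cloud Tasks task ID into the job payload alongside the retry count, both read from request headers. A hedged sketch of how a worker might consume these fields, assuming the payload keys surface as worker attributes; the worker class and logging call are illustrative, not part of the diff:

```ruby
class AuditWorker
  include Cloudtasker::Worker

  def perform(record_id)
    # job_retries and task_id are populated from the Cloud Tasks
    # HTTP headers parsed by the controller change above
    logger.info("record=#{record_id} task=#{task_id} retries=#{job_retries}")
  end
end
```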
@@ -15,8 +15,6 @@ Gem::Specification.new do |spec|
  spec.homepage = 'https://github.com/keypup-io/cloudtasker'
  spec.license = 'MIT'

- # spec.metadata["allowed_push_host"] = "TODO: Set to 'http://mygemserver.com'"
-
  spec.metadata['homepage_uri'] = spec.homepage
  spec.metadata['source_code_uri'] = 'https://github.com/keypup-io/cloudtasker'
  spec.metadata['changelog_uri'] = 'https://github.com/keypup-io/cloudtasker/master/tree/CHANGELOG.md'
@@ -33,9 +31,10 @@ Gem::Specification.new do |spec|
  spec.add_dependency 'activesupport'
  spec.add_dependency 'connection_pool'
  spec.add_dependency 'fugit'
- spec.add_dependency 'google-cloud-tasks'
+ spec.add_dependency 'google-cloud-tasks', '~> 1.0'
  spec.add_dependency 'jwt'
  spec.add_dependency 'redis'
+ spec.add_dependency 'retriable'

  spec.add_development_dependency 'appraisal'
  spec.add_development_dependency 'bundler', '~> 2.0'
@@ -44,6 +43,7 @@ Gem::Specification.new do |spec|
  spec.add_development_dependency 'rspec', '~> 3.0'
  spec.add_development_dependency 'rubocop', '0.76.0'
  spec.add_development_dependency 'rubocop-rspec', '1.37.0'
+ spec.add_development_dependency 'semantic_logger'
  spec.add_development_dependency 'timecop'
  spec.add_development_dependency 'webmock'

@@ -81,6 +81,68 @@ Below is the list of available conflict strategies can be specified through the
  | `raise` | All locks | A `Cloudtasker::UniqueJob::LockError` will be raised when a conflict occurs |
  | `reschedule` | `while_executing` | The job will be rescheduled 5 seconds later when a conflict occurs |

+ ## Lock Time To Live (TTL) & deadlocks
+ **Note**: Lock TTL was introduced in `v0.10.rc6`
+
+ To make jobs unique, Cloudtasker sets a lock key - a hash of class name + job arguments - in Redis. Rare crash situations - e.g. a Redis crash with rollback from the last known state on disk - may leave lock keys behind after jobs complete. Situations like these can create a unique job deadlock: jobs with the same class and arguments stop being processed because they cannot acquire a lock that will never be released.
+
+ To prevent deadlocks, Cloudtasker configures lock keys to expire automatically in Redis after `job schedule time + lock_ttl (default: 10 minutes)`. This forced expiration ensures that deadlocks get cleaned up shortly after the expected run time of a job.
+
+ The `lock_ttl (default: 10 minutes)` duration represents the expected max duration of the job. The default of 10 minutes was chosen because it is twice the default request timeout in Cloud Run, which usually leaves enough room for queue lag (5 minutes) + job processing (5 minutes).
+
+ Queue lag is certainly the most unpredictable factor here. Job processing time is less of a factor: jobs running for more than 5 minutes should be split into sub-jobs anyway to limit invocation time over HTTP. Cloudtasker [batch jobs](BATCH_JOBS.md) can help split big jobs into sub-jobs in an atomic way.
+
+ The default lock key expiration of `job schedule time + 10 minutes` may look aggressive, but it is a better choice than having real-time jobs stuck for hours after a crash recovery.
+
+ We **strongly recommend** adapting the `lock_ttl` option, either globally or per worker, based on expected queue lag and job duration.
+
+ **Example 1**: Global configuration
+ ```ruby
+ # config/initializers/cloudtasker.rb
+
+ # General Cloudtasker configuration
+ Cloudtasker.configure do |config|
+ # ...
+ end
+
+ # Unique job extension configuration
+ Cloudtasker::UniqueJob.configure do |config|
+ config.lock_ttl = 3 * 60 # 3 minutes
+ end
+ ```
+
+ **Example 2**: Worker-level - fast
+ ```ruby
+ # app/workers/realtime_worker_on_fast_queue.rb
+
+ class RealtimeWorkerOnFastQueue
+ include Cloudtasker::Worker
+
+ # Ensure lock is removed 30 seconds after schedule time
+ cloudtasker_options lock: :until_executing, lock_ttl: 30
+
+ def perform(arg1, arg2)
+ # ...
+ end
+ end
+ ```
+
+ **Example 3**: Worker-level - slow
+ ```ruby
+ # app/workers/non_critical_worker_on_slow_queue.rb
+
+ class NonCriticalWorkerOnSlowQueue
+ include Cloudtasker::Worker
+
+ # Ensure lock is removed 24 hours after schedule time
+ cloudtasker_options lock: :until_executing, lock_ttl: 3600 * 24
+
+ def perform(arg1, arg2)
+ # ...
+ end
+ end
+ ```
+
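A hedged, worked illustration of the expiry arithmetic described above, mirroring the `lock_ttl` method added to `unique_job/job.rb` further down in this diff (the numbers are illustrative):

```ruby
now = Time.now.to_i
time_at = now + 120 # job scheduled 2 minutes from now
lock_ttl = 30       # worker-level option, as in Example 2

# Redis key TTL = schedule time + expected max duration - now
redis_ttl = [time_at, now].max + lock_ttl - now
# => 150, i.e. the lock key expires 150 seconds from now
```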
  ## Configuring unique arguments

  By default Cloudtasker considers all job arguments to evaluate the uniqueness of a job. This behaviour is configurable per worker by defining a `unique_args` method on the worker itself that returns the list of args defining uniqueness.
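A hedged sketch of that configuration; the worker and argument names are illustrative. Only the first argument defines uniqueness here, so two jobs for the same user conflict regardless of their remaining arguments:

```ruby
class UserImportWorker
  include Cloudtasker::Worker

  cloudtasker_options lock: :until_executed

  # Consider only the first positional argument (user_id)
  # when evaluating job uniqueness
  def unique_args(args)
    [args[0]]
  end

  def perform(user_id, import_options = {})
    # ...
  end
end
```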
@@ -0,0 +1,7 @@
+ # This file was generated by Appraisal
+
+ source "https://rubygems.org"
+
+ gem "semantic_logger", "3.4.1"
+
+ gemspec path: "../"
@@ -0,0 +1,7 @@
+ # This file was generated by Appraisal
+
+ source "https://rubygems.org"
+
+ gem "semantic_logger", "4.6.1"
+
+ gemspec path: "../"
@@ -0,0 +1,7 @@
+ # This file was generated by Appraisal
+
+ source "https://rubygems.org"
+
+ gem "semantic_logger", "4.7.0"
+
+ gemspec path: "../"
@@ -0,0 +1,7 @@
+ # This file was generated by Appraisal
+
+ source "https://rubygems.org"
+
+ gem "semantic_logger", "4.7.2"
+
+ gemspec path: "../"
@@ -1,5 +1,8 @@
  # frozen_string_literal: true

+ require 'google/cloud/tasks'
+ require 'retriable'
+
  module Cloudtasker
  module Backend
  # Manage tasks pushed to GCP Cloud Task
@@ -113,9 +116,10 @@ module Cloudtasker
  # @return [Cloudtasker::Backend::GoogleCloudTask, nil] The retrieved task.
  #
  def self.find(id)
- resp = client.get_task(id)
+ resp = with_gax_retries { client.get_task(id) }
  resp ? new(resp) : nil
- rescue Google::Gax::RetryError
+ rescue Google::Gax::RetryError, Google::Gax::NotFoundError, GRPC::NotFound
+ # The ID does not exist
  nil
  end

@@ -133,10 +137,8 @@ module Cloudtasker
  relative_queue = payload.delete(:queue)

  # Create task
- resp = client.create_task(queue_path(relative_queue), payload)
+ resp = with_gax_retries { client.create_task(queue_path(relative_queue), payload) }
  resp ? new(resp) : nil
- rescue Google::Gax::RetryError
- nil
  end

  #
@@ -145,11 +147,21 @@ module Cloudtasker
  # @param [String] id The id of the task.
  #
  def self.delete(id)
- client.delete_task(id)
- rescue Google::Gax::NotFoundError, Google::Gax::RetryError, GRPC::NotFound, Google::Gax::PermissionDeniedError
+ with_gax_retries { client.delete_task(id) }
+ rescue Google::Gax::RetryError, Google::Gax::NotFoundError, GRPC::NotFound, Google::Gax::PermissionDeniedError
+ # The ID does not exist
  nil
  end

+ #
+ # Helper method encapsulating the retry strategy for GAX calls
+ #
+ def self.with_gax_retries
+ Retriable.retriable(on: [Google::Gax::UnavailableError], tries: 3) do
+ yield
+ end
+ end
+
  #
  # Build a new instance of the class.
  #
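Across these backend hunks, transient `Google::Gax::UnavailableError` failures are now retried up to three times via the new `retriable` dependency; `find` and `delete` keep returning `nil` on not-found errors, while `create` no longer swallows `RetryError`. A hedged, standalone sketch of the Retriable pattern itself (the flaky call is simulated):

```ruby
require 'retriable'

# Stand-in for the transient error class used in the diff
class TransientError < StandardError; end

attempts = 0

# Retriable re-runs the block when one of the `on:` error classes is
# raised, up to `tries` attempts in total, re-raising after the last.
result = Retriable.retriable(on: [TransientError], tries: 3) do
  attempts += 1
  raise TransientError, 'flaky' if attempts < 3

  :ok
end

puts "#{result} after #{attempts} attempts" # => ok after 3 attempts
```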
@@ -1,7 +1,5 @@
  # frozen_string_literal: true

- require 'cloudtasker/redis_client'
-
  module Cloudtasker
  module Backend
  # Manage local tasks pushed to memory.
@@ -10,6 +8,15 @@ module Cloudtasker
  attr_accessor :job_retries
  attr_reader :id, :http_request, :schedule_time, :queue

+ #
+ # Return true if we are in test inline execution mode.
+ #
+ # @return [Boolean] True if inline mode enabled.
+ #
+ def self.inline_mode?
+ defined?(Cloudtasker::Testing) && Cloudtasker::Testing.inline?
+ end
+
  #
  # Return the task queue. A worker class name
  #
@@ -59,7 +66,7 @@ module Cloudtasker
  queue << task

  # Execute task immediately if in testing and inline mode enabled
- task.execute if defined?(Cloudtasker::Testing) && Cloudtasker::Testing.inline?
+ task.execute if inline_mode?

  task
  end
@@ -153,13 +160,18 @@ module Cloudtasker
  #
  def execute
  # Execute worker
- resp = WorkerHandler.with_worker_handling(payload, &:execute)
+ worker_payload = payload.merge(job_retries: job_retries, task_id: id)
+ resp = WorkerHandler.with_worker_handling(worker_payload, &:execute)

  # Delete task
  self.class.delete(id)
  resp
- rescue StandardError
+ rescue DeadWorkerError => e
+ self.class.delete(id)
+ raise(e) if self.class.inline_mode?
+ rescue StandardError => e
  self.job_retries += 1
+ raise(e) if self.class.inline_mode?
  end

  #
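With this change the in-memory backend re-raises worker errors when tests run in inline mode instead of silently bumping the retry counter. A hedged sketch of what this enables in a test suite, assuming RSpec and Cloudtasker's testing module with the block form shown here (`FailingWorker` is hypothetical):

```ruby
Cloudtasker::Testing.inline! do
  # The job runs synchronously; after this change its error surfaces
  # in the test instead of being swallowed by the backend.
  expect { FailingWorker.perform_async }.to raise_error(StandardError)
end
```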
@@ -247,7 +247,8 @@ module Cloudtasker
  uri = URI(http_request[:url])
  req = Net::HTTP::Post.new(uri.path, http_request[:headers])

- # Add retries header
+ # Add task headers
+ req[Cloudtasker::Config::TASK_ID_HEADER] = id
  req[Cloudtasker::Config::RETRY_HEADER] = retries

  # Set job payload
@@ -5,7 +5,7 @@ module Cloudtasker
  module Middleware
  # Server middleware, invoked when jobs are executed
  class Server
- def call(worker)
+ def call(worker, **_kwargs)
  Job.for(worker).execute { yield }
  end
  end
@@ -25,6 +25,9 @@ module Cloudtasker
  #
  RETRY_HEADER = 'X-CloudTasks-TaskRetryCount'

+ # Cloud Task ID header
+ TASK_ID_HEADER = 'X-CloudTasks-TaskName'
+
  # Content-Transfer-Encoding header in Cloud Task responses
  ENCODING_HEADER = 'Content-Transfer-Encoding'

@@ -4,15 +4,10 @@ require 'fugit'

  module Cloudtasker
  module Cron
- # TODO: handle deletion of cron jobs
- #
  # Manage cron jobs
  class Job
  attr_reader :worker

- # Key Namespace used for object saved under this class
- SUB_NAMESPACE = 'job'
-
  #
  # Build a new instance of the class
  #
@@ -5,7 +5,7 @@ module Cloudtasker
  module Middleware
  # Server middleware, invoked when jobs are executed
  class Server
- def call(worker)
+ def call(worker, **_kwargs)
  Job.new(worker).execute { yield }
  end
  end
@@ -9,9 +9,6 @@ module Cloudtasker
  class Schedule
  attr_accessor :id, :cron, :worker, :task_id, :job_id, :queue, :args

- # Key Namespace used for object saved under this class
- SUB_NAMESPACE = 'schedule'
-
  #
  # Return the redis client.
  #
@@ -3,3 +3,30 @@
  require_relative 'unique_job/middleware'

  Cloudtasker::UniqueJob::Middleware.configure
+
+ module Cloudtasker
+ # UniqueJob configurator
+ module UniqueJob
+ # The maximum duration a lock can remain in place
+ # after schedule time.
+ DEFAULT_LOCK_TTL = 10 * 60 # 10 minutes
+
+ class << self
+ attr_writer :lock_ttl
+
+ # Configure the middleware
+ def configure
+ yield(self)
+ end
+
+ #
+ # Return the max TTL for locks
+ #
+ # @return [Integer] The lock TTL.
+ #
+ def lock_ttl
+ @lock_ttl || DEFAULT_LOCK_TTL
+ end
+ end
+ end
+ end
@@ -5,21 +5,19 @@ module Cloudtasker
  # Wrapper class for Cloudtasker::Worker delegating to lock
  # and conflict strategies
  class Job
- attr_reader :worker
+ attr_reader :worker, :call_opts

  # The default lock strategy to use. Defaults to "no lock".
  DEFAULT_LOCK = UniqueJob::Lock::NoOp

- # Key Namespace used for object saved under this class
- SUB_NAMESPACE = 'job'
-
  #
  # Build a new instance of the class.
  #
  # @param [Cloudtasker::Worker] worker The worker at hand
  #
- def initialize(worker)
+ def initialize(worker, **kwargs)
  @worker = worker
+ @call_opts = kwargs
  end

  #
@@ -31,6 +29,43 @@ module Cloudtasker
  worker.class.cloudtasker_options_hash
  end

+ #
+ # Return the Time To Live (TTL) that should be set in Redis for
+ # the lock key. Having a TTL on lock keys ensures that jobs
+ # do not end up stuck due to a deadlock situation.
+ #
+ # The TTL is calculated using schedule time + expected
+ # max job duration.
+ #
+ # The expected max job duration is set to 10 minutes by default.
+ # This value was chosen because it's twice the default request timeout
+ # value in Cloud Run. This leaves enough room for queue lag (5 minutes)
+ # + job processing (5 minutes).
+ #
+ # Queue lag is certainly the most unpredictable factor here.
+ # Job processing time is less of a factor. Jobs running for more than 5 minutes
+ # should be split into sub-jobs to limit invocation time over HTTP. Cloudtasker batch
+ # jobs can help if you need to split one big job into sub-jobs atomically.
+ #
+ # The default lock key expiration of "time_at + 10 minutes" may look aggressive but it
+ # is still a better choice than potentially having real-time jobs stuck for hours.
+ #
+ # The expected max job duration can be configured via the `lock_ttl`
+ # option on the job itself.
+ #
+ # @return [Integer] The TTL in seconds
+ #
+ def lock_ttl
+ now = Time.now.to_i
+
+ # Get scheduled at and lock duration
+ scheduled_at = [call_opts[:time_at].to_i, now].compact.max
+ lock_duration = (options[:lock_ttl] || Cloudtasker::UniqueJob.lock_ttl).to_i
+
+ # Return TTL
+ scheduled_at + lock_duration - now
+ end
+
  #
  # Return the instantiated lock.
  #
@@ -121,7 +156,7 @@ module Cloudtasker
  raise(LockError, locked_id) if locked_id && locked_id != id

  # Take job lock if the lock is currently free
- redis.set(unique_gid, id) unless locked_id
+ redis.set(unique_gid, id, ex: lock_ttl) unless locked_id
  end
  end
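Taken together, `lock_ttl` computes the expiry and the lock write applies it through Redis's `SET` with `EX`. A hedged, standalone sketch of the primitive; the key name and numbers are illustrative, not Cloudtasker's actual key format:

```ruby
require 'redis'

redis = Redis.new

# Illustrative numbers: job due in 120s, expected max duration 600s
ttl = 120 + 600

# SET with EX writes the value and its expiry in one atomic command,
# so a crash right after this call cannot leave the key around forever.
redis.set('example/unique-job/abc123', 'job-uuid', ex: ttl)
redis.ttl('example/unique-job/abc123') # => 720
```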