sqewer 5.1.1 → 6.0.0
- checksums.yaml +4 -4
- data/README.md +342 -7
- data/lib/sqewer.rb +2 -2
- data/lib/sqewer/connection.rb +1 -1
- data/lib/sqewer/version.rb +1 -1
- data/sqewer.gemspec +1 -1
- metadata +5 -8
- data/ACTIVE_JOB.md +0 -64
- data/DETAILS.md +0 -233
- data/FAQ.md +0 -50
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 265bbc0bc13eeef0b5e99b04c509d99464e7d236
+  data.tar.gz: 48f95a807d92fc31956524b7ff07892851cf4369
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6e6329fcbc8e9ba24adc4f98231a6875c94536de2d74c72fd3b670f3b0b04f27466e7776e3ce9d63cb7631f4ce82bc2143d40df96d4c3f5f3659ce5d6ae90b8c
+  data.tar.gz: '0319a0135413668649e37b3f0ad449eff0d857d8e5cae912111385d02bc1fa5090e4f0749388eb8fb6d0fb8417fb78102022c05efaea0e87443700353eafbaf0'
data/README.md
CHANGED
@@ -48,19 +48,354 @@ The messages will only be deleted from SQS once the job execution completes with

## Requirements

-Ruby 2.1+, version 2 of the AWS SDK.
+Ruby 2.1+, version 2 of the AWS SDK. You can also run Sqewer backed by a SQLite database file, which can be handy for development situations.

## Job storage

Jobs are (by default) stored in SQS as JSON blobs. A very simple job ticket looks like this:

    {"_job_class": "MyJob", "_job_params": null}

When this ticket is being picked up by the worker, the worker will do the following:

    job = MyJob.new
    job.run

So the smallest job class has to be instantiatable, and has to respond to the `run` message.
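The dispatch described above can be sketched in plain Ruby. This is only an illustration of the ticket format, not the gem's actual worker code; `MyJob` is a stand-in class:

```ruby
require 'json'

# A minimal job: instantiatable with no arguments, responds to `run`.
class MyJob
  def run
    :done
  end
end

# What the worker conceptually does with a ticket it pulled from SQS.
ticket = '{"_job_class": "MyJob", "_job_params": null}'
parsed = JSON.parse(ticket)
job = Object.const_get(parsed.fetch('_job_class')).new
job.run
```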

## Jobs with arguments and parameters

Job parameters can be passed as keyword arguments. Properties in the job ticket (encoded as JSON) are
directly translated to keyword arguments of the job constructor. With a job ticket like this:

    {
      "_job_class": "MyJob",
      "_job_params": {"ids": [1,2,3]}
    }

the worker will instantiate your `MyJob` class with the `ids:` keyword argument:

    job = MyJob.new(ids: [1,2,3])
    job.run

Note that at this point only arguments that are raw JSON types are supported:

* Hash
* Array
* Numeric
* String
* nil/false/true

If you need marshalable Ruby types there instead, you might need to implement a custom `Serializer`.
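For instance, a job that carries a `Time` can stay within JSON types by converting at the boundaries. This is an illustrative pattern, not gem code; the `starts_at` field name is made up:

```ruby
require 'json'
require 'time'

# Keeps the job ticket JSON-safe by storing the Time as an ISO 8601 string.
class ScheduledJob
  attr_reader :starts_at

  def initialize(starts_at:)
    # Accept either a Time or the ISO 8601 string that came out of JSON.
    @starts_at = starts_at.is_a?(Time) ? starts_at : Time.iso8601(starts_at)
  end

  def to_h
    {starts_at: @starts_at.iso8601}
  end
end

original = ScheduledJob.new(starts_at: Time.utc(2017, 9, 8, 12, 0, 0))
ticket_params = JSON.parse(JSON.dump(original.to_h))
revived = ScheduledJob.new(starts_at: ticket_params.fetch('starts_at'))
```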

## Jobs spawning dependent jobs

If your `run` method on the job object accepts arguments (has non-zero `arity`), the `ExecutionContext` will
be passed to the `run` method.

    job = MyJob.new(ids: [1,2,3])
    job.run(execution_context)

The execution context has some useful methods:

* `logger`, for logging the state of the current job. The logger messages will be prefixed with the job's `inspect`.
* `submit!` for submitting more jobs to the same queue

A job submitting a subsequent job could look like this:

    class MyJob
      def run(ctx)
        ...
        ctx.submit!(DeferredCleanupJob.new)
      end
    end

## Job submission

In general, a job object that needs some arguments for instantiation must return a Hash from its `to_h` method. The hash must
include all the keyword arguments needed to instantiate the job when executing. For example:

    class SendMail
      def initialize(to:, body:)
        ...
      end

      def run
        ...
      end

      def to_h
        {to: @to, body: @body}
      end
    end

Or if you are using the `ks` gem (https://rubygems.org/gems/ks) you could inherit your Job from it:

    class SendMail < Ks.strict(:to, :body)
      def run
        ...
      end
    end
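The `to_h` contract is what makes submission work: the worker can rebuild the job from its own hash. A plain-Ruby round trip, for illustration only (the real gem routes this through `Sqewer::Serializer`):

```ruby
require 'json'

class SendMail
  attr_reader :to, :body

  def initialize(to:, body:)
    @to = to
    @body = body
  end

  def to_h
    {to: @to, body: @body}
  end
end

job = SendMail.new(to: 'user@example.com', body: 'Hello')

# Submission side: class name plus the keyword arguments from #to_h.
blob = JSON.dump('_job_class' => job.class.name, '_job_params' => job.to_h)

# Worker side: rebuild the job with symbolized keyword arguments.
parsed = JSON.parse(blob, symbolize_names: true)
revived = Object.const_get(parsed[:_job_class]).new(**parsed[:_job_params])
```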

## Job marshaling

By default, the jobs are converted to JSON and back from JSON using the Sqewer::Serializer object. You can
override that object if you need to handle job tickets that come from external sources and do not necessarily
conform to the job serialization format used internally. For example, you can handle S3 bucket notifications:

    class CustomSerializer < Sqewer::Serializer
      # Overridden so that we can instantiate a custom job
      # from the AWS notification payload.
      # Return nil and the job will simply be deleted from the queue.
      def unserialize(message_blob)
        message = JSON.load(message_blob)
        return if message['Service'] # AWS test
        return HandleS3Notification.new(message) if message['Records']

        super # as default
      end
    end

Or you can override the serialization method to add some metadata to the job ticket on job submission:

    class CustomSerializer < Sqewer::Serializer
      def serialize(job_object)
        json_blob = super
        parsed = JSON.load(json_blob)
        parsed['_submitter_host'] = Socket.gethostname
        JSON.dump(parsed)
      end
    end

If you return `nil` from your `unserialize` method the job will not be executed,
but will just be deleted from the SQS queue.

## Starting and running the worker

The very minimal executable for running jobs would be this:

    #!/usr/bin/env ruby
    require 'my_application'
    Sqewer::CLI.run

This will connect to the queue at the URL set in the `SQS_QUEUE_URL` environment variable, and
use all the default parameters. The `CLI` module will also set up a signal handler to terminate
the current jobs cleanly if the commandline app receives USR1 or TERM.

You can also run a worker without signal handling, for example in test
environments. Note that the worker is asynchronous: it has worker threads
which do all the operations by themselves.

    worker = Sqewer::Worker.new
    worker.start
    # ...and once you are done testing
    worker.stop

## Configuring the worker

One of the reasons this library exists is that sometimes you need to set up more
things than is usually assumed to be possible. For example, you might want to have a special
logging library:

    worker = Sqewer::Worker.new(logger: MyCustomLogger.new)

Or you might want a different job serializer/deserializer (for instance, if you want to handle
S3 bucket notifications coming into the same queue):

    worker = Sqewer::Worker.new(serializer: CustomSerializer.new)

You can also elect to inherit from the `Worker` class and override some default constructor
arguments:

    class CustomWorker < Sqewer::Worker
      def initialize(**kwargs)
        super(serializer: CustomSerializer.new, ..., **kwargs)
      end
    end

The `Sqewer::CLI` module that you run from the commandline handler application can be
started with your custom Worker of choice:

    custom_worker = Sqewer::Worker.new(logger: special_logger)
    Sqewer::CLI.start(custom_worker)

## Threads versus processes

sqewer uses threads. If you need to run your job from a forked subprocess (primarily for memory
management reasons) you can do so from the `run` method. Note that you might need to apply extra gymnastics
to submit extra jobs in this case, as it is the job of the controlling worker thread to submit the messages
you generate. For example, you could use a pipe. But in the more general case something like this can be used:

    class MyJob
      def run
        pid = fork do
          SomeRemoteService.reconnect # you are in the child process now
          ActiveRAMGobbler.fetch_stupendously_many_things.each do |...|
          end
        end

        _, status = Process.wait2(pid)

        # Raise an error in the parent process to signal Sqewer that the job failed
        # if the child exited with a non-0 status
        raise "Child process crashed" unless status.exitstatus && status.exitstatus.zero?
      end
    end
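The parent/child status check used above can be exercised with plain `Process` calls. A standalone illustration, with no Sqewer involved (requires a POSIX system with `fork`):

```ruby
# Fork a child, wait for it, and return its Process::Status -
# the same pattern the job sketch above relies on.
def run_in_child(&block)
  pid = fork(&block)
  _, status = Process.wait2(pid)
  status
end

ok  = run_in_child { exit 0 }  # the parent would let the job succeed
bad = run_in_child { exit 3 }  # the parent would raise, failing the job
```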

## Execution and serialization wrappers (middleware)

You can wrap job processing in middleware. A full-featured middleware class looks like this:

    class MyWrapper
      # Surrounds the job instantiation from the string coming from SQS.
      def around_deserialization(serializer, msg_id, msg_payload)
        # msg_id is the receipt handle, msg_payload is the message body string
        yield
      end

      # Surrounds the actual job execution
      def around_execution(job, context)
        # job is the actual job you will be running, context is the ExecutionContext.
        yield
      end
    end

You need to set up a `MiddlewareStack` and supply it to the `Worker` when instantiating:

    stack = Sqewer::MiddlewareStack.new
    stack << MyWrapper.new
    w = Sqewer::Worker.new(middleware_stack: stack)
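Conceptually, each wrapper in the stack nests around the next, with the job itself innermost. A minimal plain-Ruby model of that composition (illustrative only; this is not the gem's `MiddlewareStack` implementation, and the wrapper classes are made up):

```ruby
# Each middleware responds to #around_execution(job, context) and yields inward.
class TimingWrapper
  def around_execution(job, context)
    context[:events] << :timing_start
    yield
  ensure
    context[:events] << :timing_stop
  end
end

class LoggingWrapper
  def around_execution(job, context)
    context[:events] << :log
    yield
  end
end

# Compose the wrappers so the first one added is the outermost.
def run_with_stack(stack, job, context, &innermost)
  stack.reverse.inject(innermost) do |inner, wrapper|
    -> { wrapper.around_execution(job, context, &inner) }
  end.call
end

events = []
run_with_stack([TimingWrapper.new, LoggingWrapper.new], :job, {events: events}) do
  events << :job_ran
end
```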

# Execution guarantees

As a queue worker system, Sqewer makes a number of guarantees, which are as solid as Ruby's
`ensure` clause.

* When a job succeeds (raises no exceptions), it will be deleted from the queue
* When a job submits other jobs, and succeeds, the submitted jobs will be sent to the queue
* When a job, or any wrapper routine around the job execution, raises any exception, the job will not be deleted
* When a submit spun off from the job, or the deletion of the job itself, causes an exception, the job will not be deleted

Use those guarantees to your advantage. Always make your jobs horizontally repeatable (two hosts
may start the same job at the same time), idempotent (a job should be able to run twice without errors),
and traceable (make good use of logging).

# Usage with Rails via ActiveJob

This gem includes a queue adapter for usage with ActiveJob in Rails 4.2+. The functionality
is well-tested and should function for any well-conforming ActiveJob subclasses.

To run the default `sqewer` worker setup against your Rails application, first set it as the
executing backend for ActiveJob in your Rails app configuration, set your `SQS_QUEUE_URL`
in the environment variables, and make sure you can access it using your default (envvar-based
or machine-role-based) AWS credentials. Then, set sqewer as the adapter for ActiveJob:

    class Application < Rails::Application
      ...
      config.active_job.queue_adapter = :sqewer
    end

and then run

    $ bundle exec sqewer_rails

in your Rails source tree, via a foreman Procfile or similar. If you want to run your own worker binary
for executing the jobs, be aware that you _have_ to eager-load your Rails application's code explicitly
before the Sqewer worker is started. The worker is threaded, and any kind of autoloading does not generally
play nice with threading. So do not forget to add this in your worker code:

    Rails.application.eager_load!

For handling error reporting within your Sqewer worker, set up a middleware stack as described in the documentation.

## ActiveJob feature support matrix

Compared to the matrix of features as seen in the
[official ActiveJob documentation](http://edgeapi.rubyonrails.org/classes/ActiveJob/QueueAdapters.html),
`sqewer` has the following support for various ActiveJob options, in comparison to the builtin
ActiveJob adapters:

|                   | Async | Queues | Delayed | Priorities | Timeout | Retries |
|-------------------|-------|--------|---------|------------|---------|---------|
| sqewer            | Yes   | No     | Yes     | No         | No      | Global  |
| //                | //    | //     | //      | //         | //      | //      |
| Active Job Async  | Yes   | Yes    | Yes     | No         | No      | No      |
| Active Job Inline | No    | Yes    | N/A     | N/A        | N/A     | N/A     |

Retries are set up globally for the entire SQS queue. There is no specific queue setting per job,
since all the messages go to the queue available to `Sqewer.submit!`.

There is no timeout handling; if you need it, you may want to implement it within your jobs proper.
Retries are handled at the Sqewer level for as many deliveries as your SQS settings permit.

## Delay handling

Delayed execution is handled via a combination
of the `delay_seconds` SQS parameter and the `_execute_after` job key (see the serializer documentation
in Sqewer for more). In a nutshell: if you postpone a job by less than 900 seconds, the standard delivery-delay
option will be used, and the job will become visible to workers on the SQS queue only after this period.

If a larger delay is used, the job will receive an additional field called `_execute_after`, which will contain
a UNIX timestamp in seconds of when it must be executed at the earliest. In addition, the maximum permitted SQS
delivery delay will be set for it. If the job then gets redelivered, Sqewer will automatically put it back on the
queue with the same maximum delay, and will continue doing so for as long as necessary.

Note that this will incur extra receives and sends on the queue, and even though it is not substantial,
it will not be free. We think that this is an acceptable workaround for now, though. If you want a better approach,
you may be better off using a Rails scheduling system and a cron job or similar to enqueue
the actual, executable background task.
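The scheme just described can be modeled in a few lines of plain Ruby. This only illustrates the decision rule from the text; the 900-second cap is SQS's maximum delivery delay, and the method and field names are made up to mirror the prose:

```ruby
SQS_MAX_DELAY_SECONDS = 900

# Decide how to encode a requested postponement into an SQS send,
# following the scheme described above.
def delay_attributes(postpone_by_seconds, now: Time.now.to_i)
  if postpone_by_seconds <= SQS_MAX_DELAY_SECONDS
    # Short delays fit into the native SQS delivery delay.
    {delay_seconds: postpone_by_seconds}
  else
    # Long delays: record the earliest execution time in the ticket and
    # use the maximum delay SQS permits; on each redelivery the worker
    # re-posts the job until the timestamp is finally reached.
    {delay_seconds: SQS_MAX_DELAY_SECONDS,
     _execute_after: now + postpone_by_seconds}
  end
end
```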

# Frequently asked questions (A.K.A. _why is it done this way_)

This document tries to answer some questions that may arise when reading or using the library. Hopefully
this can provide some answers with regards to how things are put together.

## Why separate `new` and `run` methods instead of just `perform`?

Because the job needs access to the execution context of the worker. It turned out that keeping the context
in global/thread/class variables was somewhat nasty, and jobs needed access to the current execution context
to enqueue subsequent jobs and to get access to loggers (and other context-sensitive objects). Therefore
it makes more sense to offer Jobs access to the execution context, and to make a Job a command object.

Also, Jobs usually use their parameters in multiple smaller methods down the line. It therefore makes sense
to save those parameters in instance variables or in struct members.

## Why keyword constructors for jobs?

Because keyword constructors map very nicely to JSON objects and provide some (at least rudimentary) arity safety,
by checking for missing keywords and by allowing default keyword argument values. Also, we already have some
products that use those job formats. Some have dozens of classes of jobs, all with those signatures and tests.

## Why no weighted queues?

Because very often, when you want to split the queues servicing one application, it means that you do not have enough
capacity to serve all of the job _types_ in a timely manner. Then you try to assign priorities to separate jobs,
whereas in fact what you need are jobs that execute _roughly_ at the same speed, so that your workers do not
stall when clogged with mostly-long jobs. Also, multiple queues introduce more configuration, which, for most
products using this library, was a very bad idea (more workload for deployment).

## Why so many configurable components?

Because sometimes your requirements differ just-a-little-bit from what is provided, and you have to swap in your own
implementation instead. One product needs foreign-submitted SQS jobs (S3 notifications). Another product
needs a custom Logger subclass. Yet another product needs process-based concurrency on top of threads.
Yet another needs to manage database connections when running the jobs. Have 3-4 of those, and a
pretty substantial union of required features will start to emerge. Do not fear: most classes of the library
have a magic `.default` method which will liberate you from most complexities.

## Why multithreading for workers?

Because it is fast and relatively memory-efficient. Most of the workload we encountered was IO-bound or even
network-IO-bound. In that situation it makes more sense to use threads that switch quickly, instead of burdening
the operating system with too many processes. An optional feature for one-process-per-job is going to be added
soon, for tasks that really warrant it (like image manipulation). For now, however, threads are working quite OK.

## Why no Celluloid?

Because I found that a producer-consumer model with a thread pool works quite well, and can be created based on
the Ruby standard library alone.

## Contributing to the library
data/lib/sqewer.rb
CHANGED
@@ -7,7 +7,7 @@ module Sqewer
     require path
   end
 end
-
+
 # Loads a particular Sqewer extension that is not loaded
 # automatically during the gem require.
 #
@@ -16,7 +16,7 @@ module Sqewer
   path = File.join("sqewer", "extensions", extension_name)
   require_relative path
 end
-
+
 # Shortcut access to Submitter#submit.
 #
 # @see {Sqewer::Submitter#submit!}
data/lib/sqewer/connection.rb
CHANGED
data/lib/sqewer/version.rb
CHANGED
data/sqewer.gemspec
CHANGED
@@ -29,7 +29,7 @@ Gem::Specification.new do |spec|
   spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
   spec.require_paths = ["lib"]

-  spec.add_runtime_dependency 'aws-sdk', '~>
+  spec.add_runtime_dependency 'aws-sdk-sqs', '~> 1'
   spec.add_runtime_dependency 'rack'
   spec.add_runtime_dependency 'very_tiny_state_machine'
   spec.add_runtime_dependency 'ks'
metadata
CHANGED
@@ -1,29 +1,29 @@
 --- !ruby/object:Gem::Specification
 name: sqewer
 version: !ruby/object:Gem::Version
-  version:
+  version: 6.0.0
 platform: ruby
 authors:
 - Julik Tarkhanov
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-08
+date: 2017-09-08 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
-  name: aws-sdk
+  name: aws-sdk-sqs
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '1'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '1'
 - !ruby/object:Gem::Dependency
   name: rack
   requirement: !ruby/object:Gem::Requirement
@@ -219,10 +219,7 @@ files:
 - ".gitlab-ci.yml"
 - ".travis.yml"
 - ".yardopts"
-- ACTIVE_JOB.md
 - CHANGELOG.md
-- DETAILS.md
-- FAQ.md
 - Gemfile
 - README.md
 - Rakefile
data/ACTIVE_JOB.md
DELETED
@@ -1,64 +0,0 @@
|
|
1
|
-
# Sqewer with ActiveJob
|
2
|
-
|
3
|
-
This gem includes a queue adapter for usage with ActiveJob in Rails 4.2+. The functionality
|
4
|
-
is well-tested and should function for any well-conforming ActiveJob subclasses.
|
5
|
-
|
6
|
-
To run the default `sqewer` worker setup against your Rails application, first set it as the
|
7
|
-
executing backend for ActiveJob in your Rails app configuration, set your `SQS_QUEUE_URL`
|
8
|
-
in the environment variables, and make sure you can access it using your default (envvar-based
|
9
|
-
or machine role based) AWS credentials. Then, set sqewer as the adapter for ActiveJob:
|
10
|
-
|
11
|
-
class Application < Rails::Application
|
12
|
-
...
|
13
|
-
config.active_job.queue_adapter = :sqewer
|
14
|
-
end
|
15
|
-
|
16
|
-
and then run
|
17
|
-
|
18
|
-
$ bundle exec sqewer_rails
|
19
|
-
|
20
|
-
in your rails source tree, via a foreman Procfile or similar. If you want to run your own worker binary
|
21
|
-
for executing the jobs, be aware that you _have_ to eager-load your Rails application's code explicitly
|
22
|
-
before the Sqewer worker is started. The worker is threaded and any kind of autoloading does not generally
|
23
|
-
play nice with threading. So do not forget to add this in your worker code:
|
24
|
-
|
25
|
-
Rails.application.eager_load!
|
26
|
-
|
27
|
-
For handling error reporting within your Sqewer worker, set up a middleware stack as described in the documentation.
|
28
|
-
|
29
|
-
## ActiveJob feature support matrix
|
30
|
-
|
31
|
-
Compared to the matrix of features as seen in the
|
32
|
-
[official ActiveJob documentation](http://edgeapi.rubyonrails.org/classes/ActiveJob/QueueAdapters.html)
|
33
|
-
`sqewer` has the following support for various ActiveJob options, in comparison to the builtin
|
34
|
-
ActiveJob adapters:
|
35
|
-
|
36
|
-
| | Async | Queues | Delayed | Priorities | Timeout | Retries |
|
37
|
-
|-------------------|-------|--------|------------|------------|---------|---------|
|
38
|
-
| sqewer | Yes | No | Yes | No | No | Global |
|
39
|
-
| // | // | // | // | // | // | // |
|
40
|
-
| Active Job Async | Yes | Yes | Yes | No | No | No |
|
41
|
-
| Active Job Inline | No | Yes | N/A | N/A | N/A | N/A |
|
42
|
-
|
43
|
-
Retries are set up globally for the entire SQS queue. There is no specific queue setting per job,
|
44
|
-
since all the messages go to the queue available to `Sqewer.submit!`.
|
45
|
-
|
46
|
-
There is no timeout handling, if you need it you may want to implement it within your jobs proper.
|
47
|
-
Retries are handled on Sqewer level for as many deliveries as your SQS settings permit.
|
48
|
-
|
49
|
-
## Delay handling
|
50
|
-
|
51
|
-
Delayed execution is handled via a combination
|
52
|
-
of the `delay_seconds` SQS parameter and the `_execute_after` job key (see the serializer documentation
|
53
|
-
in Sqewer for more). In a nutshell - if you postpone a job by less than 900 seconds, the standard delivery
|
54
|
-
delay option will be used - and the job will become visible for workers on the SQS queue only after this period.
|
55
|
-
|
56
|
-
If a larger delay is used, the job will receive an additional field called `_execute_after`, which will contain
|
57
|
-
a UNIX timestamp in seconds of when it must be executed at the earliest. In addition, the maximum permitted SQS
|
58
|
-
delivery delay will be set for it. If the job then gets redelivered, Sqewer will automatically put it back on the
|
59
|
-
queue with the same maximum delay, and will continue doing so for as long as necessary.
|
60
|
-
|
61
|
-
Note that this will incur extra receives and sends on the queue, and even though it is not substantial,
|
62
|
-
it will not be free. We think that this is an acceptable workaround for now, though. If you want a better approach,
|
63
|
-
you may be better off using a Rails scheduling system and use a cron job or similar to spin up your enqueue
|
64
|
-
for the actual, executable background task.
|
data/DETAILS.md
DELETED
@@ -1,233 +0,0 @@
|
|
1
|
-
A more in-depth explanation of the systems below.
|
2
|
-
|
3
|
-
## Job storage
|
4
|
-
|
5
|
-
Jobs are (by default) stored in SQS as JSON blobs. A very simple job ticket looks like this:
|
6
|
-
|
7
|
-
{"_job_class": "MyJob", "_job_params": null}
|
8
|
-
|
9
|
-
When this ticket is being picked up by the worker, the worker will do the following:
|
10
|
-
|
11
|
-
job = MyJob.new
|
12
|
-
job.run
|
13
|
-
|
14
|
-
So the smallest job class has to be instantiatable, and has to respond to the `run` message.
|
15
|
-
|
16
|
-
## Jobs with arguments and parameters
|
17
|
-
|
18
|
-
Job parameters can be passed as keyword arguments. Properties in the job ticket (encoded as JSON) are
|
19
|
-
directly translated to keyword arguments of the job constructor. With a job ticket like this:
|
20
|
-
|
21
|
-
{
|
22
|
-
"_job_class": "MyJob",
|
23
|
-
"_job_params": {"ids": [1,2,3]}
|
24
|
-
}
|
25
|
-
|
26
|
-
the worker will instantiate your `MyJob` class with the `ids:` keyword argument:
|
27
|
-
|
28
|
-
job = MyJob.new(ids: [1,2,3])
|
29
|
-
job.run
|
30
|
-
|
31
|
-
Note that at this point only arguments that are raw JSON types are supported:
|
32
|
-
|
33
|
-
* Hash
|
34
|
-
* Array
|
35
|
-
* Numeric
|
36
|
-
* String
|
37
|
-
* nil/false/true
|
38
|
-
|
39
|
-
If you need marshalable Ruby types there instead, you might need to implement a custom `Serializer.`
|
40
|
-
|
41
|
-
## Jobs spawning dependent jobs
|
42
|
-
|
43
|
-
If your `run` method on the job object accepts arguments (has non-zero `arity` ) the `ExecutionContext` will
|
44
|
-
be passed to the `run` method.
|
45
|
-
|
46
|
-
job = MyJob.new(ids: [1,2,3])
|
47
|
-
job.run(execution_context)
|
48
|
-
|
49
|
-
The execution context has some useful methods:
|
50
|
-
|
51
|
-
* `logger`, for logging the state of the current job. The logger messages will be prefixed with the job's `inspect`.
|
52
|
-
* `submit!` for submitting more jobs to the same queue
|
53
|
-
|
54
|
-
A job submitting a subsequent job could look like this:
|
55
|
-
|
56
|
-
class MyJob
|
57
|
-
def run(ctx)
|
58
|
-
...
|
59
|
-
ctx.submit!(DeferredCleanupJob.new)
|
60
|
-
end
|
61
|
-
end
|
62
|
-
|
63
|
-
## Job submission
|
64
|
-
|
65
|
-
In general, a job object that needs some arguments for instantiation must return a Hash from it's `to_h` method. The hash must
|
66
|
-
include all the keyword arguments needed to instantiate the job when executing. For example:
|
67
|
-
|
68
|
-
class SendMail
|
69
|
-
def initialize(to:, body:)
|
70
|
-
...
|
71
|
-
end
|
72
|
-
|
73
|
-
def run()
|
74
|
-
...
|
75
|
-
end
|
76
|
-
|
77
|
-
def to_h
|
78
|
-
{to: @to, body: @body}
|
79
|
-
end
|
80
|
-
end
|
81
|
-
|
82
|
-
Or if you are using `ks` gem (https://rubygems.org/gems/ks) you could inherit your Job from it:
|
83
|
-
|
84
|
-
class SendMail < Ks.strict(:to, :body)
|
85
|
-
def run
|
86
|
-
...
|
87
|
-
end
|
88
|
-
end
|
89
|
-
|
90
|
-
## Job marshaling
|
91
|
-
|
92
|
-
By default, the jobs are converted to JSON and back from JSON using the Sqewer::Serializer object. You can
|
93
|
-
override that object if you need to handle job tickets that come from external sources and do not necessarily
|
94
|
-
conform to the job serialization format used internally. For example, you can handle S3 bucket notifications:
|
95
|
-
|
96
|
-
class CustomSerializer < Sqewer::Serializer
|
97
|
-
# Overridden so that we can instantiate a custom job
|
98
|
-
# from the AWS notification payload.
|
99
|
-
# Return "nil" and the job will be simply deleted from the queue
|
100
|
-
def unserialize(message_blob)
|
101
|
-
message = JSON.load(message_blob)
|
102
|
-
return if message['Service'] # AWS test
|
103
|
-
return HandleS3Notification.new(message) if message['Records']
|
104
|
-
|
105
|
-
super # as default
|
106
|
-
end
|
107
|
-
end
|
108
|
-
|
109
|
-
Or you can override the serialization method to add some metadata to the job ticket on job submission:
|
110
|
-
|
111
|
-
class CustomSerializer < Sqewer::Serializer
|
112
|
-
def serialize(job_object)
|
113
|
-
json_blob = super
|
114
|
-
parsed = JSON.load(json_blob)
|
115
|
-
parsed['_submitter_host'] = Socket.gethostname
|
116
|
-
JSON.dump(parsed)
|
117
|
-
end
|
118
|
-
end
|
119
|
-
|
120
|
-
If you return `nil` from your `unserialize` method the job will not be executed,
|
121
|
-
but will just be deleted from the SQS queue.
|
122
|
-
|
123
|
-
## Starting and running the worker
|
124
|
-
|
125
|
-
The very minimal executable for running jobs would be this:
|
126
|
-
|
127
|
-
#!/usr/bin/env ruby
|
128
|
-
require 'my_applicaion'
|
129
|
-
Sqewer::CLI.run
|
130
|
-
|
131
|
-
This will connect to the queue at the URL set in the `SQS_QUEUE_URL` environment variable, and
|
132
|
-
use all the default parameters. The `CLI` module will also set up a signal handler to terminate
|
133
|
-
the current jobs cleanly if the commandline app receives a USR1 or TERM.

You can also run a worker without signal handling, for example in test
environments. Note that the worker is asynchronous: it has worker threads
which do all the operations by themselves.

    worker = Sqewer::Worker.new
    worker.start
    # ...and once you are done testing
    worker.stop

## Configuring the worker

One of the reasons this library exists is that sometimes you need to set up more
things than is usually assumed to be possible. For example, you might want to use a special
logging library:

    worker = Sqewer::Worker.new(logger: MyCustomLogger.new)

Or you might want a different job serializer/deserializer (for instance, if you want to handle
S3 bucket notifications coming into the same queue):

    worker = Sqewer::Worker.new(serializer: CustomSerializer.new)

You can also elect to inherit from the `Worker` class and override some default constructor
arguments:

    class CustomWorker < Sqewer::Worker
      def initialize(**kwargs)
        super(serializer: CustomSerializer.new, ..., **kwargs)
      end
    end

The `Sqewer::CLI` module that you run from the commandline handler application can be
started with your custom Worker of choice:

    custom_worker = Sqewer::Worker.new(logger: special_logger)
    Sqewer::CLI.start(custom_worker)

## Threads versus processes

sqewer uses threads. If you need to run your job from a forked subprocess (primarily for memory
management reasons) you can do so from the `run` method. Note that you might need to apply extra gymnastics
to submit extra jobs in this case, as it is the job of the controlling worker thread to submit the messages
you generate. For example, you could use a pipe. But in the more general case something like this can be used:

    class MyJob
      def run
        pid = fork do
          SomeRemoteService.reconnect # you are in the child process now
          ActiveRAMGobbler.fetch_stupendously_many_things.each do |...|
          end
        end

        _, status = Process.wait2(pid)

        # Raise an error in the parent process to signal Sqewer that the job failed
        # if the child exited with a non-0 status
        raise "Child process crashed" unless status.exitstatus && status.exitstatus.zero?
      end
    end

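The pipe approach mentioned above can be sketched as follows: the child process writes one JSON job ticket per line, and the parent collects them after the child exits, so the controlling worker thread can do the actual submission. `FollowUpJob` is a hypothetical job class, and the submission step itself is left as a comment:

```ruby
require 'json'

reader, writer = IO.pipe

pid = fork do
  reader.close
  # ...heavy, memory-hungry work happens here, in the child...
  # The child cannot submit jobs itself, so it ships a ticket to the parent:
  writer.puts(JSON.generate("_job_class" => "FollowUpJob", "_job_params" => { "id" => 1 }))
  writer.close
end

writer.close
# Reading to EOF works because the child closed its end of the pipe.
follow_ups = reader.each_line.map { |line| JSON.parse(line) }
_, status = Process.wait2(pid)
raise "Child process crashed" unless status.success?

# Back in the parent (worker) thread, each collected ticket can now be
# submitted through the normal submission path.
```

The same pattern works for any number of follow-up tickets, since each one occupies a single line on the pipe.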
## Execution and serialization wrappers (middleware)

You can wrap job processing in middleware. A full-featured middleware class looks like this:

    class MyWrapper
      # Surrounds the job instantiation from the string coming from SQS.
      def around_deserialization(serializer, msg_id, msg_payload)
        # msg_id is the receipt handle, msg_payload is the message body string
        yield
      end

      # Surrounds the actual job execution
      def around_execution(job, context)
        # job is the actual job you will be running, context is the ExecutionContext.
        yield
      end
    end

You need to set up a `MiddlewareStack` and supply it to the `Worker` when instantiating:

    stack = Sqewer::MiddlewareStack.new
    stack << MyWrapper.new
    w = Sqewer::Worker.new(middleware_stack: stack)

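As a concrete illustration, here is a sketch of a timing middleware following the wrapper shape above. The assumption that the execution context exposes a `logger` belongs to this sketch, not to a documented guarantee:

```ruby
# Sketch: middleware that measures how long each job takes.
class TimingWrapper
  def around_deserialization(serializer, msg_id, msg_payload)
    yield # nothing special to do at deserialization time
  end

  def around_execution(job, context)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    # Runs whether the job succeeded or raised, mirroring `ensure` semantics.
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    context.logger.info("%s took %0.3fs" % [job.class, elapsed])
  end
end
```

Because the timing is taken in `ensure`, the duration is logged even for jobs that raise, while the exception still propagates so the message is not deleted.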
## Execution guarantees

As a queue worker system, Sqewer makes a number of guarantees, which are as solid as Ruby's
`ensure` clause.

* When a job succeeds (raises no exceptions), it will be deleted from the queue
* When a job submits other jobs, and succeeds, the submitted jobs will be sent to the queue
* When a job, or any wrapper routine around the job execution,
  raises any exception, the job will not be deleted
* When a submit spun off from the job, or the deletion of the job itself,
  causes an exception, the job will not be deleted

Use those guarantees to your advantage. Always make your jobs horizontally repeatable (in case two hosts
start the same job at the same time), idempotent (a job should be able to run twice without errors),
and traceable (make good use of logging).

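A minimal sketch of what idempotency means in practice; the job class and the `THUMBNAILS` store are hypothetical stand-ins for real storage:

```ruby
# Hypothetical idempotent job: safe to run twice, because the side effect
# is guarded by a check against what already exists.
THUMBNAILS = {}

class CreateThumbnail
  def initialize(image_id:)
    @image_id = image_id
  end

  def run
    return if THUMBNAILS.key?(@image_id) # already done - a redelivery is a no-op
    THUMBNAILS[@image_id] = "thumb-#{@image_id}"
  end
end

# SQS may deliver the same message twice; the second run changes nothing.
2.times { CreateThumbnail.new(image_id: 7).run }
```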
data/FAQ.md
DELETED
@@ -1,50 +0,0 @@
# FAQ

This document tries to answer some questions that may arise when reading or using the library. Hopefully
this can provide some answers with regard to how things are put together.

## Why separate `new` and `run` methods instead of just `perform`?

Because the job needs access to the execution context of the worker. It turned out that keeping the context
in global/thread/class variables was somewhat nasty, and jobs needed access to the current execution context
to enqueue subsequent jobs and to get access to loggers (and other context-sensitive objects). Therefore
it makes more sense to offer Jobs access to the execution context, and to make a Job a command object.

Also, Jobs usually use their parameters in multiple smaller methods down the line. It therefore makes sense
to save those parameters in instance variables or in struct members.

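The command-object idea can be sketched like this. `SendInvoice` is a hypothetical job, and the stand-in context only assumes the `logger` accessor mentioned above:

```ruby
require 'logger'

# A Job as a command object: parameters arrive via the keyword constructor
# and are saved into instance variables for smaller methods to use later.
class SendInvoice
  def initialize(invoice_id:)
    @invoice_id = invoice_id
  end

  # The worker passes the execution context to run; here we only rely on
  # it responding to +logger+ (a stand-in context is used below).
  def run(context)
    context.logger.info("Sending invoice #{@invoice_id}")
    deliver
  end

  private

  def deliver
    # The saved parameter is available to any helper method.
    "delivered-#{@invoice_id}"
  end
end

# Minimal stand-in for the worker's execution context, just for this sketch.
StubContext = Struct.new(:logger)
result = SendInvoice.new(invoice_id: 42).run(StubContext.new(Logger.new(IO::NULL)))
```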
## Why keyword constructors for jobs?

Because keyword constructors map very nicely to JSON objects and provide some (at least rudimentary) arity safety,
by checking for missing keywords and by allowing default keyword argument values. Also, we already have some
products that use those job formats. Some have dozens of job classes, all with those signatures and tests.

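The arity safety comes from Ruby itself: deserializing a ticket's parameters into keyword arguments makes Ruby reject missing keys and fill in defaults. A sketch with a hypothetical `ResizeImage` job:

```ruby
require 'json'

# Hypothetical job with a required and a defaulted keyword argument.
class ResizeImage
  attr_reader :width, :height

  def initialize(width:, height: 100)
    @width, @height = width, height
  end
end

# A ticket as it would be stored in the queue.
ticket = '{"_job_class": "ResizeImage", "_job_params": {"width": 640}}'
params = JSON.parse(ticket, symbolize_names: true)[:_job_params]

job = ResizeImage.new(**params) # height falls back to its default

begin
  ResizeImage.new               # the missing keyword is caught by Ruby itself
rescue ArgumentError => e
  error = e.message
end
```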
## Why no weighted queues?

Because very often, when you want to split the queues servicing one application, it means that you do not have enough
capacity to serve all of the job _types_ in a timely manner. Then you try to assign priorities to separate jobs,
whereas in fact what you need are jobs that execute _roughly_ at the same speed - so that your workers do not
stall when clogged with mostly-long jobs. Also, multiple queues introduce more configuration, which, for most
products using this library, was a very bad idea (more workload for deployment).

## Why so many configurable components?

Because sometimes your requirements differ just-a-little-bit from what is provided, and you have to swap in your
own implementation instead. One product needs foreign-submitted SQS jobs (S3 notifications). Another product
needs a custom Logger subclass. Yet another product needs process-based concurrency on top of threads.
Yet another needs to manage database connections when running the jobs. Have 3-4 of those, and a
pretty substantial union of required features starts to emerge. Do not fear - most classes of the library
have a magic `.default` method which will liberate you from most complexities.

## Why multithreading for workers?

Because it is fast and relatively memory-efficient. Most of the workload we encountered was IO-bound or even
network-IO bound. In that situation it makes more sense to use threads that switch quickly, instead of burdening
the operating system with too many processes. An optional feature for one-process-per-job is going to be added
soon, for tasks that really warrant it (like image manipulation). For now, however, threads are working quite OK.

## Why no Celluloid?

Because I found that a producer-consumer model with a thread pool works quite well, and can be created based on
the Ruby standard library alone.
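Such a producer-consumer setup can indeed be sketched with nothing but `Thread` and `Queue` from the standard library. This is an illustration of the model, not the library's actual implementation:

```ruby
jobs    = Queue.new # thread-safe FIFO from the standard library
results = Queue.new

# Consumer pool: each thread blocks on pop until work (or a stop marker) arrives.
pool = 4.times.map do
  Thread.new do
    while (job = jobs.pop) != :stop
      results << job * 2 # stand-in for "execute the job"
    end
  end
end

# Producer: push work, then one stop marker per consumer thread.
10.times { |n| jobs << n }
4.times { jobs << :stop }
pool.each(&:join)
```

The stop-marker convention lets the pool drain the queue completely before shutting down, which matches the "finish the current jobs cleanly" behavior described in the README.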