sqewer 5.1.1 → 6.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 5a60bc89bd5a39d387bbb3375d370c93b9d02f43
-  data.tar.gz: b8419eb16d2cfdff2ca1639f9b3895ffc29f0995
+  metadata.gz: 265bbc0bc13eeef0b5e99b04c509d99464e7d236
+  data.tar.gz: 48f95a807d92fc31956524b7ff07892851cf4369
 SHA512:
-  metadata.gz: a05d2aee6a099a8eea61657e51b1bb975964ce17aeb95a2366616a315a10da457d1e4da59dcb549a2c4179982d1f56b88017a4d4ba9545a9b3f49117d7455baf
-  data.tar.gz: 45dcf60818e4bf48c2cb929267c91f29dc7acfd33ccad4915e2d3c88b93033aa63c904b789100f88c546d37c4e50517b2535d0d340691771bcb7730261a257d9
+  metadata.gz: 6e6329fcbc8e9ba24adc4f98231a6875c94536de2d74c72fd3b670f3b0b04f27466e7776e3ce9d63cb7631f4ce82bc2143d40df96d4c3f5f3659ce5d6ae90b8c
+  data.tar.gz: '0319a0135413668649e37b3f0ad449eff0d857d8e5cae912111385d02bc1fa5090e4f0749388eb8fb6d0fb8417fb78102022c05efaea0e87443700353eafbaf0'
data/README.md CHANGED
@@ -48,19 +48,354 @@ The messages will only be deleted from SQS once the job execution completes with
 
 ## Requirements
 
-Ruby 2.1+, version 2 of the AWS SDK.
+Ruby 2.1+, version 2 of the AWS SDK. You can also run Sqewer backed by a SQLite database file, which can be handy in development.
 
-## Detailed usage instructions
+## Job storage
 
-For more detailed usage information, see [DETAILS.md](./DETAILS.md)
+Jobs are (by default) stored in SQS as JSON blobs. A very simple job ticket looks like this:
 
-## Frequently asked questions (A.K.A. _why is it done this way_)
+    {"_job_class": "MyJob", "_job_params": null}
 
-Please see [FAQ.md](./FAQ.md). This might explain some decisions behind the library in greater detail.
+When this ticket is picked up by the worker, the worker will do the following:
 
-## Usage with Rails via ActiveJob
+    job = MyJob.new
+    job.run
 
-Please see [ACTIVE_JOB.md](./ACTIVE_JOB.md) for the exact description.
+So the smallest job class has to be instantiatable and has to respond to the `run` message.
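The ticket-to-job dispatch described above can be sketched in plain Ruby - a toy stand-in for the worker, not Sqewer's actual implementation (the `MyJob` class and `dispatch` helper are illustrative names):

```ruby
require 'json'

# A minimal job: instantiable and responding to `run`.
class MyJob
  def run
    "done"
  end
end

# Toy dispatcher mirroring the described behaviour: read the class name
# from the ticket and call `run` on a fresh instance of that class.
def dispatch(ticket_json)
  ticket = JSON.parse(ticket_json)
  job_class = Object.const_get(ticket.fetch('_job_class'))
  job_class.new.run
end

dispatch('{"_job_class": "MyJob", "_job_params": null}') # => "done"
```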
+
+## Jobs with arguments and parameters
+
+Job parameters can be passed as keyword arguments. Properties in the job ticket (encoded as JSON) are
+directly translated to keyword arguments of the job constructor. With a job ticket like this:
+
+    {
+      "_job_class": "MyJob",
+      "_job_params": {"ids": [1,2,3]}
+    }
+
+the worker will instantiate your `MyJob` class with the `ids:` keyword argument:
+
+    job = MyJob.new(ids: [1,2,3])
+    job.run
+
+Note that at this point only arguments that are raw JSON types are supported:
+
+* Hash
+* Array
+* Numeric
+* String
+* nil/false/true
+
+If you need marshalable Ruby types there instead, you might need to implement a custom `Serializer`.
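The restriction to raw JSON types follows from the round trip itself; anything richer comes back as a plain value. A self-contained illustration (no Sqewer involved):

```ruby
require 'json'

# A Hash mixing a raw JSON type (Array) with a rich Ruby type (Time).
params = {"ids" => [1, 2, 3], "when" => Time.at(0).utc}

# Serialize and parse again, as a job ticket would be.
round_tripped = JSON.parse(JSON.dump(params))

round_tripped["ids"]          # Array of Integers, survives intact
round_tripped["when"].class   # => String - the Time did not survive
```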
+
+## Jobs spawning dependent jobs
+
+If the `run` method on the job object accepts arguments (has non-zero `arity`), the `ExecutionContext` will
+be passed to the `run` method:
+
+    job = MyJob.new(ids: [1,2,3])
+    job.run(execution_context)
+
+The execution context has some useful methods:
+
+* `logger`, for logging the state of the current job. The logger messages will be prefixed with the job's `inspect`.
+* `submit!`, for submitting more jobs to the same queue
+
+A job submitting a subsequent job could look like this:
+
+    class MyJob
+      def run(ctx)
+        ...
+        ctx.submit!(DeferredCleanupJob.new)
+      end
+    end
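This contract is easy to exercise in isolation with a hand-rolled stand-in for the context. The real `ExecutionContext` comes from Sqewer; `FakeContext` below is purely illustrative:

```ruby
require 'logger'

# Stand-in for Sqewer's ExecutionContext: offers `logger` and `submit!`.
class FakeContext
  attr_reader :logger, :submitted

  def initialize
    @logger = Logger.new($stdout)
    @submitted = []
  end

  # Collects the jobs that the running job wants to enqueue.
  def submit!(*jobs)
    @submitted.concat(jobs)
  end
end

DeferredCleanupJob = Class.new

class MyJob
  # Non-zero arity, so the worker would pass the context in.
  def run(ctx)
    ctx.logger.info("#{inspect} running")
    ctx.submit!(DeferredCleanupJob.new)
  end
end

ctx = FakeContext.new
MyJob.new.run(ctx)
ctx.submitted.length # => 1
```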
+
+## Job submission
+
+In general, a job object that needs some arguments for instantiation must return a Hash from its `to_h` method. The hash must
+include all the keyword arguments needed to instantiate the job when executing. For example:
+
+    class SendMail
+      def initialize(to:, body:)
+        ...
+      end
+
+      def run
+        ...
+      end
+
+      def to_h
+        {to: @to, body: @body}
+      end
+    end
+
+Or, if you are using the `ks` gem (https://rubygems.org/gems/ks), you could inherit your job from it:
+
+    class SendMail < Ks.strict(:to, :body)
+      def run
+        ...
+      end
+    end
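The round trip implied here - dump the keyword arguments with `to_h`, then re-instantiate with them - can be checked in a self-contained sketch (no SQS involved; the `run` body is stubbed for illustration):

```ruby
require 'json'

class SendMail
  def initialize(to:, body:)
    @to = to
    @body = body
  end

  def run
    "mailing #{@to}"
  end

  # Everything needed to rebuild this job later.
  def to_h
    {to: @to, body: @body}
  end
end

job = SendMail.new(to: "a@example.com", body: "hi")
# What a serializer would store in the ticket's _job_params...
params = JSON.parse(JSON.dump(job.to_h), symbolize_names: true)
# ...and how the worker would rebuild the job from them.
rebuilt = SendMail.new(**params)
rebuilt.run # => "mailing a@example.com"
```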
+
+## Job marshaling
+
+By default, jobs are converted to JSON and back using the `Sqewer::Serializer` object. You can
+override that object if you need to handle job tickets that come from external sources and do not necessarily
+conform to the job serialization format used internally. For example, you can handle S3 bucket notifications:
+
+    class CustomSerializer < Sqewer::Serializer
+      # Overridden so that we can instantiate a custom job
+      # from the AWS notification payload.
+      # Return nil and the job will simply be deleted from the queue.
+      def unserialize(message_blob)
+        message = JSON.load(message_blob)
+        return if message['Service'] # AWS test
+        return HandleS3Notification.new(message) if message['Records']
+
+        super # as default
+      end
+    end
+
+Or you can override the serialization method to add some metadata to the job ticket on job submission:
+
+    class CustomSerializer < Sqewer::Serializer
+      def serialize(job_object)
+        json_blob = super
+        parsed = JSON.load(json_blob)
+        parsed['_submitter_host'] = Socket.gethostname
+        JSON.dump(parsed)
+      end
+    end
+
+If you return `nil` from your `unserialize` method, the job will not be executed,
+but will just be deleted from the SQS queue.
+
+## Starting and running the worker
+
+The very minimal executable for running jobs would be this:
+
+    #!/usr/bin/env ruby
+    require 'my_application'
+    Sqewer::CLI.run
+
+This will connect to the queue at the URL set in the `SQS_QUEUE_URL` environment variable and
+use all the default parameters. The `CLI` module will also set up a signal handler to terminate
+the current jobs cleanly if the commandline app receives USR1 or TERM.
+
+You can also run a worker without signal handling, for example in test
+environments. Note that the worker is asynchronous: it has worker threads
+which do all the operations by themselves.
+
+    worker = Sqewer::Worker.new
+    worker.start
+    # ...and once you are done testing
+    worker.stop
+
+## Configuring the worker
+
+One of the reasons this library exists is that sometimes you need to configure more
+than is usually assumed possible. For example, you might want to use a special
+logging library:
+
+    worker = Sqewer::Worker.new(logger: MyCustomLogger.new)
+
+Or you might want a different job serializer/deserializer (for instance, if you want to handle
+S3 bucket notifications coming into the same queue):
+
+    worker = Sqewer::Worker.new(serializer: CustomSerializer.new)
+
+You can also elect to inherit from the `Worker` class and override some default constructor
+arguments:
+
+    class CustomWorker < Sqewer::Worker
+      def initialize(**kwargs)
+        super(serializer: CustomSerializer.new, ..., **kwargs)
+      end
+    end
+
+The `Sqewer::CLI` module that you run from the commandline handler application can be
+started with your custom Worker of choice:
+
+    custom_worker = Sqewer::Worker.new(logger: special_logger)
+    Sqewer::CLI.start(custom_worker)
+
+## Threads versus processes
+
+sqewer uses threads. If you need to run your job from a forked subprocess (primarily for memory
+management reasons) you can do so from the `run` method. Note that you might need to apply extra gymnastics
+to submit extra jobs in this case, as it is the job of the controlling worker thread to submit the messages
+you generate. For example, you could use a pipe. But in a more general case something like this can be used:
+
+    class MyJob
+      def run
+        pid = fork do
+          SomeRemoteService.reconnect # you are in the child process now
+          ActiveRAMGobbler.fetch_stupendously_many_things.each do |...|
+          end
+        end
+
+        _, status = Process.wait2(pid)
+
+        # Raise an error in the parent process to signal Sqewer that the job failed
+        # if the child exited with a non-0 status
+        raise "Child process crashed" unless status.exitstatus && status.exitstatus.zero?
+      end
+    end
+
+## Execution and serialization wrappers (middleware)
+
+You can wrap job processing in middleware. A full-featured middleware class looks like this:
+
+    class MyWrapper
+      # Surrounds the job instantiation from the string coming from SQS.
+      def around_deserialization(serializer, msg_id, msg_payload)
+        # msg_id is the receipt handle, msg_payload is the message body string
+        yield
+      end
+
+      # Surrounds the actual job execution
+      def around_execution(job, context)
+        # job is the actual job you will be running, context is the ExecutionContext.
+        yield
+      end
+    end
+
+You need to set up a `MiddlewareStack` and supply it to the `Worker` when instantiating:
+
+    stack = Sqewer::MiddlewareStack.new
+    stack << MyWrapper.new
+    w = Sqewer::Worker.new(middleware_stack: stack)
+
+## Execution guarantees
+
+As a queue worker system, Sqewer makes a number of guarantees, which are as solid as Ruby's
+`ensure` clause.
+
+* When a job succeeds (raises no exceptions), it will be deleted from the queue
+* When a job submits other jobs, and succeeds, the submitted jobs will be sent to the queue
+* When a job, or any wrapper around the job execution,
+  raises any exception, the job will not be deleted
+* When a submit spun off from the job, or the deletion of the job itself,
+  causes an exception, the job will not be deleted
+
+Use those guarantees to your advantage. Always make your jobs horizontally repeatable (in case two hosts
+start the same job at the same time), idempotent (a job should be able to run twice without errors),
+and traceable (make good use of logging).
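The delete-only-on-success behaviour these guarantees describe boils down to ordinary Ruby exception flow. A schematic, not Sqewer's code - the "queue" here is just an array:

```ruby
# Schematic of the guarantee: the message is deleted only if the job,
# and the submission of any jobs it spawned, completed without raising.
def process(queue, message, job)
  spawned = []
  job.call(spawned)                   # run the job; may raise
  spawned.each { |j| queue.push(j) }  # send spawned jobs; may raise
  queue.delete(message)               # reached only if nothing raised
end

queue = ["msg-1"]
ok_job = ->(spawned) { spawned << "cleanup" }
process(queue, "msg-1", ok_job)
queue # => ["cleanup"] - original deleted, spawned job enqueued

queue = ["msg-2"]
bad_job = ->(_spawned) { raise "boom" }
begin
  process(queue, "msg-2", bad_job)
rescue RuntimeError
end
queue # => ["msg-2"] - job failed, message stays for redelivery
```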
+
+## Usage with Rails via ActiveJob
+
+This gem includes a queue adapter for usage with ActiveJob in Rails 4.2+. The functionality
+is well-tested and should function for any well-conforming ActiveJob subclasses.
+
+To run the default `sqewer` worker setup against your Rails application, first set it as the
+executing backend for ActiveJob in your Rails app configuration, set your `SQS_QUEUE_URL`
+in the environment variables, and make sure you can access it using your default (envvar-based
+or machine-role-based) AWS credentials. Then, set sqewer as the adapter for ActiveJob:
+
+    class Application < Rails::Application
+      ...
+      config.active_job.queue_adapter = :sqewer
+    end
+
+and then run
+
+    $ bundle exec sqewer_rails
+
+in your Rails source tree, via a foreman Procfile or similar. If you want to run your own worker binary
+for executing the jobs, be aware that you _have_ to eager-load your Rails application's code explicitly
+before the Sqewer worker is started. The worker is threaded, and autoloading does not generally
+play nice with threading. So do not forget to add this in your worker code:
+
+    Rails.application.eager_load!
+
+For handling error reporting within your Sqewer worker, set up a middleware stack as described in the documentation.
+
+## ActiveJob feature support matrix
+
+Compared to the matrix of features as seen in the
+[official ActiveJob documentation](http://edgeapi.rubyonrails.org/classes/ActiveJob/QueueAdapters.html),
+`sqewer` has the following support for various ActiveJob options, in comparison to the builtin
+ActiveJob adapters:
+
+|                   | Async | Queues | Delayed | Priorities | Timeout | Retries |
+|-------------------|-------|--------|---------|------------|---------|---------|
+| sqewer            | Yes   | No     | Yes     | No         | No      | Global  |
+| //                | //    | //     | //      | //         | //      | //      |
+| Active Job Async  | Yes   | Yes    | Yes     | No         | No      | No      |
+| Active Job Inline | No    | Yes    | N/A     | N/A        | N/A     | N/A     |
+
+Retries are set up globally for the entire SQS queue. There is no specific queue setting per job,
+since all the messages go to the queue available to `Sqewer.submit!`.
+
+There is no timeout handling; if you need it, you may want to implement it within your jobs proper.
+Retries are handled at the Sqewer level for as many deliveries as your SQS settings permit.
+
+## Delay handling
+
+Delayed execution is handled via a combination
+of the `delay_seconds` SQS parameter and the `_execute_after` job key (see the serializer documentation
+in Sqewer for more). In a nutshell: if you postpone a job by less than 900 seconds, the standard delivery
+delay option will be used, and the job will become visible to workers on the SQS queue only after this period.
+
+If a larger delay is used, the job will receive an additional field called `_execute_after`, which will contain
+a UNIX timestamp in seconds of when it must be executed at the earliest. In addition, the maximum permitted SQS
+delivery delay will be set for it. If the job then gets redelivered, Sqewer will automatically put it back on the
+queue with the same maximum delay, and will continue doing so for as long as necessary.
+
+Note that this will incur extra receives and sends on the queue, and even though it is not substantial,
+it will not be free. We think that this is an acceptable workaround for now, though. If you want a better approach,
+you may be better off using a Rails scheduling system, with a cron job or similar enqueueing the actual,
+executable background task.
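The split at 900 seconds (the SQS `DelaySeconds` ceiling) can be sketched as a pure function. The field names follow the description above, but the function itself is illustrative, not Sqewer's API:

```ruby
SQS_MAX_DELAY = 900 # seconds, the SQS DelaySeconds ceiling

# Decide how to encode a requested delay, per the scheme described:
# short delays use DelaySeconds directly; longer ones get an
# _execute_after timestamp plus the maximum SQS delay, and would be
# re-queued on redelivery until the timestamp is reached.
def delay_plan(delay_seconds, now: Time.now.to_i)
  if delay_seconds <= SQS_MAX_DELAY
    {delay_seconds: delay_seconds}
  else
    {delay_seconds: SQS_MAX_DELAY, execute_after: now + delay_seconds}
  end
end

delay_plan(120)             # => {delay_seconds: 120}
delay_plan(3600, now: 1000) # => {delay_seconds: 900, execute_after: 4600}
```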
+
+## Frequently asked questions (A.K.A. _why is it done this way_)
+
+This section tries to answer some questions that may arise when reading or using the library. Hopefully
+this can provide some answers with regards to how things are put together.
+
+### Why separate `new` and `run` methods instead of just `perform`?
+
+Because the job needs access to the execution context of the worker. It turned out that keeping the context
+in global/thread/class variables was somewhat nasty, and jobs needed access to the current execution context
+to enqueue subsequent jobs and to get access to loggers (and other context-sensitive objects). Therefore
+it makes more sense to offer Jobs access to the execution context, and to make a Job a command object.
+
+Also, Jobs usually use their parameters in multiple smaller methods down the line. It therefore makes sense
+to save those parameters in instance variables or in struct members.
+
+### Why keyword constructors for jobs?
+
+Because keyword constructors map very nicely to JSON objects and provide some (at least rudimentary) arity safety
+by checking for missing keywords and by allowing default keyword argument values. Also, we already have some
+products that use those job formats. Some have dozens of classes of jobs, all with those signatures and tests.
+
+### Why no weighted queues?
+
+Because very often, when you want to split the queues servicing one application, it means that you do not have enough
+capacity to serve all of the job _types_ in a timely manner. Then you try to assign priority to separate jobs,
+whereas in fact what you need are jobs that execute _roughly_ at the same speed - so that your workers do not
+stall when clogged with mostly-long jobs. Also, multiple queues introduce more configuration, which, for most
+products using this library, was a very bad idea (more workload for deployment).
+
+### Why so many configurable components?
+
+Because sometimes your requirements differ just-a-little-bit from what is provided, and you have to swap in your
+own implementation instead. One product needs foreign-submitted SQS jobs (S3 notifications). Another product
+needs a custom Logger subclass. Yet another product needs process-based concurrency on top of threads.
+Yet another needs to manage database connections when running the jobs. Have 3-4 of those, and a
+pretty substantial union of required features will start to emerge. Do not fear - most classes in the library
+have a magic `.default` method which will liberate you from most complexities.
+
+### Why multithreading for workers?
+
+Because it is fast and relatively memory-efficient. Most of the workload we encountered was IO-bound or even
+network-IO bound. In that situation it makes more sense to use threads that switch quickly, instead of burdening
+the operating system with too many processes. An optional feature for one-process-per-job is going to be added
+soon, for tasks that really warrant it (like image manipulation). For now, however, threads are working quite OK.
+
+### Why no Celluloid?
+
+Because I found that a producer-consumer model with a thread pool works quite well, and can be created based on
+the Ruby standard library alone.
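The arity safety mentioned for keyword constructors is plain Ruby behaviour - a missing required keyword raises `ArgumentError` before the job ever runs (the `ResizeImage` class is an illustrative example, not part of Sqewer):

```ruby
class ResizeImage
  # `width:` is required; `quality:` has a default value.
  def initialize(width:, quality: 85)
    @width = width
    @quality = quality
  end
end

ResizeImage.new(width: 200) # fine, quality defaults to 85

begin
  ResizeImage.new(quality: 50) # width: is missing
rescue ArgumentError => e
  e.message # mentions the missing keyword :width
end
```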
 
 ## Contributing to the library
 
@@ -7,7 +7,7 @@ module Sqewer
       require path
     end
   end
-
+
   # Loads a particular Sqewer extension that is not loaded
   # automatically during the gem require.
   #
@@ -16,7 +16,7 @@ module Sqewer
     path = File.join("sqewer", "extensions", extension_name)
     require_relative path
   end
-
+
   # Shortcut access to Submitter#submit.
   #
   # @see {Sqewer::Submitter#submit!}
@@ -43,7 +43,7 @@ class Sqewer::Connection
   #
   # @param queue_url[String] the SQS queue URL (the URL can be copied from your AWS console)
   def initialize(queue_url)
-    require 'aws-sdk'
+    require 'aws-sdk-sqs'
     @queue_url = queue_url
   end
 
@@ -1,3 +1,3 @@
 module Sqewer
-  VERSION = '5.1.1'
+  VERSION = '6.0.0'
 end
@@ -29,7 +29,7 @@ Gem::Specification.new do |spec|
   spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
   spec.require_paths = ["lib"]
 
-  spec.add_runtime_dependency 'aws-sdk', '~> 2'
+  spec.add_runtime_dependency 'aws-sdk-sqs', '~> 1'
   spec.add_runtime_dependency 'rack'
   spec.add_runtime_dependency 'very_tiny_state_machine'
   spec.add_runtime_dependency 'ks'
metadata CHANGED
@@ -1,29 +1,29 @@
 --- !ruby/object:Gem::Specification
 name: sqewer
 version: !ruby/object:Gem::Version
-  version: 5.1.1
+  version: 6.0.0
 platform: ruby
 authors:
 - Julik Tarkhanov
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-08-31 00:00:00.000000000 Z
+date: 2017-09-08 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
-  name: aws-sdk
+  name: aws-sdk-sqs
   requirement: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '2'
+        version: '1'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: '2'
+        version: '1'
 - !ruby/object:Gem::Dependency
   name: rack
   requirement: !ruby/object:Gem::Requirement
@@ -219,10 +219,7 @@ files:
 - ".gitlab-ci.yml"
 - ".travis.yml"
 - ".yardopts"
-- ACTIVE_JOB.md
 - CHANGELOG.md
-- DETAILS.md
-- FAQ.md
 - Gemfile
 - README.md
 - Rakefile
data/ACTIVE_JOB.md DELETED
data/DETAILS.md DELETED
data/FAQ.md DELETED