sqewer 5.1.1 → 6.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 5a60bc89bd5a39d387bbb3375d370c93b9d02f43
4
- data.tar.gz: b8419eb16d2cfdff2ca1639f9b3895ffc29f0995
3
+ metadata.gz: 265bbc0bc13eeef0b5e99b04c509d99464e7d236
4
+ data.tar.gz: 48f95a807d92fc31956524b7ff07892851cf4369
5
5
  SHA512:
6
- metadata.gz: a05d2aee6a099a8eea61657e51b1bb975964ce17aeb95a2366616a315a10da457d1e4da59dcb549a2c4179982d1f56b88017a4d4ba9545a9b3f49117d7455baf
7
- data.tar.gz: 45dcf60818e4bf48c2cb929267c91f29dc7acfd33ccad4915e2d3c88b93033aa63c904b789100f88c546d37c4e50517b2535d0d340691771bcb7730261a257d9
6
+ metadata.gz: 6e6329fcbc8e9ba24adc4f98231a6875c94536de2d74c72fd3b670f3b0b04f27466e7776e3ce9d63cb7631f4ce82bc2143d40df96d4c3f5f3659ce5d6ae90b8c
7
+ data.tar.gz: '0319a0135413668649e37b3f0ad449eff0d857d8e5cae912111385d02bc1fa5090e4f0749388eb8fb6d0fb8417fb78102022c05efaea0e87443700353eafbaf0'
data/README.md CHANGED
@@ -48,19 +48,354 @@ The messages will only be deleted from SQS once the job execution completes with
48
48
 
49
49
  ## Requirements
50
50
 
51
- Ruby 2.1+, version 2 of the AWS SDK.
51
+ Ruby 2.1+, version 2 of the AWS SDK. You can also run Sqewer backed by a SQLite database file, which can be handy for development situations.
52
52
 
53
- ## Detailed usage instructions
53
+ ## Job storage
54
54
 
55
- For more detailed usage information, see [DETAILS.md](./DETAILS.md)
55
+ Jobs are (by default) stored in SQS as JSON blobs. A very simple job ticket looks like this:
56
56
 
57
- ## Frequently asked questions (A.K.A. _why is it done this way_)
57
+ {"_job_class": "MyJob", "_job_params": null}
58
58
 
59
- Please see [FAQ.md](./FAQ.md). This might explain some decisions behind the library in greater detail.
59
+ When this ticket is being picked up by the worker, the worker will do the following:
60
60
 
61
- ## Usage with Rails via ActiveJob
61
+ job = MyJob.new
62
+ job.run
62
63
 
63
- Please see [ACTIVE_JOB.md](./ACTIVE_JOB.md) for the exact description.
64
+ So the smallest job class has to be instantiatable, and has to respond to the `run` message.
65
+
66
+ ## Jobs with arguments and parameters
67
+
68
+ Job parameters can be passed as keyword arguments. Properties in the job ticket (encoded as JSON) are
69
+ directly translated to keyword arguments of the job constructor. With a job ticket like this:
70
+
71
+ {
72
+ "_job_class": "MyJob",
73
+ "_job_params": {"ids": [1,2,3]}
74
+ }
75
+
76
+ the worker will instantiate your `MyJob` class with the `ids:` keyword argument:
77
+
78
+ job = MyJob.new(ids: [1,2,3])
79
+ job.run
80
+
81
+ Note that at this point only arguments that are raw JSON types are supported:
82
+
83
+ * Hash
84
+ * Array
85
+ * Numeric
86
+ * String
87
+ * nil/false/true
88
+
89
+ If you need marshalable Ruby types there instead, you might need to implement a custom `Serializer.`
90
+
91
+ ## Jobs spawning dependent jobs
92
+
93
+ If your `run` method on the job object accepts arguments (has non-zero `arity` ) the `ExecutionContext` will
94
+ be passed to the `run` method.
95
+
96
+ job = MyJob.new(ids: [1,2,3])
97
+ job.run(execution_context)
98
+
99
+ The execution context has some useful methods:
100
+
101
+ * `logger`, for logging the state of the current job. The logger messages will be prefixed with the job's `inspect`.
102
+ * `submit!` for submitting more jobs to the same queue
103
+
104
+ A job submitting a subsequent job could look like this:
105
+
106
+ class MyJob
107
+ def run(ctx)
108
+ ...
109
+ ctx.submit!(DeferredCleanupJob.new)
110
+ end
111
+ end
112
+
113
+ ## Job submission
114
+
115
+ In general, a job object that needs some arguments for instantiation must return a Hash from it's `to_h` method. The hash must
116
+ include all the keyword arguments needed to instantiate the job when executing. For example:
117
+
118
+ class SendMail
119
+ def initialize(to:, body:)
120
+ ...
121
+ end
122
+
123
+ def run()
124
+ ...
125
+ end
126
+
127
+ def to_h
128
+ {to: @to, body: @body}
129
+ end
130
+ end
131
+
132
+ Or if you are using `ks` gem (https://rubygems.org/gems/ks) you could inherit your Job from it:
133
+
134
+ class SendMail < Ks.strict(:to, :body)
135
+ def run
136
+ ...
137
+ end
138
+ end
139
+
140
+ ## Job marshaling
141
+
142
+ By default, the jobs are converted to JSON and back from JSON using the Sqewer::Serializer object. You can
143
+ override that object if you need to handle job tickets that come from external sources and do not necessarily
144
+ conform to the job serialization format used internally. For example, you can handle S3 bucket notifications:
145
+
146
+ class CustomSerializer < Sqewer::Serializer
147
+ # Overridden so that we can instantiate a custom job
148
+ # from the AWS notification payload.
149
+ # Return "nil" and the job will be simply deleted from the queue
150
+ def unserialize(message_blob)
151
+ message = JSON.load(message_blob)
152
+ return if message['Service'] # AWS test
153
+ return HandleS3Notification.new(message) if message['Records']
154
+
155
+ super # as default
156
+ end
157
+ end
158
+
159
+ Or you can override the serialization method to add some metadata to the job ticket on job submission:
160
+
161
+ class CustomSerializer < Sqewer::Serializer
162
+ def serialize(job_object)
163
+ json_blob = super
164
+ parsed = JSON.load(json_blob)
165
+ parsed['_submitter_host'] = Socket.gethostname
166
+ JSON.dump(parsed)
167
+ end
168
+ end
169
+
170
+ If you return `nil` from your `unserialize` method the job will not be executed,
171
+ but will just be deleted from the SQS queue.
172
+
173
+ ## Starting and running the worker
174
+
175
+ The very minimal executable for running jobs would be this:
176
+
177
+ #!/usr/bin/env ruby
178
+ require 'my_applicaion'
179
+ Sqewer::CLI.run
180
+
181
+ This will connect to the queue at the URL set in the `SQS_QUEUE_URL` environment variable, and
182
+ use all the default parameters. The `CLI` module will also set up a signal handler to terminate
183
+ the current jobs cleanly if the commandline app receives a USR1 and TERM.
184
+
185
+ You can also run a worker without signal handling, for example in test
186
+ environments. Note that the worker is asynchronous, it has worker threads
187
+ which do all the operations by themselves.
188
+
189
+ worker = Sqewer::Worker.new
190
+ worker.start
191
+ # ...and once you are done testing
192
+ worker.stop
193
+
194
+ ## Configuring the worker
195
+
196
+ One of the reasons this library exists is that sometimes you need to set up some more
197
+ things than usually assumed to be possible. For example, you might want to have a special
198
+ logging library:
199
+
200
+ worker = Sqewer::Worker.new(logger: MyCustomLogger.new)
201
+
202
+ Or you might want a different job serializer/deserializer (for instance, if you want to handle
203
+ S3 bucket notifications coming into the same queue):
204
+
205
+ worker = Sqewer::Worker.new(serializer: CustomSerializer.new)
206
+
207
+ You can also elect to inherit from the `Worker` class and override some default constructor
208
+ arguments:
209
+
210
+ class CustomWorker < Sqewer::Worker
211
+ def initialize(**kwargs)
212
+ super(serializer: CustomSerializer.new, ..., **kwargs)
213
+ end
214
+ end
215
+
216
+ The `Sqewer::CLI` module that you run from the commandline handler application can be
217
+ started with your custom Worker of choice:
218
+
219
+ custom_worker = Sqewer::Worker.new(logger: special_logger)
220
+ Sqewer::CLI.start(custom_worker)
221
+
222
+ ## Threads versus processes
223
+
224
+ sqewer uses threads. If you need to run your job from a forked subprocess (primarily for memory
225
+ management reasons) you can do so from the `run` method. Note that you might need to apply extra gymnastics
226
+ to submit extra jobs in this case, as it is the job of the controlling worker thread to submit the messages
227
+ you generate. For example, you could use a pipe. But in a more general case something like this can be used:
228
+
229
+ class MyJob
230
+ def run
231
+ pid = fork do
232
+ SomeRemoteService.reconnect # you are in the child process now
233
+ ActiveRAMGobbler.fetch_stupendously_many_things.each do |...|
234
+ end
235
+ end
236
+
237
+ _, status = Process.wait2(pid)
238
+
239
+ # Raise an error in the parent process to signal Sqewer that the job failed
240
+ # if the child exited with a non-0 status
241
+ raise "Child process crashed" unless status.exitstatus && status.exitstatus.zero?
242
+ end
243
+ end
244
+
245
+ ## Execution and serialization wrappers (middleware)
246
+
247
+ You can wrap job processing in middleware. A full-featured middleware class looks like this:
248
+
249
+ class MyWrapper
250
+ # Surrounds the job instantiation from the string coming from SQS.
251
+ def around_deserialization(serializer, msg_id, msg_payload)
252
+ # msg_id is the receipt handle, msg_payload is the message body string
253
+ yield
254
+ end
255
+
256
+ # Surrounds the actual job execution
257
+ def around_execution(job, context)
258
+ # job is the actual job you will be running, context is the ExecutionContext.
259
+ yield
260
+ end
261
+ end
262
+
263
+ You need to set up a `MiddlewareStack` and supply it to the `Worker` when instantiating:
264
+
265
+ stack = Sqewer::MiddlewareStack.new
266
+ stack << MyWrapper.new
267
+ w = Sqewer::Worker.new(middleware_stack: stack)
268
+
269
+ # Execution guarantees
270
+
271
+ As a queue worker system, Sqewer makes a number of guarantees, which are as solid as the Ruby's
272
+ `ensure` clause.
273
+
274
+ * When a job succeeds (raises no exceptions), it will be deleted from the queue
275
+ * When a job submits other jobs, and succeeds, the submitted jobs will be sent to the queue
276
+ * When a job, or any wrapper routing of the job execution,
277
+ raises any exception, the job will not be deleted
278
+ * When a submit spun off from the job, or the deletion of the job itself,
279
+ cause an exception, the job will not be deleted
280
+
281
+ Use those guarantees to your advantage. Always make your jobs horizontally repeatable (if two hosts
282
+ start at the same job at the same time), idempotent (a job should be able to run twice without errors),
283
+ and traceable (make good use of logging).
284
+
285
+ # Usage with Rails via ActiveJob
286
+
287
+ This gem includes a queue adapter for usage with ActiveJob in Rails 4.2+. The functionality
288
+ is well-tested and should function for any well-conforming ActiveJob subclasses.
289
+
290
+ To run the default `sqewer` worker setup against your Rails application, first set it as the
291
+ executing backend for ActiveJob in your Rails app configuration, set your `SQS_QUEUE_URL`
292
+ in the environment variables, and make sure you can access it using your default (envvar-based
293
+ or machine role based) AWS credentials. Then, set sqewer as the adapter for ActiveJob:
294
+
295
+ class Application < Rails::Application
296
+ ...
297
+ config.active_job.queue_adapter = :sqewer
298
+ end
299
+
300
+ and then run
301
+
302
+ $ bundle exec sqewer_rails
303
+
304
+ in your rails source tree, via a foreman Procfile or similar. If you want to run your own worker binary
305
+ for executing the jobs, be aware that you _have_ to eager-load your Rails application's code explicitly
306
+ before the Sqewer worker is started. The worker is threaded and any kind of autoloading does not generally
307
+ play nice with threading. So do not forget to add this in your worker code:
308
+
309
+ Rails.application.eager_load!
310
+
311
+ For handling error reporting within your Sqewer worker, set up a middleware stack as described in the documentation.
312
+
313
+ ## ActiveJob feature support matrix
314
+
315
+ Compared to the matrix of features as seen in the
316
+ [official ActiveJob documentation](http://edgeapi.rubyonrails.org/classes/ActiveJob/QueueAdapters.html)
317
+ `sqewer` has the following support for various ActiveJob options, in comparison to the builtin
318
+ ActiveJob adapters:
319
+
320
+ | | Async | Queues | Delayed | Priorities | Timeout | Retries |
321
+ |-------------------|-------|--------|------------|------------|---------|---------|
322
+ | sqewer | Yes | No | Yes | No | No | Global |
323
+ | // | // | // | // | // | // | // |
324
+ | Active Job Async | Yes | Yes | Yes | No | No | No |
325
+ | Active Job Inline | No | Yes | N/A | N/A | N/A | N/A |
326
+
327
+ Retries are set up globally for the entire SQS queue. There is no specific queue setting per job,
328
+ since all the messages go to the queue available to `Sqewer.submit!`.
329
+
330
+ There is no timeout handling, if you need it you may want to implement it within your jobs proper.
331
+ Retries are handled on Sqewer level for as many deliveries as your SQS settings permit.
332
+
333
+ ## Delay handling
334
+
335
+ Delayed execution is handled via a combination
336
+ of the `delay_seconds` SQS parameter and the `_execute_after` job key (see the serializer documentation
337
+ in Sqewer for more). In a nutshell - if you postpone a job by less than 900 seconds, the standard delivery
338
+ delay option will be used - and the job will become visible for workers on the SQS queue only after this period.
339
+
340
+ If a larger delay is used, the job will receive an additional field called `_execute_after`, which will contain
341
+ a UNIX timestamp in seconds of when it must be executed at the earliest. In addition, the maximum permitted SQS
342
+ delivery delay will be set for it. If the job then gets redelivered, Sqewer will automatically put it back on the
343
+ queue with the same maximum delay, and will continue doing so for as long as necessary.
344
+
345
+ Note that this will incur extra receives and sends on the queue, and even though it is not substantial,
346
+ it will not be free. We think that this is an acceptable workaround for now, though. If you want a better approach,
347
+ you may be better off using a Rails scheduling system and use a cron job or similar to spin up your enqueue
348
+ for the actual, executable background task.
349
+
350
+ # Frequently asked questions (A.K.A. _why is it done this way_)
351
+
352
+ This document tries to answer some questions that may arise when reading or using the library. Hopefully
353
+ this can provide some answers with regards to how things are put together.
354
+
355
+ ## Why separate `new` and `run` methods instead of just `perform`?
356
+
357
+ Because the job needs access to the execution context of the worker. It turned out that keeping the context
358
+ in global/thread/class variables was somewhat nasty, and jobs needed access to the current execution context
359
+ to enqueue the subsequent jobs, and to get access to loggers (and other context-sensitive objects). Therefore
360
+ it makes more sense to offer Jobs access to the execution context, and to make a Job a command object.
361
+
362
+ Also, Jobs usually use their parameters in multiple smaller methods down the line. It therefore makes sense
363
+ to save those parameters in instance variables or in struct members.
364
+
365
+ ## Why keyword constructors for jobs?
366
+
367
+ Because keyword constructors map very nicely to JSON objects and provide some (at least rudimentary) arity safety,
368
+ by checking for missing keywords and by allowing default keyword argument values. Also, we already have some
369
+ products that use those job formats. Some have dozens of classes of jobs, all with those signatures and tests.
370
+
371
+ ## Why no weighted queues?
372
+
373
+ Because very often when you want to split queues servicing one application it means that you do not have enough
374
+ capacity to serve all of the job _types_ in a timely manner. Then you try to assign priority to separate jobs,
375
+ whereas in fact what you need are jobs that execute _roughly_ at the same speed - so that your workers do not
376
+ stall when clogged with mostly-long jobs. Also, multiple queues introduce more configuration, which, for most
377
+ products using this library, was a very bad idea (more workload for deployment).
378
+
379
+ ## Why so many configurable components?
380
+
381
+ Because sometimes your requirements differ just-a-little-bit from what is provided, and you have to swap your
382
+ implementation in instead. One product needs foreign-submitted SQS jobs (S3 notifications). Another product
383
+ needs a custom Logger subclass. Yet another product needs process-based concurrency on top of threads.
384
+ Yet another process needs to manage database connections when running the jobs. Have 3-4 of those, and a
385
+ pretty substantial union of required features will start to emerge. Do not fear - most classes of the library
386
+ have a magic `.default` method which will liberate you from most complexities.
387
+
388
+ ## Why multithreading for workers?
389
+
390
+ Because it is fast and relatively memory-efficient. Most of the workload we encountered was IO-bound or even
391
+ network-IO bound. In that situation it makes more sense to use threads that switch quickly, instead of burdening
392
+ the operating system with too many processes. An optional feature for one-process-per-job is going to be added
393
+ soon, for tasks that really warrant it (like image manipulation). For now, however, threads are working quite OK.
394
+
395
+ ## Why no Celluloid?
396
+
397
+ Because I found that a producer-consumer model with a thread pool works quite well, and can be created based on
398
+ the Ruby standard library alone.
64
399
 
65
400
  ## Contributing to the library
66
401
 
@@ -7,7 +7,7 @@ module Sqewer
7
7
  require path
8
8
  end
9
9
  end
10
-
10
+
11
11
  # Loads a particular Sqewer extension that is not loaded
12
12
  # automatically during the gem require.
13
13
  #
@@ -16,7 +16,7 @@ module Sqewer
16
16
  path = File.join("sqewer", "extensions", extension_name)
17
17
  require_relative path
18
18
  end
19
-
19
+
20
20
  # Shortcut access to Submitter#submit.
21
21
  #
22
22
  # @see {Sqewer::Submitter#submit!}
@@ -43,7 +43,7 @@ class Sqewer::Connection
43
43
  #
44
44
  # @param queue_url[String] the SQS queue URL (the URL can be copied from your AWS console)
45
45
  def initialize(queue_url)
46
- require 'aws-sdk'
46
+ require 'aws-sdk-sqs'
47
47
  @queue_url = queue_url
48
48
  end
49
49
 
@@ -1,3 +1,3 @@
1
1
  module Sqewer
2
- VERSION = '5.1.1'
2
+ VERSION = '6.0.0'
3
3
  end
@@ -29,7 +29,7 @@ Gem::Specification.new do |spec|
29
29
  spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
30
30
  spec.require_paths = ["lib"]
31
31
 
32
- spec.add_runtime_dependency 'aws-sdk', '~> 2'
32
+ spec.add_runtime_dependency 'aws-sdk-sqs', '~> 1'
33
33
  spec.add_runtime_dependency 'rack'
34
34
  spec.add_runtime_dependency 'very_tiny_state_machine'
35
35
  spec.add_runtime_dependency 'ks'
metadata CHANGED
@@ -1,29 +1,29 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sqewer
3
3
  version: !ruby/object:Gem::Version
4
- version: 5.1.1
4
+ version: 6.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Julik Tarkhanov
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-08-31 00:00:00.000000000 Z
11
+ date: 2017-09-08 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
- name: aws-sdk
14
+ name: aws-sdk-sqs
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
17
  - - "~>"
18
18
  - !ruby/object:Gem::Version
19
- version: '2'
19
+ version: '1'
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
24
  - - "~>"
25
25
  - !ruby/object:Gem::Version
26
- version: '2'
26
+ version: '1'
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: rack
29
29
  requirement: !ruby/object:Gem::Requirement
@@ -219,10 +219,7 @@ files:
219
219
  - ".gitlab-ci.yml"
220
220
  - ".travis.yml"
221
221
  - ".yardopts"
222
- - ACTIVE_JOB.md
223
222
  - CHANGELOG.md
224
- - DETAILS.md
225
- - FAQ.md
226
223
  - Gemfile
227
224
  - README.md
228
225
  - Rakefile
@@ -1,64 +0,0 @@
1
- # Sqewer with ActiveJob
2
-
3
- This gem includes a queue adapter for usage with ActiveJob in Rails 4.2+. The functionality
4
- is well-tested and should function for any well-conforming ActiveJob subclasses.
5
-
6
- To run the default `sqewer` worker setup against your Rails application, first set it as the
7
- executing backend for ActiveJob in your Rails app configuration, set your `SQS_QUEUE_URL`
8
- in the environment variables, and make sure you can access it using your default (envvar-based
9
- or machine role based) AWS credentials. Then, set sqewer as the adapter for ActiveJob:
10
-
11
- class Application < Rails::Application
12
- ...
13
- config.active_job.queue_adapter = :sqewer
14
- end
15
-
16
- and then run
17
-
18
- $ bundle exec sqewer_rails
19
-
20
- in your rails source tree, via a foreman Procfile or similar. If you want to run your own worker binary
21
- for executing the jobs, be aware that you _have_ to eager-load your Rails application's code explicitly
22
- before the Sqewer worker is started. The worker is threaded and any kind of autoloading does not generally
23
- play nice with threading. So do not forget to add this in your worker code:
24
-
25
- Rails.application.eager_load!
26
-
27
- For handling error reporting within your Sqewer worker, set up a middleware stack as described in the documentation.
28
-
29
- ## ActiveJob feature support matrix
30
-
31
- Compared to the matrix of features as seen in the
32
- [official ActiveJob documentation](http://edgeapi.rubyonrails.org/classes/ActiveJob/QueueAdapters.html)
33
- `sqewer` has the following support for various ActiveJob options, in comparison to the builtin
34
- ActiveJob adapters:
35
-
36
- | | Async | Queues | Delayed | Priorities | Timeout | Retries |
37
- |-------------------|-------|--------|------------|------------|---------|---------|
38
- | sqewer | Yes | No | Yes | No | No | Global |
39
- | // | // | // | // | // | // | // |
40
- | Active Job Async | Yes | Yes | Yes | No | No | No |
41
- | Active Job Inline | No | Yes | N/A | N/A | N/A | N/A |
42
-
43
- Retries are set up globally for the entire SQS queue. There is no specific queue setting per job,
44
- since all the messages go to the queue available to `Sqewer.submit!`.
45
-
46
- There is no timeout handling, if you need it you may want to implement it within your jobs proper.
47
- Retries are handled on Sqewer level for as many deliveries as your SQS settings permit.
48
-
49
- ## Delay handling
50
-
51
- Delayed execution is handled via a combination
52
- of the `delay_seconds` SQS parameter and the `_execute_after` job key (see the serializer documentation
53
- in Sqewer for more). In a nutshell - if you postpone a job by less than 900 seconds, the standard delivery
54
- delay option will be used - and the job will become visible for workers on the SQS queue only after this period.
55
-
56
- If a larger delay is used, the job will receive an additional field called `_execute_after`, which will contain
57
- a UNIX timestamp in seconds of when it must be executed at the earliest. In addition, the maximum permitted SQS
58
- delivery delay will be set for it. If the job then gets redelivered, Sqewer will automatically put it back on the
59
- queue with the same maximum delay, and will continue doing so for as long as necessary.
60
-
61
- Note that this will incur extra receives and sends on the queue, and even though it is not substantial,
62
- it will not be free. We think that this is an acceptable workaround for now, though. If you want a better approach,
63
- you may be better off using a Rails scheduling system and use a cron job or similar to spin up your enqueue
64
- for the actual, executable background task.
data/DETAILS.md DELETED
@@ -1,233 +0,0 @@
1
- A more in-depth explanation of the systems below.
2
-
3
- ## Job storage
4
-
5
- Jobs are (by default) stored in SQS as JSON blobs. A very simple job ticket looks like this:
6
-
7
- {"_job_class": "MyJob", "_job_params": null}
8
-
9
- When this ticket is being picked up by the worker, the worker will do the following:
10
-
11
- job = MyJob.new
12
- job.run
13
-
14
- So the smallest job class has to be instantiatable, and has to respond to the `run` message.
15
-
16
- ## Jobs with arguments and parameters
17
-
18
- Job parameters can be passed as keyword arguments. Properties in the job ticket (encoded as JSON) are
19
- directly translated to keyword arguments of the job constructor. With a job ticket like this:
20
-
21
- {
22
- "_job_class": "MyJob",
23
- "_job_params": {"ids": [1,2,3]}
24
- }
25
-
26
- the worker will instantiate your `MyJob` class with the `ids:` keyword argument:
27
-
28
- job = MyJob.new(ids: [1,2,3])
29
- job.run
30
-
31
- Note that at this point only arguments that are raw JSON types are supported:
32
-
33
- * Hash
34
- * Array
35
- * Numeric
36
- * String
37
- * nil/false/true
38
-
39
- If you need marshalable Ruby types there instead, you might need to implement a custom `Serializer.`
40
-
41
- ## Jobs spawning dependent jobs
42
-
43
- If your `run` method on the job object accepts arguments (has non-zero `arity` ) the `ExecutionContext` will
44
- be passed to the `run` method.
45
-
46
- job = MyJob.new(ids: [1,2,3])
47
- job.run(execution_context)
48
-
49
- The execution context has some useful methods:
50
-
51
- * `logger`, for logging the state of the current job. The logger messages will be prefixed with the job's `inspect`.
52
- * `submit!` for submitting more jobs to the same queue
53
-
54
- A job submitting a subsequent job could look like this:
55
-
56
- class MyJob
57
- def run(ctx)
58
- ...
59
- ctx.submit!(DeferredCleanupJob.new)
60
- end
61
- end
62
-
63
- ## Job submission
64
-
65
- In general, a job object that needs some arguments for instantiation must return a Hash from it's `to_h` method. The hash must
66
- include all the keyword arguments needed to instantiate the job when executing. For example:
67
-
68
- class SendMail
69
- def initialize(to:, body:)
70
- ...
71
- end
72
-
73
- def run()
74
- ...
75
- end
76
-
77
- def to_h
78
- {to: @to, body: @body}
79
- end
80
- end
81
-
82
- Or if you are using `ks` gem (https://rubygems.org/gems/ks) you could inherit your Job from it:
83
-
84
- class SendMail < Ks.strict(:to, :body)
85
- def run
86
- ...
87
- end
88
- end
89
-
90
- ## Job marshaling
91
-
92
- By default, the jobs are converted to JSON and back from JSON using the Sqewer::Serializer object. You can
93
- override that object if you need to handle job tickets that come from external sources and do not necessarily
94
- conform to the job serialization format used internally. For example, you can handle S3 bucket notifications:
95
-
96
- class CustomSerializer < Sqewer::Serializer
97
- # Overridden so that we can instantiate a custom job
98
- # from the AWS notification payload.
99
- # Return "nil" and the job will be simply deleted from the queue
100
- def unserialize(message_blob)
101
- message = JSON.load(message_blob)
102
- return if message['Service'] # AWS test
103
- return HandleS3Notification.new(message) if message['Records']
104
-
105
- super # as default
106
- end
107
- end
108
-
109
- Or you can override the serialization method to add some metadata to the job ticket on job submission:
110
-
111
- class CustomSerializer < Sqewer::Serializer
112
- def serialize(job_object)
113
- json_blob = super
114
- parsed = JSON.load(json_blob)
115
- parsed['_submitter_host'] = Socket.gethostname
116
- JSON.dump(parsed)
117
- end
118
- end
119
-
120
- If you return `nil` from your `unserialize` method the job will not be executed,
121
- but will just be deleted from the SQS queue.
122
-
123
- ## Starting and running the worker
124
-
125
- The very minimal executable for running jobs would be this:
126
-
127
- #!/usr/bin/env ruby
128
- require 'my_applicaion'
129
- Sqewer::CLI.run
130
-
131
- This will connect to the queue at the URL set in the `SQS_QUEUE_URL` environment variable, and
132
- use all the default parameters. The `CLI` module will also set up a signal handler to terminate
133
- the current jobs cleanly if the commandline app receives a USR1 and TERM.
134
-
135
- You can also run a worker without signal handling, for example in test
136
- environments. Note that the worker is asynchronous, it has worker threads
137
- which do all the operations by themselves.
138
-
139
- worker = Sqewer::Worker.new
140
- worker.start
141
- # ...and once you are done testing
142
- worker.stop
143
-
144
- ## Configuring the worker
145
-
146
- One of the reasons this library exists is that sometimes you need to set up some more
147
- things than usually assumed to be possible. For example, you might want to have a special
148
- logging library:
149
-
150
- worker = Sqewer::Worker.new(logger: MyCustomLogger.new)
151
-
152
- Or you might want a different job serializer/deserializer (for instance, if you want to handle
153
- S3 bucket notifications coming into the same queue):
154
-
155
- worker = Sqewer::Worker.new(serializer: CustomSerializer.new)
156
-
157
- You can also elect to inherit from the `Worker` class and override some default constructor
158
- arguments:
159
-
160
- class CustomWorker < Sqewer::Worker
161
- def initialize(**kwargs)
162
- super(serializer: CustomSerializer.new, ..., **kwargs)
163
- end
164
- end
165
-
166
- The `Sqewer::CLI` module that you run from the commandline handler application can be
167
- started with your custom Worker of choice:
168
-
169
- custom_worker = Sqewer::Worker.new(logger: special_logger)
170
- Sqewer::CLI.start(custom_worker)
171
-
172
- ## Threads versus processes
-
- sqewer uses threads. If you need to run your job in a forked subprocess (primarily for memory
- management reasons) you can do so from the `run` method. Note that you might need extra gymnastics
- to submit additional jobs in that case, since it is the controlling worker thread's job to submit the
- messages you generate. For example, you could use a pipe. In the more general case, something like
- this can be used:
-
-     class MyJob
-       def run
-         pid = fork do
-           SomeRemoteService.reconnect # you are in the child process now
-           ActiveRAMGobbler.fetch_stupendously_many_things.each do |...|
-           end
-         end
-
-         _, status = Process.wait2(pid)
-
-         # Raise an error in the parent process to signal Sqewer that the job failed
-         # if the child exited with a non-0 status
-         raise "Child process crashed" unless status.exitstatus && status.exitstatus.zero?
-       end
-     end
-
- ## Execution and serialization wrappers (middleware)
-
- You can wrap job processing in middleware. A full-featured middleware class looks like this:
-
-     class MyWrapper
-       # Surrounds the job instantiation from the string coming from SQS.
-       def around_deserialization(serializer, msg_id, msg_payload)
-         # msg_id is the receipt handle, msg_payload is the message body string
-         yield
-       end
-
-       # Surrounds the actual job execution
-       def around_execution(job, context)
-         # job is the actual job you will be running, context is the ExecutionContext.
-         yield
-       end
-     end
-
- You need to set up a `MiddlewareStack` and supply it to the `Worker` when instantiating:
-
-     stack = Sqewer::MiddlewareStack.new
-     stack << MyWrapper.new
-     w = Sqewer::Worker.new(middleware_stack: stack)
-
- ## Execution guarantees
-
- As a queue worker system, Sqewer makes a number of guarantees, which are as solid as Ruby's
- `ensure` clause.
-
- * When a job succeeds (raises no exceptions), it will be deleted from the queue
- * When a job submits other jobs, and succeeds, the submitted jobs will be sent to the queue
- * When a job, or any wrapper around the job execution, raises an exception, the job will not be deleted
- * When a submit spun off from the job, or the deletion of the job itself, causes an exception,
-   the job will not be deleted
-
- Use those guarantees to your advantage. Always make your jobs horizontally repeatable (in case two hosts
- start on the same job at the same time), idempotent (a job should be able to run twice without errors),
- and traceable (make good use of logging).
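The idempotency advice in the deleted section above can be sketched as follows. This is an illustrative example only: the job class, the `ALREADY_SENT` set (a stand-in for a persistent "already done" check, normally a database column), and `deliver_email` are all hypothetical, not part of sqewer.

```ruby
require 'set'

# Hypothetical job illustrating idempotency: running it twice for the
# same user_id produces the same result and raises no errors, so a
# redelivered SQS message is harmless.
class SendWelcomeEmailJob
  # Stand-in for a persistent "already done" marker (normally a DB column).
  ALREADY_SENT = Set.new

  def initialize(user_id:)
    @user_id = user_id
  end

  def run
    return if ALREADY_SENT.include?(@user_id) # safe to re-run after a retry
    deliver_email(@user_id)
    ALREADY_SENT.add(@user_id)
  end

  private

  def deliver_email(user_id)
    # placeholder for the real delivery call
  end
end
```

Because SQS delivers at-least-once, a job written this way stays correct even when the same message is handed to two workers.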
data/FAQ.md DELETED
@@ -1,50 +0,0 @@
- # FAQ
-
- This document tries to answer some questions that may arise when reading or using the library. Hopefully
- it can explain in greater detail how things are put together.
-
- ## Why separate `new` and `run` methods instead of just `perform`?
-
- Because the job needs access to the execution context of the worker. It turned out that keeping the context
- in global/thread/class variables was somewhat nasty, and jobs needed access to the current execution context
- to enqueue subsequent jobs and to reach loggers (and other context-sensitive objects). It therefore makes
- more sense to give jobs access to the execution context, and to make a Job a command object.
-
- Also, jobs usually use their parameters in multiple smaller methods down the line. It therefore makes sense
- to save those parameters in instance variables or in struct members.
-
- ## Why keyword constructors for jobs?
-
- Because keyword constructors map very nicely to JSON objects and provide some (at least rudimentary) arity
- safety, by checking for missing keywords and by allowing default keyword argument values. Also, we already
- have products that use those job formats. Some have dozens of job classes, all with those signatures and tests.
-
- ## Why no weighted queues?
-
- Because very often, when you want to split the queues servicing one application, it means that you do not
- have enough capacity to serve all of the job _types_ in a timely manner. You then try to assign priorities
- to separate jobs, whereas what you actually need are jobs that execute at _roughly_ the same speed - so that
- your workers do not stall when clogged with mostly-long jobs. Also, multiple queues introduce more
- configuration, which, for most products using this library, was a very bad idea (more workload for deployment).
-
- ## Why so many configurable components?
-
- Because sometimes your requirements differ just a little bit from what is provided, and you have to swap in
- your own implementation instead. One product needs foreign-submitted SQS jobs (S3 notifications). Another
- product needs a custom Logger subclass. Yet another product needs process-based concurrency on top of threads.
- Yet another needs to manage database connections when running the jobs. Take 3-4 of those, and a pretty
- substantial union of required features starts to emerge. Do not fear - most classes in the library have a
- magic `.default` method which will liberate you from most complexities.
-
- ## Why multithreading for workers?
-
- Because it is fast and relatively memory-efficient. Most of the workload we encountered was IO-bound or even
- network-IO-bound. In that situation it makes more sense to use threads, which switch quickly, instead of
- burdening the operating system with too many processes. An optional feature for one-process-per-job is going
- to be added soon, for tasks that really warrant it (like image manipulation). For now, however, threads are
- working quite OK.
-
- ## Why no Celluloid?
-
- Because I found that a producer-consumer model with a thread pool works quite well, and can be built on
- the Ruby standard library alone.
-
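The keyword-constructor rationale from the deleted FAQ above can be sketched like this. The job class, its parameters, and the ticket contents are hypothetical; the point is only that a JSON job ticket maps directly onto keyword arguments, and that a missing required keyword fails loudly at instantiation.

```ruby
require 'json'

# Hypothetical job: keyword arguments mirror the keys of a JSON job ticket,
# with a default value standing in for an omitted optional key.
class ResizeImageJob
  attr_reader :image_id, :width

  def initialize(image_id:, width: 640)
    @image_id = image_id
    @width = width
  end
end

ticket = '{"image_id": "abc123", "width": 800}'
params = JSON.parse(ticket, symbolize_names: true)
job = ResizeImageJob.new(**params)

# Omitting a required keyword raises ArgumentError - the rudimentary
# arity safety the FAQ refers to.
```

This is why a malformed ticket surfaces as an `ArgumentError` during deserialization rather than as a confusing failure deep inside `run`.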