qless 0.9.1
- data/Gemfile +8 -0
- data/HISTORY.md +168 -0
- data/README.md +571 -0
- data/Rakefile +28 -0
- data/bin/qless-campfire +106 -0
- data/bin/qless-growl +99 -0
- data/bin/qless-web +23 -0
- data/lib/qless.rb +185 -0
- data/lib/qless/config.rb +31 -0
- data/lib/qless/job.rb +259 -0
- data/lib/qless/job_reservers/ordered.rb +23 -0
- data/lib/qless/job_reservers/round_robin.rb +34 -0
- data/lib/qless/lua.rb +25 -0
- data/lib/qless/qless-core/cancel.lua +71 -0
- data/lib/qless/qless-core/complete.lua +218 -0
- data/lib/qless/qless-core/config.lua +44 -0
- data/lib/qless/qless-core/depends.lua +65 -0
- data/lib/qless/qless-core/fail.lua +107 -0
- data/lib/qless/qless-core/failed.lua +83 -0
- data/lib/qless/qless-core/get.lua +37 -0
- data/lib/qless/qless-core/heartbeat.lua +50 -0
- data/lib/qless/qless-core/jobs.lua +41 -0
- data/lib/qless/qless-core/peek.lua +155 -0
- data/lib/qless/qless-core/pop.lua +278 -0
- data/lib/qless/qless-core/priority.lua +32 -0
- data/lib/qless/qless-core/put.lua +156 -0
- data/lib/qless/qless-core/queues.lua +58 -0
- data/lib/qless/qless-core/recur.lua +181 -0
- data/lib/qless/qless-core/retry.lua +73 -0
- data/lib/qless/qless-core/ruby/lib/qless-core.rb +1 -0
- data/lib/qless/qless-core/ruby/lib/qless/core.rb +13 -0
- data/lib/qless/qless-core/ruby/lib/qless/core/version.rb +5 -0
- data/lib/qless/qless-core/ruby/spec/qless_core_spec.rb +13 -0
- data/lib/qless/qless-core/stats.lua +92 -0
- data/lib/qless/qless-core/tag.lua +100 -0
- data/lib/qless/qless-core/track.lua +79 -0
- data/lib/qless/qless-core/workers.lua +69 -0
- data/lib/qless/queue.rb +141 -0
- data/lib/qless/server.rb +411 -0
- data/lib/qless/tasks.rb +10 -0
- data/lib/qless/version.rb +3 -0
- data/lib/qless/worker.rb +195 -0
- metadata +239 -0
data/Gemfile
ADDED
data/HISTORY.md
ADDED
@@ -0,0 +1,168 @@
qless
=====

My hope for qless is that it will make certain aspects of pipeline management
easier. For the moment, this is a stream of consciousness document meant to capture the
features that have been occurring to me lately. After these, I have some initial thoughts
on the implementation, concluding with the outstanding __questions__ I have.

I welcome input on any of this.

Context
-------

This is a subject that has been on my mind in particular in three contexts:

1. `custom crawl` -- queue management has always been an annoyance, and it's reaching the
    breaking point for me
1. `freshscape` -- I'm going to be encountering very similar problems in freshscape,
    and I'd like to be able to avoid some of the difficulties I've encountered.
1. `general` -- There are a lot of contexts in which such a system would be useful.
    __Update__ Myron pointed out that in fact `resque` is built on a simple protocol,
    where each job is a JSON blob with two keys: `id` and `args`. That makes me feel
    like this is on the right track!

Feature Requests
----------------

Some of the features that I'm really after include:

1. __Jobs should not get dropped on the floor__ -- This has been a problem for certain
    projects, including our custom crawler. In this case, jobs had a propensity for
    getting lost in the shuffle.
1. __Job stats should be available__ -- It would be nice to be able to track summary statistics
    in one place. Perhaps about the number currently in each stage, waiting for each stage,
    time spent in each stage, number of retries, etc.
1. __Job movement should be atomic__ -- One of the problems we've encountered with using
    Redis is that it's been hard to keep items moving from one queue to another in an atomic
    way. This has the unfortunate effect of making it difficult to trust the queues to hold
    any real meaning. For example, the queues use both a list and a hash to track items, and
    the lengths of the two often get out of sync.
1. __Retry logic__ -- For this, I believe we need the ability to support some automatic
    retry logic. This should be configurable, and based on the stage.
1. __Data lookups should be easy__ -- It's been difficult to quickly identify a work item and
    get information on its state. We've usually had to rely on external storage for this.
1. __Manual requeuing__ -- We should be able to safely and atomically move items from one
    queue into another. We've had problems with race conditions in the past.
1. __Priority__ -- Jobs should be describable with priority as well. On occasion we've had
    to push items through more quickly than others, and it would be great if the underlying
    system supported that.
1. __Tagging / Tracking__ -- It would be nice to be able to mark certain jobs with tags, and
    as work items that we'd like to track. It should then be possible to get a summary along
    the lines of "This is the state of the jobs you said you were interested in." I have a
    system like this set up for me personally, and it has been _extremely_ useful.
1. __The system should be reliable and highly available__ -- We're trusting this system to
    be a single point of failure, and as such it needs to be robust and dependable.
1. __High Performance__ -- We should be able to expect this system to support a large number
    of jobs in a short amount of time. For some context, we need the custom crawler to support
    about 50k state transitions in a day, but my goal is to support millions of transitions
    in a day, and my bonus goal is 10 million or so transitions in a day.
1. __Scheduled Work__ -- We should be able to schedule work items to be enqueued at some
    specified time.
1. __UI__ -- It would be nice to have a useful interface providing insight into the state of
    the pipeline(s).
1. __Namespaces__ -- It might be nice to be able to segment the jobs into namespaces based on
    project, stage, type, etc. It shouldn't have any explicit meaning outside of partitioning
    the work space.
1. __Language Agnosticism__ -- The lingua franca for this should be something supported by a
    large number of languages, and the interface should likewise be supported by a large number
    of languages. In this way, I'd like it to be possible for a job to be handled by one language
    in one stage, and conceivably another in a different stage.
1. __Clients Should be Easy to Write__ -- I don't want to put too much burden on the authors
    of various clients, because I think this helps a project to gain adoption. But the
    out-of-the-box feature set should be compelling.

Thoughts / Recommendations
--------------------------

1. `Redis` as the storage engine. It's been heavily battle-tested, it's highly available, and it
    supports many of the data structures and properties we'd need for these features (atomicity,
    priority, robustness, performance, replication, and it's good at saving state). To boot, it's
    widely available.
1. `JSON` as the lingua franca for communication of work units. Every language I've encountered
    has strong support for it, it's expressive, and it's human readable.

Until recently, I had been imagining an HTTP server sitting in front of Redis, mostly because
I figured that would be one way to make clients easy to write -- if all the logic is pinned
up in the server. That said, it's just a second moving part upon which to rely. And Myron
made the very compelling case for having the clients maintain the state and rely solely on
Redis. However, Redis 2.6 provides us with a way to get the best of both worlds -- clients
that are easy to write and yet smart enough to do everything themselves. That mechanism is
stored Lua scripts.

Redis has very tight integration with the scripting language `Lua`, and the two major selling
points for us are:

1. Atomicity _and_ performance -- A Lua script is guaranteed to be the only thing running
    on the Redis instance, which makes certain locking mechanisms significantly easier.
    And since it runs on the actual Redis instance, we can enjoy great performance.
1. All clients can use the same scripts -- Lua scripts for Redis are loaded into the instance
    and can then be identified by a hash. The language invoking them is irrelevant. As such,
    the burden can still be placed on the original implementation, and clients can be easy to
    write while still being smart enough to manage the queues themselves.

One other added benefit is that when using Lua, Redis imports a C implementation of a JSON
parser and makes it available from Lua scripts. This is just icing on the cake.
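
To make the load-once, invoke-by-hash flow concrete, here is a minimal sketch using the `redis`
Ruby gem. The script body and key names are illustrative assumptions, not the actual qless-core
scripts:

``` ruby
require 'redis'

redis = Redis.new

# A toy script: atomically move a jid from one list to another.
# (Illustrative only -- the real qless-core scripts live in the *.lua files.)
move_script = <<-LUA
  local jid = redis.call('rpop', KEYS[1])
  if jid then redis.call('lpush', KEYS[2], jid) end
  return jid
LUA

# Load it once; Redis returns the SHA1 that identifies the script.
sha = redis.script(:load, move_script)

# Any client, in any language, can now invoke it atomically by hash.
redis.evalsha(sha, :keys => ['ql:q:waiting', 'ql:q:working'])
```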

Planned Features / Organization
===============================

All the smarts are essentially going to go into a collection of Lua scripts to be stored and
run on Redis. In addition to these Lua scripts, I'd like to provide a simple web interface
to provide pretty access to some of this functionality.

Round 1
-------

1. __Workers must heartbeat jobs__ -- When a worker is given a job, it is given an exclusive
    lock, and no other worker will get that job so long as it continues to heartbeat. The
    service keeps track of which locks are going to expire, and will give the work to another
    worker if the original worker fails to check in. The expiry time is provided every time
    work is given to a worker, and an updated time is provided when heartbeating. If the
    lock has been given to another worker, the heartbeat will return `false`.
1. __Stats A-Plenty__ -- Stats will be kept of when a job was enqueued for a stage, when it
    was popped off to be worked on, and when it was completed. In addition, summary statistics
    will be kept for all the stages.
1. __Job Data Stored Temporarily__ -- The data for each job will be stored temporarily. It's
    yet to be determined exactly what the expiration policy will be (either the last _k_
    jobs or the last _x_ amount of time). But still, all the data about a job will be available.
1. __Atomic Requeueing__ -- If a work item is moved from one queue to another, it is moved.
    If a worker is in the middle of processing it, its heartbeat will not be renewed, and it
    will not be allowed to complete.
1. __Scheduling / Priority__ -- Jobs can be scheduled to become active at a certain time.
    This does not mean that the job will be worked on at that time, though. It simply means
    that after a given scheduled time, it will be considered a candidate, and it will still
    be subject to the normal priority rules. Priority is always given to an active job with
    the lowest priority score.
1. __Tracking__ -- Jobs can be marked as being of particular interest, and their progress
    can be tracked accordingly.
1. __Web App__ -- A simple web app would be nice.

Round 2
-------

1. __Retry logic__ -- For this, I believe we need the ability to support some automatic
    retry logic. This should be configurable, and based on the stage.
1. __Tagging__ -- Jobs should be able to be tagged with certain meaningful flags, like a
    version number for the software used to process it, or the workers used to process it.

Questions
=========

1. __Implicit Queue Creation__ -- Each queue needs some configuration, like the heartbeat rate,
    the time to live for a job, etc. And not only that, but there might be additional, more
    complicated configuration (flow of control). So, which of these should be supported and
    which not?

    1. Static queue definition -- at a very minimum, we should be able to configure some queues ahead of time.
    1. Dynamic queue creation -- should there just be another endpoint that allows queues to be added?
        If so, should these queues then be saved and persisted?
    1. Implicit queue creation -- if we push to a non-existent queue, should we get a warning?
        An error? Should the queue just be created with some sort of default values?

    On the one hand, I would like to make the system very flexible and amenable to ad-hoc
    queues, but on the other hand, there may be non-default-able configuration values
    for queues.

1. __Job Data Storage__ -- How long should we keep the data about jobs around? We'd like to be
    able to get information about a job, but those records should probably be expired. Should the
    expiration policy be set to hold jobs for a certain amount of time? Or should this window be
    configured as simply the last _k_ jobs?

data/README.md
ADDED
@@ -0,0 +1,571 @@
qless
=====

Qless is a powerful `Redis`-based job queueing system inspired by
[resque](https://github.com/defunkt/resque#readme),
but built on a collection of Lua scripts, maintained in the
[qless-core](https://github.com/seomoz/qless-core) repo.

Philosophy and Nomenclature
===========================
A `job` is a unit of work identified by a job id or `jid`. A `queue` can contain
several jobs that are scheduled to be run at a certain time, several jobs that are
waiting to run, and jobs that are currently running. A `worker` is a process on a
host, identified uniquely, that asks for jobs from the queue, performs some process
associated with that job, and then marks it as complete. When it's completed, it
can be put into another queue.

Jobs can only be in one queue at a time. That queue is whatever queue they were last
put in. So if a worker is working on a job, and you move it, the worker's request to
complete the job will be ignored.

A job can be `canceled`, which means it disappears into the ether, and we'll never
pay it any mind ever again. A job can be `dropped`, which is when a worker fails
to heartbeat or complete the job in a timely fashion, or a job can be `failed`,
which is when a host recognizes some systematically problematic state about the
job. A worker should only fail a job if the error is likely not a transient one;
otherwise, that worker should just drop it and let the system reclaim it.
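
As a rough illustration of that last distinction, a worker might do something like the
following. This is a sketch only: it assumes a `Qless::Job#fail` that takes a failure group
and message, and a `TransientError` class defined by your application:

``` ruby
# Sketch: distinguishing "failed" from "dropped" inside a worker loop.
job = queue.pop
begin
  job.perform
rescue TransientError
  # A transient problem (e.g. a network hiccup): just stop heartbeating;
  # the job is "dropped" and qless hands it to another worker once the
  # lock expires.
rescue => e
  # A systematic problem: mark the job as failed so someone can look at it.
  job.fail('my-app-errors', "#{e.class}: #{e.message}")
end
```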

Features
========

1. __Jobs don't get dropped on the floor__ -- Sometimes workers drop jobs. Qless
    automatically picks them back up and gives them to another worker.
1. __Tagging / Tracking__ -- Some jobs are more interesting than others. Track those
    jobs to get updates on their progress. Tag jobs with meaningful identifiers to
    find them quickly in the UI.
1. __Job Dependencies__ -- One job might need to wait for another job to complete.
1. __Stats__ -- `qless` automatically keeps statistics about how long jobs wait
    to be processed and how long they take to be processed. Currently, we keep
    track of the count, mean, standard deviation, and a histogram of these times.
1. __Job data is stored temporarily__ -- Job info sticks around for a configurable
    amount of time so you can still look back on a job's history, data, etc.
1. __Priority__ -- Jobs with the same priority get popped in the order they were
    inserted; a higher priority means that a job gets popped sooner.
1. __Retry logic__ -- Every job has a number of retries associated with it, which are
    renewed when it is put into a new queue or completed. If a job is repeatedly
    dropped, then it is presumed to be problematic, and is automatically failed.
1. __Web App__ -- With the advent of a Ruby client, there is a Sinatra-based web
    app that gives you control over certain operational issues.
1. __Scheduled Work__ -- A job can be given a delay (which defaults to 0); until that
    delay has elapsed, the job cannot be popped by workers.
1. __Recurring Jobs__ -- Scheduling's all well and good, but we also support
    jobs that need to recur periodically.
1. __Notifications__ -- Tracked jobs emit events on pubsub channels as they get
    completed, failed, put, popped, etc. Use these events to get notified of
    progress on jobs you're interested in.

Enqueuing Jobs
==============
First things first, require `qless` and create a client. The client accepts all the
same arguments that you'd use when constructing a redis client.

``` ruby
require 'qless'

# Connect to localhost
client = Qless::Client.new
# Connect to somewhere else
client = Qless::Client.new(:host => 'foo.bar.com', :port => 1234)
```

Jobs should be classes or modules that define a `perform` method, which
must accept a single `job` argument:

``` ruby
class MyJobClass
  def self.perform(job)
    # job is an instance of `Qless::Job` and provides access to
    # job.data, a means to cancel the job (job.cancel), and more.
  end
end
```

Now you can access a queue, and add a job to that queue.

``` ruby
# This references a new or existing queue 'testing'
queue = client.queues['testing']
# Let's add a job, with some data. Returns the job ID.
queue.put(MyJobClass, :hello => 'howdy')
# => "0c53b0404c56012f69fa482a1427ab7d"
# Now we can ask for a job
job = queue.pop
# => <Qless::Job 0c53b0404c56012f69fa482a1427ab7d (MyJobClass / testing)>
# And we can do the work associated with it!
job.perform
```

The job data must be serializable to JSON, and it is recommended
that you use a hash for it. See below for a list of the supported job options.

The argument returned by `queue.put` is the job ID, or jid. Every Qless
job has a unique jid, and it provides a means to interact with an
existing job:

``` ruby
# find an existing job by its jid
job = client.jobs[jid]

# Query it to find out details about it:
job.klass            # => the class of the job
job.queue            # => the queue the job is in
job.data             # => the data for the job
job.history          # => the history of what has happened to the job so far
job.dependencies     # => the jids of other jobs that must complete before this one
job.dependents       # => the jids of other jobs that depend on this one
job.priority         # => the priority of this job
job.tags             # => array of tags for this job
job.original_retries # => the number of times the job is allowed to be retried
job.retries_left     # => the number of retries left

# You can also change the job in various ways:
job.move("some_other_queue") # move it to a new queue
job.cancel                   # cancel the job
job.tag("foo")               # add a tag
job.untag("foo")             # remove a tag
```

Running A Worker
================

The Qless ruby worker was heavily inspired by Resque's worker,
but thanks to the power of the qless-core lua scripts, it is
*much* simpler and you are welcome to write your own (e.g. if
you'd rather save memory by not forking the worker for each job).

As with resque...

* The worker forks a child process for each job in order to provide
  resilience against memory leaks. Pass the `RUN_AS_SINGLE_PROCESS`
  environment variable to force Qless not to fork the child process.
  Single-process mode should only be used in some test/dev
  environments.
* The worker updates its procline with its status so you can see
  what workers are doing using `ps`.
* The worker registers signal handlers so that you can control it
  by sending it signals.
* The worker is given a list of queues to pop jobs off of.
* The worker logs output based on the `VERBOSE` or `VVERBOSE` (very
  verbose) environment variables.
* Qless ships with a rake task (`qless:work`) for running workers.
  It runs `qless:setup` before starting the main work loop so that
  users can load their environment in that task.
* The sleep interval (for when there are no jobs available) can be
  configured with the `INTERVAL` environment variable.

Resque uses queues for its notion of priority. In contrast, qless
has priority support built in. Thus, the worker supports two strategies
for what order to pop jobs off the queues: ordered and round-robin.
The ordered reserver will keep popping jobs off the first queue until
it is empty, before trying to pop a job off the second queue. The
round-robin reserver will pop a job off the first queue, then the second
queue, and so on. You could also easily implement your own.
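
For instance, a custom reserver is just an object that yields jobs from a set of queues.
The following is a minimal sketch; it assumes (as the bundled reservers in
`lib/qless/job_reservers` do) that a reserver is constructed with the queues to draw from
and exposes a `reserve` method returning the next job or `nil`. The class name and
`description` format here are illustrative:

``` ruby
# A "shuffled" reserver: pops from the given queues in a random order each
# time. Treat this as an illustration of the idea rather than the canonical
# interface.
module MyApp
  class ShuffledReserver
    attr_reader :queues

    def initialize(queues)
      @queues = queues
    end

    def description
      "shuffled (#{@queues.map(&:name).join(', ')})"
    end

    def reserve
      @queues.shuffle.each do |queue|
        job = queue.pop
        return job if job
      end
      nil
    end
  end
end
```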

To start a worker, load the qless rake tasks in your Rakefile, and
define a `qless:setup` task:

``` ruby
require 'qless/tasks'
namespace :qless do
  task :setup do
    require 'my_app/environment' # to ensure all job classes are loaded

    # Set options via environment variables.
    # The only required option is QUEUES; the
    # rest have reasonable defaults.
    ENV['REDIS_URL']    ||= 'redis://some-host:7000/3'
    ENV['QUEUES']       ||= 'fizz,buzz'
    ENV['JOB_RESERVER'] ||= 'Ordered'
    ENV['INTERVAL']     ||= '10' # 10 seconds
    ENV['VERBOSE']      ||= 'true'
  end
end
```

Then run the `qless:work` rake task:

```
rake qless:work
```

The following signals are supported:

* TERM: Shutdown immediately, stop processing jobs.
* INT: Shutdown immediately, stop processing jobs.
* QUIT: Shutdown after the current job has finished processing.
* USR1: Kill the forked child immediately, continue processing jobs.
* USR2: Don't process any new jobs.
* CONT: Start processing jobs again after a USR2.

You should send these to the master process, not the child.

Workers also support middleware modules that can be used to inject
logic before, after or around the processing of a single job in
the child process. This can be useful, for example, when you need to
re-establish a connection to your database in each job.

Define a module with an `around_perform` method that calls `super` where you
want the job to be processed:

``` ruby
module ReEstablishDBConnection
  def around_perform(job)
    MyORM.establish_connection
    super
  end
end
```

Then mix it into the worker class. You can mix in as many
middleware modules as you like:

``` ruby
require 'qless/worker'
Qless::Worker.class_eval do
  include ReEstablishDBConnection
  include SomeOtherAwesomeMiddleware
end
```

Web Interface
=============

Qless ships with a resque-inspired web app that lets you easily
deal with failures and see what it is processing. If your project
has a rack-based ruby web app, we recommend you mount Qless's web app
in it. Here's how you can do that with `Rack::Builder` in your `config.ru`:

``` ruby
Qless::Server.client = Qless::Client.new(:host => "some-host", :port => 7000)

Rack::Builder.new do
  use SomeMiddleware

  map('/some-other-app') { run Apps::Something.new }
  map('/qless')          { run Qless::Server.new }
end
```

For an app using Rails 3+, check the router documentation for how to mount
rack apps.
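
For example, in a Rails 3+ app the equivalent of the `config.ru` mapping above is a `mount`
entry in `config/routes.rb`. This is a sketch of the usual pattern; the application name is
a placeholder:

``` ruby
# config/routes.rb (illustrative)
MyApp::Application.routes.draw do
  # Qless::Server is a Sinatra app, so it can be mounted like any Rack app.
  # Remember to assign Qless::Server.client somewhere (e.g. an initializer),
  # as in the config.ru example above.
  mount Qless::Server.new => '/qless'
end
```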

Job Dependencies
================
Let's say you have one job that depends on another, but the task definitions are
fundamentally different. You need to bake a turkey, and you need to make stuffing,
but you can't make the turkey until the stuffing is made:

``` ruby
queue = client.queues['cook']
stuffing_jid = queue.put(MakeStuffing, {:lots => 'of butter'})
turkey_jid   = queue.put(MakeTurkey, {:with => 'stuffing'}, :depends => [stuffing_jid])
```

When the stuffing job completes, the turkey job is unlocked and free to be processed.
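
You can also inspect the relationship from either side, using the accessors shown earlier:

``` ruby
client.jobs[turkey_jid].dependencies # => [stuffing_jid]
client.jobs[stuffing_jid].dependents # => [turkey_jid]
```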

Priority
========
Some jobs need to get popped sooner than others. Whether it's a trouble ticket, or
debugging, you can do this pretty easily when you put a job in a queue:

``` ruby
queue.put(MyJobClass, {:foo => 'bar'}, :priority => 10)
```

What happens when you want to adjust a job's priority while it's still waiting in
a queue?

``` ruby
job = client.jobs['0c53b0404c56012f69fa482a1427ab7d']
job.priority = 10
# Now this will get popped before any job of lower priority
```

Scheduled Jobs
==============
If you don't want a job to be run right away but some time in the future, you can
specify a delay:

``` ruby
# Run at least 10 minutes from now
queue.put(MyJobClass, {:foo => 'bar'}, :delay => 600)
```

This doesn't guarantee that the job will be run exactly 10 minutes from now. You can
get close by also raising the job's priority, so that once the 10 minutes have elapsed,
it's put before lower-priority jobs:

``` ruby
# Run in 10 minutes
queue.put(MyJobClass, {:foo => 'bar'}, :delay => 600, :priority => 100)
```

Recurring Jobs
==============
Sometimes it's not enough simply to schedule one job; you want to run jobs regularly.
In particular, maybe you have some batch operation that needs to get run once an hour and
you don't care what worker runs it. Recurring jobs are specified much like other jobs:

``` ruby
# Run every hour
queue.recur(MyJobClass, {:widget => 'warble'}, 3600)
# => 22ac75008a8011e182b24cf9ab3a8f3b
```

You can even access them in much the same way as you would normal jobs:

``` ruby
job = client.jobs['22ac75008a8011e182b24cf9ab3a8f3b']
# => < Qless::RecurringJob 22ac75008a8011e182b24cf9ab3a8f3b >
```

Changing the interval at which it runs after the fact is trivial:

``` ruby
# I think I only need it to run once every two hours
job.interval = 7200
```

If you want it to run every hour on the hour, but it's 2:37 right now, you can specify
an offset, which is how long it should wait before popping the first job:

``` ruby
# 23 minutes of waiting until it should go
queue.recur(MyJobClass, {:howdy => 'hello'}, 3600, :offset => 23 * 60)
```

Recurring jobs also have priority, a configurable number of retries, and tags. These
settings don't apply to the recurring jobs themselves, but rather to the jobs that they
create. In the case where more than one interval passes before a worker tries to pop the
job, __more than one job is created__. The thinking is that while it's completely
client-managed, the state should not be dependent on how often workers are trying to pop
jobs.

``` ruby
# Recur every minute
queue.recur(MyJobClass, {:lots => 'of jobs'}, 60)
# Wait 5 minutes
queue.pop(10).length
# => 5 jobs got popped
```

Configuration Options
=====================
You can get and set global (read: in the context of the same Redis instance) configuration
to change the behavior for heartbeating, and so forth. There aren't a tremendous number
of configuration options, but an important one is how long job data is kept around. Job
data is expired after it has been completed for `jobs-history` seconds, but is limited to
the last `jobs-history-count` completed jobs. These default to 30 days and 50k jobs, but
depending on volume, your needs may change. To only keep the last 500 jobs for up to 7 days:

``` ruby
client.config['jobs-history'] = 7 * 86400
client.config['jobs-history-count'] = 500
```

Tagging / Tracking
==================
In qless, 'tracking' means flagging a job as important. Tracked jobs have a tab reserved
for them in the web interface, and they also emit subscribable events as they make progress
(more on that below). You can flag a job from the web interface, or the corresponding code:

``` ruby
client.jobs['b1882e009a3d11e192d0b174d751779d'].track
```

Jobs can be tagged with strings which are indexed for quick searches. For example, jobs
might be associated with customer accounts, or some other key that makes sense for your
project.

``` ruby
queue.put(MyJobClass, {:tags => 'aplenty'}, :tags => ['12345', 'foo', 'bar'])
```

This makes them searchable in the web interface, or from code:

``` ruby
jids = client.jobs.tagged('foo')
```

You can add or remove tags at will, too:

``` ruby
job = client.jobs['b1882e009a3d11e192d0b174d751779d']
job.tag('howdy', 'hello')
job.untag('foo', 'bar')
```

Notifications
=============
Tracked jobs emit events on specific pubsub channels as things happen to them, whether
it's getting popped off of a queue, completed by a worker, etc. Good examples of how
to make use of this are `qless-campfire` and `qless-growl`. The gist of it goes like
this, though:

``` ruby
client.events do |on|
  on.canceled  { |jid| puts "#{jid} canceled"   }
  on.stalled   { |jid| puts "#{jid} stalled"    }
  on.track     { |jid| puts "tracking #{jid}"   }
  on.untrack   { |jid| puts "untracking #{jid}" }
  on.completed { |jid| puts "#{jid} completed"  }
  on.failed    { |jid| puts "#{jid} failed"     }
  on.popped    { |jid| puts "#{jid} popped"     }
  on.put       { |jid| puts "#{jid} put"        }
end
```

Those familiar with redis pubsub will note that a redis connection can only be used
for pubsub-y commands once it starts listening. For this reason, invoking `client.events`
actually creates a second connection so that `client` can still be used as it normally
would be:

``` ruby
client.events do |on|
  on.failed do |jid|
    puts "#{jid} failed in #{client.jobs[jid].queue_name}"
  end
end
```

Heartbeating
============
When a worker is given a job, it is given an exclusive lock to that job. That means
that job won't be given to any other worker, so long as the worker checks in with
progress on the job. By default, jobs have to either report back progress every 60
seconds, or complete, but that's a configurable option. For longer jobs, this
may not make sense.

``` ruby
# Hooray! We've got a piece of work!
job = queue.pop
# How long until I have to check in?
job.ttl
# => 59
# Hey! I'm still working on it!
job.heartbeat
# => 1331326141.0
# Ok, I've got some more time. Oh! Now I'm done!
job.complete
```

If you want to increase the heartbeat in all queues,

``` ruby
# Now jobs get 10 minutes to check in
client.config['heartbeat'] = 600
# But the testing queue doesn't get as long.
client.queues['testing'].heartbeat = 300
```

When choosing a heartbeat interval, realize that this is the amount of time that
can pass before qless realizes that a job has been dropped. At the same time, you don't
want to burden qless with heartbeating every 10 seconds if your job is expected to
take several hours.

An idiom you're encouraged to use for long-running jobs that want to check in their
progress periodically:

``` ruby
# Wait until we have 5 minutes left on the heartbeat, and if we find that
# we've lost our lock on a job, then honorably fall on our sword
if (job.ttl < 300) && !job.heartbeat
  return / die / exit
end
```

Stats
=====
One nice feature of `qless` is that you can get statistics about usage. Stats are
aggregated by day, so when you want stats about a queue, you need to say what queue
and what day you're talking about. By default, you just get the stats for today.
These stats include information about the mean job wait time, standard deviation,
and histogram. This same data is also provided for job completion:

``` ruby
# So, how're we doing today?
stats = client.stats.get('testing')
# => { 'run' => {'mean' => ..., }, 'wait' => {'mean' => ..., }}
```

Time
====
It's important to note that Redis doesn't allow access to the system time if you're
going to be making any manipulations to data (which our scripts do). And yet, we
have heartbeating. This means that the clients actually send the current time when
making most requests, and for consistency's sake, it means that your workers must be
relatively synchronized. This doesn't mean down to the tens of milliseconds, but if
you're experiencing appreciable clock drift, you should investigate NTP. For what it's
worth, this hasn't been a problem for us, but most of our jobs have heartbeat intervals
of 30 minutes or more.

Ensuring Job Uniqueness
=======================

As mentioned above, jobs are uniquely identified by an id -- their jid.
Qless will generate a UUID for each enqueued job, or you can specify
one manually:

``` ruby
queue.put(MyJobClass, { :hello => 'howdy' }, :jid => 'my-job-jid')
```

This can be useful when you want to ensure a job's uniqueness: simply
create a jid that is a function of the job's class and data, and it's
guaranteed that Qless won't have multiple jobs with the same class
and data.
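
One way to do that, sketched here under the assumption that a digest of the class name and
JSON-encoded data is an acceptable jid for your use case (the helper below is illustrative,
not part of the qless API):

``` ruby
require 'digest/sha1'
require 'json'

# Derive a deterministic jid from the job class and its data, so enqueueing
# the "same" job twice reuses the same jid. For robustness you may want to
# normalize hash key ordering before encoding.
def deterministic_jid(klass, data)
  Digest::SHA1.hexdigest("#{klass.name}:#{JSON.generate(data)}")
end

data = { :hello => 'howdy' }
queue.put(MyJobClass, data, :jid => deterministic_jid(MyJobClass, data))
```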

Setting Default Job Options
===========================

`Qless::Queue#put` accepts a number of job options (see above for their
semantics):

* jid
* delay
* priority
* tags
* retries
* depends

When enqueueing the same kind of job with the same args in multiple
places, it's a pain to have to declare the job options every time.
Instead, you can define default job options directly on the job class:

``` ruby
class MyJobClass
  def self.default_job_options(data)
    { :priority => 10, :delay => 100 }
  end
end

queue.put(MyJobClass, { :some => "data" }, :delay => 10)
```

Individual jobs can still specify options, so in this example,
the job would be enqueued with a priority of 10 and a delay of 10.

Testing Jobs
============
When unit testing your jobs, you will probably want to avoid the
overhead of round-tripping them through redis. You can of course
use a mock job object and pass it to your job class's `perform`
method. Alternately, if you want a real full-fledged `Qless::Job`
instance without round-tripping it through Redis, use `Qless::Job.build`:

``` ruby
describe MyJobClass do
  let(:client) { Qless::Client.new }
  let(:job)    { Qless::Job.build(client, MyJobClass, :data => { "some" => "data" }) }

  it 'does something' do
    MyJobClass.perform(job)
    # make an assertion about what happened
  end
end
```

The options hash passed to `Qless::Job.build` supports all the same
options a normal job supports. See
[the source](https://github.com/seomoz/qless/blob/master/lib/qless/job.rb)
for a full list.