distributed_job 3.0.0 → 3.1.0
- checksums.yaml +4 -4
- data/.rubocop.yml +3 -0
- data/CHANGELOG.md +9 -0
- data/README.md +39 -21
- data/distributed_job.gemspec +1 -1
- data/lib/distributed_job/job.rb +51 -11
- data/lib/distributed_job/version.rb +1 -1
- metadata +8 -8
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 16178ecd755b5a3b3666711c23d6a29c87ec93ac04027df87b8e580c9f974599
+  data.tar.gz: c469b7af9c967395e6cf1fea21aac21d0cad14ac92a59eb8320c1cf601460c65
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 534bb12b74aa562af37d913164ffe134084b417cc5f1e51663682181bffcabb883501dcfc0bdeb8a2d60c9a95bfe872a3a6efad0c7ac015c27dcf3109d941643
+  data.tar.gz: 74a0c64a0f31e00e17a7e977a201b7a6c8f51c013960b83068023c4f3c7bddcb161bab9bef9abe005cf38407c2c3c35482d59753c20578d6cdbed4c62137a8a2
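The checksums above cover the `metadata.gz` and `data.tar.gz` members inside the `.gem` package (a tar archive), not the `.gem` file itself. As a minimal sketch of what such a check looks like (the helper name and paths are hypothetical, not part of the gem):

```ruby
require 'digest'

# Compares a file's SHA256 digest against an expected hex checksum,
# e.g. for a member extracted from a downloaded .gem archive.
def checksum_matches?(path, expected_sha256)
  Digest::SHA256.file(path).hexdigest == expected_sha256
end
```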
data/.rubocop.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,14 @@
 # CHANGELOG
 
+## v3.1.0
+
+* Added `DistributedJob::Job#push_all`
+* Added `DistributedJob::Job#open_part?`
+
+## v3.0.1
+
+* Fix pipelining with regards to redis-rb 4.6.0
+
 ## v3.0.0
 
 * Split `DistributedJob` in `DistributedJob::Client` and `DistributedJob::Job`
data/README.md
CHANGED
@@ -40,28 +40,28 @@ You can specify a `namespace` to be additionally used for redis keys and set a
 every time when keys in redis are updated to guarantee that the distributed
 job metadata is cleaned up properly from redis at some point in time.
 
-Afterwards,
-
+Afterwards, you have two options to add parts, i.e. units of work, to the
+distributed job. The first option is to use `#push_all` and pass an enum:
 
 ```ruby
 distributed_job = DistributedJobClient.build(token: SecureRandom.hex)
+distributed_job.push_all(["job1", "job2", "job3"])
 
-
-
-
+Job1.perform_async(distributed_job.token)
+Job2.perform_async(distributed_job.token)
+Job3.perform_async(distributed_job.token)
 
 distributed_job.token # can be used to query the status of the distributed job
 ```
 
-
-
-
-
-
-
-
-
-or in the terminal, etc:
+Here, 3 parts named `job1`, `job2` and `job3` are added to the distributed job
+and then 3 corresponding background jobs are enqueued. It is important to push
+the parts before the background jobs are enqueued. Otherwise the background
+jobs may not be able to find them. The `token` must be passed to the background
+jobs, such that the background job can update the status of the distributed job
+by marking the respective part as done. The token can also be used to query the
+status of the distributed job, e.g. on a job summary page or similar. You can
+also show some progress bar in the browser or in the terminal, etc.
 
 ```ruby
 # token is given via URL or via some other means
@@ -71,8 +71,29 @@ distributed_job.total # total number of parts
 distributed_job.count # number of unfinished parts
 distributed_job.finished? # whether or not all parts are finished
 distributed_job.open_parts # returns all not yet finished part id's
+
+distributed_job.done('job1') # marks the respective part as done
 ```
 
+The second option is to use `#push_each`:
+
+```ruby
+distributed_job = DistributedJobClient.build(token: SecureRandom.hex)
+
+distributed_job.push_each(Date.parse('2021-01-01')..Date.today) do |date, part|
+  SomeBackgroundJob.perform_async(date, distributed_job.token, part)
+end
+
+distributed_job.token # again, can be used to query the status of the distributed job
+```
+
+Here, the part name is automatically generated to be some id and passed as
+`part` to the block. The part must also be passed to the respective background
+job for it to be able to mark the part as finished after it has been
+successfully processed. Therefore, when all those background jobs have
+successfully finished, all parts will be marked as finished, such that the
+distributed job will finally be finished as well.
+
 Within the background job, you must use the passed `token` and `part` to query
 and update the status of the distributed job and part accordingly. Please note
 that you can use whatever background job processing tool you like most.
@@ -86,9 +107,7 @@ class SomeBackgroundJob
 
     # ...
 
-    distributed_job.done(part)
-
-    if distributed_job.finished?
+    if distributed_job.done(part)
       # perform e.g. cleanup or the some other job
     end
   rescue
@@ -101,10 +120,9 @@ end
 
 The `#stop` and `#stopped?` methods can be used to globally stop a distributed
 job in case of errors. Contrary, the `#done` method tells the distributed job
-that the specified part has successfully finished.
-
-
-job.
+that the specified part has successfully finished. The `#done` method returns
+true when all parts of the distributed job have finished, which is useful to
+start cleanup jobs or to even start another subsequent distributed job.
 
 That's it.
 
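The README changes above describe the new part-tracking semantics: parts live in a set, `#done` removes one, and (new in v3.1.0) `#done` returns true once the last part finishes. A minimal in-memory sketch of these semantics, with no redis involved (the class name `InMemoryDistributedJob` is hypothetical, not part of the gem):

```ruby
require 'set'

# In-memory stand-in mirroring the behaviour the README describes.
class InMemoryDistributedJob
  def initialize
    @parts = Set.new
    @closed = false
  end

  # Adds all parts and closes the job, mirroring `#push_all`.
  def push_all(enum)
    raise 'already closed' if @closed

    enum.each { |part| @parts.add(part.to_s) }
    @closed = true
  end

  # Mirrors `#open_part?`: is the part still unfinished?
  def open_part?(part)
    @parts.include?(part.to_s)
  end

  # Mirrors `#count`: number of unfinished parts.
  def count
    @parts.size
  end

  def finished?
    @closed && @parts.empty?
  end

  # Mirrors the v3.1.0 behaviour of `#done` returning true when the
  # last open part has been marked as done.
  def done(part)
    @parts.delete(part.to_s)
    finished?
  end
end

job = InMemoryDistributedJob.new
job.push_all(%w[job1 job2 job3])
job.done('job1') # => false, two parts still open
job.done('job2') # => false
job.done('job3') # => true, last part finished
```

The real gem implements the same logic with a redis set so that multiple workers can mark parts as done concurrently.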
data/distributed_job.gemspec
CHANGED
data/lib/distributed_job/job.rb
CHANGED
@@ -1,6 +1,8 @@
 # frozen_string_literal: true
 
 module DistributedJob
+  class AlreadyClosed < StandardError; end
+
   # A `DistributedJob::Job` instance allows to keep track of a distributed job, i.e.
   # a job which is split into multiple units running in parallel and in multiple
   # workers using redis.
@@ -24,9 +26,7 @@ module DistributedJob
   #
   #       # ...
   #
-  #       distributed_job.done(part)
-  #
-  #       if distributed_job.finished?
+  #       if distributed_job.done(part)
   #         # perform e.g. cleanup or the some other job
   #       end
   #     rescue
@@ -87,6 +87,8 @@ module DistributedJob
    #   end
 
    def push_each(enum)
+      raise(AlreadyClosed, 'The distributed job is already closed') if closed?
+
      previous_object = nil
      previous_index = nil
 
@@ -104,6 +106,35 @@ module DistributedJob
      yield(previous_object, previous_index.to_s) if previous_index
    end
 
+    # Pass an enum to be used to iterate all the units of work of the
+    # distributed job. The values of the enum are used for the names of the
+    # parts, such that values listed multiple times (duplicates) will only be
+    # added once to the distributed job. The distributed job needs to know all
+    # of them to keep track of the overall number and status of the parts.
+    # Passing an enum is much better compared to pushing the parts manually,
+    # because the distributed job needs to be closed before the last part of
+    # the distributed job is enqueued into some job queue. Otherwise it could
+    # potentially happen that the last part is already processed in the job
+    # queue before it is pushed to redis, such that the last job doesn't know
+    # that the distributed job is finished.
+    #
+    # @param enum [#each] The enum which can be iterated to get all
+    #   job parts
+    #
+    # @example
+    #   distributed_job.push_all(0..128)
+    #   distributed_job.push(['part1', 'part2', 'part3'])
+
+    def push_all(enum)
+      raise(AlreadyClosed, 'The distributed job is already closed') if closed?
+
+      enum.each do |part|
+        push(part)
+      end
+
+      close
+    end
+
    # Returns all parts of the distributed job which are not yet finished.
    #
    # @return [Enumerator] The enum which allows to iterate all parts
@@ -112,6 +143,15 @@ module DistributedJob
      redis.sscan_each("#{redis_key}:parts")
    end
 
+    # Returns whether or not the part is in the list of open parts of the
+    # distributed job.
+    #
+    # @return [Boolean] Returns true or false
+
+    def open_part?(part)
+      redis.sismember("#{redis_key}:parts", part.to_s)
+    end
+
    # Removes the specified part from the distributed job, i.e. from the set of
    # unfinished parts. Use this method when the respective job part has been
    # successfully processed, i.e. finished.
@@ -199,11 +239,11 @@ module DistributedJob
    #   end
 
    def stop
-      redis.multi do
-
+      redis.multi do |transaction|
+        transaction.hset("#{redis_key}:state", 'stopped', 1)
 
-
-
+        transaction.expire("#{redis_key}:state", ttl)
+        transaction.expire("#{redis_key}:parts", ttl)
      end
 
      true
@@ -245,11 +285,11 @@ module DistributedJob
    end
 
    def close
-      redis.multi do
-
+      redis.multi do |transaction|
+        transaction.hset("#{redis_key}:state", 'closed', 1)
 
-
-
+        transaction.expire("#{redis_key}:state", ttl)
+        transaction.expire("#{redis_key}:parts", ttl)
      end
 
      true
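The `stop` and `close` changes above adopt the redis-rb >= 4.6 pipelining style: inside `multi`, commands are queued on the object yielded to the block (`transaction`) rather than on the outer client. A sketch of that call shape using a stand-in client (`FakeRedis` is hypothetical and only records queued commands; no real redis is involved):

```ruby
# Stand-in illustrating the redis-rb >= 4.6 MULTI call shape used in
# `#stop` and `#close`: commands go on the yielded transaction object
# and run atomically on EXEC.
class FakeRedis
  attr_reader :queued

  def initialize
    @queued = []
  end

  # Yields a transaction object (here, self) on which commands are
  # queued, then returns the queued commands as the "EXEC" result.
  def multi
    yield self
    @queued
  end

  def hset(key, field, value)
    @queued << [:hset, key, field, value]
  end

  def expire(key, ttl)
    @queued << [:expire, key, ttl]
  end
end

redis = FakeRedis.new
redis_key = 'distributed_jobs:some_token' # hypothetical key layout
ttl = 86_400

redis.multi do |transaction|
  transaction.hset("#{redis_key}:state", 'closed', 1)
  transaction.expire("#{redis_key}:state", ttl)
  transaction.expire("#{redis_key}:parts", ttl)
end
```

Calling commands on the outer client from inside the block, as the old code did, is deprecated in redis-rb 4.6 and removed in 5.x, which is what the v3.0.1 changelog entry about pipelining refers to.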
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: distributed_job
 version: !ruby/object:Gem::Version
-  version: 3.
+  version: 3.1.0
 platform: ruby
 authors:
 - Benjamin Vetter
-autorequire:
+autorequire:
 bindir: exe
 cert_chain: []
-date:
+date: 2022-09-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
@@ -44,14 +44,14 @@ dependencies:
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version:
+      version: 4.1.0
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version:
+        version: 4.1.0
 description: Keep track of distributed jobs spanning multiple workers using redis
 email:
 - benjamin.vetter@wlw.de
@@ -83,7 +83,7 @@ metadata:
   homepage_uri: https://github.com/mrkamel/distributed_job
   source_code_uri: https://github.com/mrkamel/distributed_job
   changelog_uri: https://github.com/mrkamel/distributed_job/blob/master/CHANGELOG.md
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -98,8 +98,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   - !ruby/object:Gem::Version
     version: '0'
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.3.3
+signing_key:
 specification_version: 4
 summary: Keep track of distributed jobs using redis
 test_files: []