distributed_job 3.0.0 → 3.1.0
- checksums.yaml +4 -4
- data/.rubocop.yml +3 -0
- data/CHANGELOG.md +9 -0
- data/README.md +39 -21
- data/distributed_job.gemspec +1 -1
- data/lib/distributed_job/job.rb +51 -11
- data/lib/distributed_job/version.rb +1 -1
- metadata +8 -8
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 16178ecd755b5a3b3666711c23d6a29c87ec93ac04027df87b8e580c9f974599
+  data.tar.gz: c469b7af9c967395e6cf1fea21aac21d0cad14ac92a59eb8320c1cf601460c65
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 534bb12b74aa562af37d913164ffe134084b417cc5f1e51663682181bffcabb883501dcfc0bdeb8a2d60c9a95bfe872a3a6efad0c7ac015c27dcf3109d941643
+  data.tar.gz: 74a0c64a0f31e00e17a7e977a201b7a6c8f51c013960b83068023c4f3c7bddcb161bab9bef9abe005cf38407c2c3c35482d59753c20578d6cdbed4c62137a8a2
data/.rubocop.yml
CHANGED
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,14 @@
 # CHANGELOG
 
+## v3.1.0
+
+* Added `DistributedJob::Job#push_all`
+* Added `DistributedJob::Job#open_part?`
+
+## v3.0.1
+
+* Fix pipelining with regards to redis-rb 4.6.0
+
 ## v3.0.0
 
 * Split `DistributedJob` in `DistributedJob::Client` and `DistributedJob::Job`
data/README.md
CHANGED
@@ -40,28 +40,28 @@ You can specify a `namespace` to be additionally used for redis keys and set a
 every time when keys in redis are updated to guarantee that the distributed
 job metadata is cleaned up properly from redis at some point in time.
 
-Afterwards,
-
+Afterwards, you have two options to add parts, i.e. units of work, to the
+distributed job. The first option is to use `#push_all` and pass an enum:
 
 ```ruby
 distributed_job = DistributedJobClient.build(token: SecureRandom.hex)
+distributed_job.push_all(["job1", "job2", "job3"])
 
-
-
-
+Job1.perform_async(distributed_job.token)
+Job2.perform_async(distributed_job.token)
+Job3.perform_async(distributed_job.token)
 
 distributed_job.token # can be used to query the status of the distributed job
 ```
 
-
-
-
-
-
-
-
-
-or in the terminal, etc:
+Here, 3 parts named `job1`, `job2` and `job3` are added to the distributed job
+and then 3 corresponding background jobs are enqueued. It is important to push
+the parts before the background jobs are enqueued. Otherwise the background
+jobs may not find them. The `token` must be passed to the background jobs,
+such that the background job can update the status of the distributed job by
+marking the respective part as done. The token can also be used to query the
+status of the distributed job, e.g. on a job summary page or similar. You can
+also show some progress bar in the browser or in the terminal, etc.
 
 ```ruby
 # token is given via URL or via some other means
@@ -71,8 +71,29 @@ distributed_job.total # total number of parts
 distributed_job.count # number of unfinished parts
 distributed_job.finished? # whether or not all parts are finished
 distributed_job.open_parts # returns all not yet finished part id's
+
+distributed_job.done('job1') # marks the respective part as done
 ```
 
+The second option is to use `#push_each`:
+
+```ruby
+distributed_job = DistributedJobClient.build(token: SecureRandom.hex)
+
+distributed_job.push_each(Date.parse('2021-01-01')..Date.today) do |date, part|
+  SomeBackgroundJob.perform_async(date, distributed_job.token, part)
+end
+
+distributed_job.token # again, can be used to query the status of the distributed job
+```
+
+Here, the part name is automatically generated to be some id and passed as
+`part` to the block. The part must also be passed to the respective background
+job for it to be able to mark the part as finished after it has been
+successfully processed. Therefore, when all those background jobs have
+successfully finished, all parts will be marked as finished, such that the
+distributed job will finally be finished as well.
+
 Within the background job, you must use the passed `token` and `part` to query
 and update the status of the distributed job and part accordingly. Please note
 that you can use whatever background job processing tool you like most.
@@ -86,9 +107,7 @@ class SomeBackgroundJob
 
     # ...
 
-    distributed_job.done(part)
-
-    if distributed_job.finished?
+    if distributed_job.done(part)
       # perform e.g. cleanup or some other job
     end
   rescue
@@ -101,10 +120,9 @@ end
 
 The `#stop` and `#stopped?` methods can be used to globally stop a distributed
 job in case of errors. In contrast, the `#done` method tells the distributed job
-that the specified part has successfully finished.
-
-
-job.
+that the specified part has successfully finished. The `#done` method returns
+true when all parts of the distributed job have finished, which is useful to
+start cleanup jobs or even to start another subsequent distributed job.
 
 That's it.
 
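The `#done`/`#finished?` semantics described in the README changes above can be sketched with a small in-memory stand-in. `ToyDistributedJob` and all of its internals are illustrative assumptions for this sketch; the real `DistributedJob::Job` keeps its parts in a Redis set.

```ruby
# Toy in-memory model of the part-tracking semantics from the README.
# It only mirrors the observable behaviour of #push_all, #open_part?,
# #done and #finished? for illustration; it is not the gem's code.
class ToyDistributedJob
  def initialize
    @parts = {}
    @closed = false
  end

  # Adds every value of the enum as a part and closes the job,
  # mirroring the v3.1.0 #push_all.
  def push_all(enum)
    raise 'already closed' if @closed

    enum.each { |part| @parts[part.to_s] = true }
    @closed = true
  end

  # True while the part has not been marked as done.
  def open_part?(part)
    @parts.key?(part.to_s)
  end

  # Marks a part as done; returns true once all parts have finished,
  # like #done in v3.1.0.
  def done(part)
    @parts.delete(part.to_s)
    finished?
  end

  def finished?
    @closed && @parts.empty?
  end
end

job = ToyDistributedJob.new
job.push_all(%w[job1 job2 job3])

job.open_part?('job1') # => true
job.done('job1')       # => false, parts are still open
job.done('job2')       # => false
job.done('job3')       # => true, the last part finishes the job
```

Note how only the final `#done` call returns true, which is what makes the "start a cleanup job from the last finished worker" pattern race-free.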
data/distributed_job.gemspec
CHANGED
data/lib/distributed_job/job.rb
CHANGED
@@ -1,6 +1,8 @@
 # frozen_string_literal: true
 
 module DistributedJob
+  class AlreadyClosed < StandardError; end
+
   # A `DistributedJob::Job` instance allows to keep track of a distributed job, i.e.
   # a job which is split into multiple units running in parallel and in multiple
   # workers using redis.
@@ -24,9 +26,7 @@ module DistributedJob
  #
  #     # ...
  #
- #     distributed_job.done(part)
- #
- #     if distributed_job.finished?
+ #     if distributed_job.done(part)
  #       # perform e.g. cleanup or some other job
  #     end
  #   rescue
@@ -87,6 +87,8 @@ module DistributedJob
     #   end
 
     def push_each(enum)
+      raise(AlreadyClosed, 'The distributed job is already closed') if closed?
+
       previous_object = nil
       previous_index = nil
 
@@ -104,6 +106,35 @@ module DistributedJob
       yield(previous_object, previous_index.to_s) if previous_index
     end
 
+    # Pass an enum to be used to iterate all the units of work of the
+    # distributed job. The values of the enum are used for the names of the
+    # parts, such that values listed multiple times (duplicates) will only be
+    # added once to the distributed job. The distributed job needs to know all
+    # of them to keep track of the overall number and status of the parts.
+    # Passing an enum is much better compared to pushing the parts manually,
+    # because the distributed job needs to be closed before the last part of
+    # the distributed job is enqueued into some job queue. Otherwise it could
+    # potentially happen that the last part is already processed in the job
+    # queue before it is pushed to redis, such that the last job doesn't know
+    # that the distributed job is finished.
+    #
+    # @param enum [#each] The enum which can be iterated to get all
+    #   job parts
+    #
+    # @example
+    #   distributed_job.push_all(0..128)
+    #   distributed_job.push_all(['part1', 'part2', 'part3'])
+
+    def push_all(enum)
+      raise(AlreadyClosed, 'The distributed job is already closed') if closed?
+
+      enum.each do |part|
+        push(part)
+      end
+
+      close
+    end
+
     # Returns all parts of the distributed job which are not yet finished.
     #
     # @return [Enumerator] The enum which allows to iterate all parts
@@ -112,6 +143,15 @@ module DistributedJob
       redis.sscan_each("#{redis_key}:parts")
     end
 
+    # Returns whether or not the part is in the list of open parts of the
+    # distributed job.
+    #
+    # @return [Boolean] Returns true or false
+
+    def open_part?(part)
+      redis.sismember("#{redis_key}:parts", part.to_s)
+    end
+
     # Removes the specified part from the distributed job, i.e. from the set of
     # unfinished parts. Use this method when the respective job part has been
     # successfully processed, i.e. finished.
@@ -199,11 +239,11 @@ module DistributedJob
     #   end
 
     def stop
-      redis.multi do
-
+      redis.multi do |transaction|
+        transaction.hset("#{redis_key}:state", 'stopped', 1)
 
-
-
+        transaction.expire("#{redis_key}:state", ttl)
+        transaction.expire("#{redis_key}:parts", ttl)
       end
 
       true
@@ -245,11 +285,11 @@ module DistributedJob
     end
 
     def close
-      redis.multi do
-
+      redis.multi do |transaction|
+        transaction.hset("#{redis_key}:state", 'closed', 1)
 
-
-
+        transaction.expire("#{redis_key}:state", ttl)
+        transaction.expire("#{redis_key}:parts", ttl)
       end
 
       true
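The `redis.multi do |transaction|` rewrite in `#stop` and `#close` follows redis-rb 4.6, which deprecated sending commands on the outer connection inside `multi`/`pipelined` blocks in favour of the yielded transaction object (the "Fix pipelining with regards to redis-rb 4.6.0" changelog entry). A minimal sketch of that calling convention, using hypothetical `FakeRedis`/`FakeTransaction` recorder classes rather than a real Redis connection:

```ruby
# Records the commands queued inside a multi block, standing in for
# the transaction object that redis-rb 4.6 yields to the block.
class FakeTransaction
  attr_reader :commands

  def initialize
    @commands = []
  end

  def hset(key, field, value)
    @commands << [:hset, key, field, value]
  end

  def expire(key, ttl)
    @commands << [:expire, key, ttl]
  end
end

class FakeRedis
  # Yields a transaction object and "executes" the queued commands
  # once the block returns, mimicking redis-rb's #multi shape.
  def multi
    transaction = FakeTransaction.new
    yield(transaction)
    transaction.commands
  end
end

redis = FakeRedis.new
redis_key = 'distributed_jobs:token' # hypothetical key prefix
ttl = 86_400

# Same shape as the updated #stop in the diff above: all commands go
# through the block argument, not through the outer redis object.
queued = redis.multi do |transaction|
  transaction.hset("#{redis_key}:state", 'stopped', 1)
  transaction.expire("#{redis_key}:state", ttl)
  transaction.expire("#{redis_key}:parts", ttl)
end

queued.size # => 3
```

The gemspec bump to redis >= 4.1.0 in the metadata diff below pairs with this: older multi blocks that called `redis.hset` directly would emit deprecation warnings (and later errors) on redis-rb 4.6+.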
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: distributed_job
 version: !ruby/object:Gem::Version
-  version: 3.
+  version: 3.1.0
 platform: ruby
 authors:
 - Benjamin Vetter
-autorequire:
+autorequire:
 bindir: exe
 cert_chain: []
-date:
+date: 2022-09-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
@@ -44,14 +44,14 @@ dependencies:
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version:
+        version: 4.1.0
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version:
+        version: 4.1.0
 description: Keep track of distributed jobs spanning multiple workers using redis
 email:
 - benjamin.vetter@wlw.de
@@ -83,7 +83,7 @@ metadata:
   homepage_uri: https://github.com/mrkamel/distributed_job
   source_code_uri: https://github.com/mrkamel/distributed_job
   changelog_uri: https://github.com/mrkamel/distributed_job/blob/master/CHANGELOG.md
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -98,8 +98,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   - !ruby/object:Gem::Version
     version: '0'
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.3.3
+signing_key:
 specification_version: 4
 summary: Keep track of distributed jobs using redis
 test_files: []