distributed_job 3.0.1 → 3.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 8bda4273dc59888e0269d7d14d69180e9a67bdb793e9c7f9f24bcc9f88a1b8fd
-  data.tar.gz: 8a3108b573a9e46e78d57979dfab70906459947651c1b2cda1391f66aa8e0f93
+  metadata.gz: 16178ecd755b5a3b3666711c23d6a29c87ec93ac04027df87b8e580c9f974599
+  data.tar.gz: c469b7af9c967395e6cf1fea21aac21d0cad14ac92a59eb8320c1cf601460c65
 SHA512:
-  metadata.gz: e553072546911dffba6bd7f40e9f4eb6c7fe4f0a0338437ffc21cdc8e442c91cb0b51fa2afbcf070e257d2caaade5954867710e14e7919fde335aaebb8ec3867
-  data.tar.gz: 86ae97dd953642cc225f6b4280776c84dfc080cc4af90d8519352f6d4faf0a26ab736df372cac734a1884382a0173e9a44784371858b5e5f137bf68b2b6193d0
+  metadata.gz: 534bb12b74aa562af37d913164ffe134084b417cc5f1e51663682181bffcabb883501dcfc0bdeb8a2d60c9a95bfe872a3a6efad0c7ac015c27dcf3109d941643
+  data.tar.gz: 74a0c64a0f31e00e17a7e977a201b7a6c8f51c013960b83068023c4f3c7bddcb161bab9bef9abe005cf38407c2c3c35482d59753c20578d6cdbed4c62137a8a2
data/CHANGELOG.md CHANGED
@@ -1,5 +1,10 @@
 # CHANGELOG
 
+## v3.1.0
+
+* Added `DistributedJob::Job#push_all`
+* Added `DistributedJob::Job#open_part?`
+
 ## v3.0.1
 
 * Fix pipelining with regards to redis-rb 4.6.0
data/README.md CHANGED
@@ -40,28 +40,28 @@ You can specify a `namespace` to be additionally used for redis keys and set a
 every time when keys in redis are updated to guarantee that the distributed
 job metadata is cleaned up properly from redis at some point in time.
 
-Afterwards, to create a distributed job and add parts, i.e. units of work, to
-it, simply do:
+Afterwards, you have two options to add parts, i.e. units of work, to the
+distributed job. The first option is to use `#push_all` and pass an enum:
 
 ```ruby
 distributed_job = DistributedJobClient.build(token: SecureRandom.hex)
+distributed_job.push_all(["job1", "job2", "job3"])
 
-distributed_job.push_each(Date.parse('2021-01-01')..Date.today) do |date, part|
-  SomeBackgroundJob.perform_async(date, distributed_job.token, part)
-end
+Job1.perform_async(distributed_job.token)
+Job2.perform_async(distributed_job.token)
+Job3.perform_async(distributed_job.token)
 
 distributed_job.token # can be used to query the status of the distributed job
 ```
 
-The `part` which is passed to the block is some id for one particular part of
-the distributed job. It must be used in a respective background job to mark
-this part finished after it has been successfully processed. Therefore, when
-all those background jobs have successfully finished, all parts will be marked
-as finished, such that the distributed job will finally be finished as well.
-
-The `token` can also be used to query the status of the distributed job, e.g.
-on a job summary page or similar. You can show some progress bar in the browser
-or in the terminal, etc:
+Here, 3 parts named `job1`, `job2` and `job3` are added to the distributed job
+and then 3 corresponding background jobs are enqueued. It is important to push
+the parts before the background jobs are enqueued, because otherwise the
+background jobs may not be able to find them. The `token` must be passed to the
+background jobs, such that each background job can update the status of the
+distributed job by marking its respective part as done. The token can also be
+used to query the status of the distributed job, e.g. on a job summary page or
+similar. You can also show some progress bar in the browser or in the
+terminal, etc.
 
 ```ruby
 # token is given via URL or via some other means
@@ -71,8 +71,29 @@ distributed_job.total # total number of parts
 distributed_job.count # number of unfinished parts
 distributed_job.finished? # whether or not all parts are finished
 distributed_job.open_parts # returns all not yet finished part id's
+
+distributed_job.done('job1') # marks the respective part as done
+```
+
+The second option is to use `#push_each`:
+
+```ruby
+distributed_job = DistributedJobClient.build(token: SecureRandom.hex)
+
+distributed_job.push_each(Date.parse('2021-01-01')..Date.today) do |date, part|
+  SomeBackgroundJob.perform_async(date, distributed_job.token, part)
+end
+
+distributed_job.token # again, can be used to query the status of the distributed job
 ```
 
+Here, the part name is automatically generated to be some id and passed as
+`part` to the block. The part must also be passed to the respective background
+job for it to be able to mark the part as finished after it has been
+successfully processed. Therefore, when all those background jobs have
+successfully finished, all parts will be marked as finished, such that the
+distributed job will finally be finished as well.
+
 Within the background job, you must use the passed `token` and `part` to query
 and update the status of the distributed job and part accordingly. Please note
 that you can use whatever background job processing tool you like most.
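The token/part flow above can be sketched without redis. The following is a minimal in-memory stand-in, assuming the documented semantics of `push_all`, `open_part?`, `done` and `finished?`; `FakeDistributedJob` is a hypothetical name and not part of the gem, and a Ruby `Set` stands in for the redis set the real gem uses.

```ruby
require 'set'
require 'securerandom'

# Hypothetical in-memory stand-in for DistributedJob::Job (not part of the
# gem). The real implementation stores open parts in a redis set; a Ruby Set
# plays that role here, just to illustrate the token/part flow.
class FakeDistributedJob
  attr_reader :token

  def initialize(token)
    @token = token
    @parts = Set.new
  end

  # Add every value of the enum as a part name (duplicates collapse).
  def push_all(enum)
    enum.each { |part| @parts << part.to_s }
  end

  # Whether the part is still in the set of open (unfinished) parts.
  def open_part?(part)
    @parts.include?(part.to_s)
  end

  # Mark the part as done by removing it from the open set.
  def done(part)
    @parts.delete(part.to_s)
  end

  # The job is finished once no open parts remain.
  def finished?
    @parts.empty?
  end
end

distributed_job = FakeDistributedJob.new(SecureRandom.hex)
distributed_job.push_all(%w[job1 job2 job3])

# Worker side: each background job receives the token, processes its part
# and marks it as done.
distributed_job.done('job1')
distributed_job.done('job2')

distributed_job.open_part?('job3') # => true
distributed_job.finished?          # => false

distributed_job.done('job3')
distributed_job.finished?          # => true
```

The last worker to call `done` observes `finished?` as true, which is the hook for any "all parts complete" follow-up work.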
@@ -1,6 +1,8 @@
 # frozen_string_literal: true
 
 module DistributedJob
+  class AlreadyClosed < StandardError; end
+
   # A `DistributedJob::Job` instance allows to keep track of a distributed job, i.e.
   # a job which is split into multiple units running in parallel and in multiple
   # workers using redis.
@@ -85,6 +87,8 @@ module DistributedJob
     #   end
 
     def push_each(enum)
+      raise(AlreadyClosed, 'The distributed job is already closed') if closed?
+
       previous_object = nil
       previous_index = nil
 
@@ -102,6 +106,35 @@ module DistributedJob
       yield(previous_object, previous_index.to_s) if previous_index
     end
 
+    # Pass an enum to be used to iterate all the units of work of the
+    # distributed job. The values of the enum are used for the names of the
+    # parts, such that values listed multiple times (duplicates) will only be
+    # added once to the distributed job. The distributed job needs to know all
+    # of them to keep track of the overall number and status of the parts.
+    # Passing an enum is much better compared to pushing the parts manually,
+    # because the distributed job needs to be closed before the last part of
+    # the distributed job is enqueued into some job queue. Otherwise it could
+    # potentially happen that the last part is already processed in the job
+    # queue before it is pushed to redis, such that the last job doesn't know
+    # that the distributed job is finished.
+    #
+    # @param enum [#each] The enum which can be iterated to get all
+    #   job parts
+    #
+    # @example
+    #   distributed_job.push_all(0..128)
+    #   distributed_job.push_all(['part1', 'part2', 'part3'])
+
+    def push_all(enum)
+      raise(AlreadyClosed, 'The distributed job is already closed') if closed?
+
+      enum.each do |part|
+        push(part)
+      end
+
+      close
+    end
+
     # Returns all parts of the distributed job which are not yet finished.
     #
     # @return [Enumerator] The enum which allows to iterate all parts
@@ -110,6 +143,15 @@ module DistributedJob
       redis.sscan_each("#{redis_key}:parts")
     end
 
+    # Returns whether or not the part is in the list of open parts of the
+    # distributed job.
+    #
+    # @return [Boolean] Returns true or false
+
+    def open_part?(part)
+      redis.sismember("#{redis_key}:parts", part.to_s)
+    end
+
     # Removes the specified part from the distributed job, i.e. from the set of
     # unfinished parts. Use this method when the respective job part has been
     # successfully processed, i.e. finished.
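The close-once behavior added above can be demonstrated standalone. This is a sketch mirroring the diff's `push_all`/`closed?` logic without redis; `SketchJob` is a hypothetical name introduced here, and only the raise-then-close sequence is taken from the gem.

```ruby
# Hypothetical SketchJob (not part of the gem) mirroring the semantics added
# in the diff above: push_all pushes every part, then closes the job, and a
# second push_all on the same instance raises AlreadyClosed.
class AlreadyClosed < StandardError; end

class SketchJob
  def initialize
    @parts = []
    @closed = false
  end

  def closed?
    @closed
  end

  # Push every part from the enum, then close the job, as the new
  # DistributedJob::Job#push_all does.
  def push_all(enum)
    raise(AlreadyClosed, 'The distributed job is already closed') if closed?

    enum.each { |part| @parts << part.to_s }
    @closed = true
  end
end

job = SketchJob.new
job.push_all(0..2)
job.closed? # => true

begin
  job.push_all(%w[late_part])
rescue AlreadyClosed => e
  puts e.message # prints "The distributed job is already closed"
end
```

Closing the job together with the final push is what guarantees the last worker can reliably detect overall completion; guarding both `push_each` and `push_all` with `AlreadyClosed` prevents accidentally reopening a job whose workers are already running.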
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module DistributedJob
-  VERSION = '3.0.1'
+  VERSION = '3.1.0'
 end
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: distributed_job
 version: !ruby/object:Gem::Version
-  version: 3.0.1
+  version: 3.1.0
 platform: ruby
 authors:
 - Benjamin Vetter
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2022-02-28 00:00:00.000000000 Z
+date: 2022-09-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec