async 2.27.0 → 2.27.2

data/context/tasks.md ADDED
@@ -0,0 +1,448 @@
1
+ # Tasks
2
+
3
+ This guide explains how asynchronous tasks work and how to use them.
4
+
5
+ ## Overview
6
+
7
+ Tasks are the smallest unit of sequential code execution in {ruby Async}. Tasks can create other tasks, and Async tracks the parent-child relationship between tasks. When a parent task is stopped, it will also stop all its child tasks. The reactor always starts with one root task.
8
+
9
+ ```mermaid
10
+ graph LR
11
+ R[Reactor] --> WS
12
+ WS[Web Server Task] --> R1[Request 1 Task]
13
+ WS --> R2[Request 2 Task]
14
+
15
+ R1 --> Q1[Database Query Task]
16
+ R1 --> H1[HTTP Client Request Task]
17
+
18
+ R2 --> H2[HTTP Client Request Task]
19
+ R2 --> H3[HTTP Client Request Task]
20
+ ```
21
+
22
+ ### How are they different from fibers?
23
+
24
+ A fiber is a lightweight unit of execution that can be suspended and resumed at specific points. After a fiber is suspended, it can be resumed later at the same point with the same execution state. Because only one fiber can execute at a time, they are often referred to as a mechanism for cooperative concurrency.
25
+
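To make suspension and resumption concrete, here is a minimal sketch that uses Ruby's built-in `Fiber` class directly, with no Async involved. The variable names (`log`, `first`, `second`) are illustrative only and do not come from the guide.

```ruby
# A plain Ruby fiber: it runs until it yields, then continues from the same point.
log = []

fiber = Fiber.new do
  log << "step 1"
  # Suspend here and hand a value back to whoever called `resume`:
  received = Fiber.yield(:paused)
  # Execution continues here on the next `resume`, with local state intact:
  log << "step 2 (received #{received})"
  :finished
end

first = fiber.resume          # Runs the fiber until `Fiber.yield`.
log << "main"                 # Only one of them executes at a time.
second = fiber.resume(:hello) # Continues after `Fiber.yield` until the block ends.

puts log
```

The fiber only gives up control where it explicitly yields, which is what "cooperative" means in this context.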
26
+ A task provides extra functionality on top of fibers. A task behaves like a promise: it either succeeds with a value or fails with an exception. Tasks keep track of their parent-child relationships, and when a parent task is stopped, it will also stop all its child tasks. This makes it easier to create complex programs with many concurrent tasks.
27
+
28
+ ### Why does Async manipulate tasks and not fibers?
29
+
30
+ The {ruby Async::Scheduler} actually works directly with fibers for most operations and isn't aware of tasks. However, the reactor does maintain a tree of tasks in order to manage the task and reactor life-cycle. For example, stopping a parent task will stop all its child tasks, and the reactor will exit when all tasks are finished.
31
+
32
+ ## Task Lifecycle
33
+
34
+ Tasks represent units of work which are executed according to the following state transition diagram:
35
+
36
+ ```mermaid
37
+ stateDiagram-v2
38
+ [*] --> initialized : Task.new
39
+ initialized --> running : run
40
+
41
+ running --> failed : unhandled StandardError-derived exception
42
+ running --> complete : user code finished
43
+ running --> stopped : stop
44
+
45
+ initialized --> stopped : stop
46
+
47
+ failed --> [*]
48
+ complete --> [*]
49
+ stopped --> [*]
50
+ ```
51
+
52
+ Tasks are created in the `initialized` state, and are run by the reactor. During the execution, a task can either `complete` successfully, become `failed` with an unhandled `StandardError`-derived exception, or be explicitly `stopped`. In all of these cases, you can wait for a task to complete by using {ruby Async::Task#wait}.
53
+
54
+ 1. If the task completed successfully, the result is the value of the last expression in the task.
55
+ 2. If the task failed with an unhandled `StandardError`-derived exception, waiting on the task will re-raise the exception.
56
+ 3. If the task was stopped, the result will be `nil`.
57
+
58
+ ## Starting A Task
59
+
60
+ At any point in your program, you can start a reactor and a root task using the {ruby Kernel::Async} method:
61
+
62
+ ```ruby
63
+ Async do
64
+ 1.upto(3) do |i|
65
+ sleep(i)
66
+ puts "Hello World #{i}"
67
+ end
68
+ end
69
+ ```
70
+
71
+ This program prints "Hello World" 3 times. Before printing, it sleeps for 1, then 2, then 3 seconds. The total execution time is 6 seconds because the program executes sequentially.
72
+
73
+ By using a nested task, we can ensure that each iteration of the loop creates a new task which runs concurrently.
74
+
75
+ ```ruby
76
+ Async do
77
+ 1.upto(3) do |i|
78
+ Async do
79
+ sleep(i)
80
+ puts "Hello World #{i}"
81
+ end
82
+ end
83
+ end
84
+ ```
85
+
86
+ Instead of taking 6 seconds, this program takes 3 seconds in total. The main loop executes, rapidly creating 3 child tasks, and then each child task sleeps for 1, 2 and 3 seconds respectively before printing "Hello World".
87
+
88
+ ```mermaid
89
+ graph LR
90
+ R[Reactor] --> TT[Initial Task]
91
+
92
+ TT --> H1[Hello World 1 Task]
93
+ TT --> H2[Hello World 2 Task]
94
+ TT --> H3[Hello World 3 Task]
95
+ ```
96
+
97
+ By constructing your program correctly, it's easy to implement concurrent map-reduce:
98
+
99
+ ```ruby
100
+ Async do
101
+ # Map (create several concurrent tasks)
102
+ users_size = Async{User.size}
103
+ posts_size = Async{Post.size}
104
+
105
+ # Reduce (wait for and merge the results)
106
+ average = posts_size.wait / users_size.wait
107
+ puts "#{users_size.wait} users created #{average} posts on average."
108
+ end
109
+ ```
110
+
111
+ ### Performance Considerations
112
+
113
+ Task creation and execution have been heavily optimised. Do not add program complexity just to avoid creating tasks; the cost will almost always exceed the gain.
114
+
115
+ Do consider using appropriate concurrency primitives like {ruby Async::Semaphore}, {ruby Async::Barrier}, etc., to ensure your program is well-behaved in the presence of large inputs (i.e. don't create an unbounded number of tasks).
116
+
117
+ ## Starting a Limited Number of Tasks
118
+
119
+ When processing potentially unbounded data, you may want to limit the concurrency using {ruby Async::Semaphore}.
120
+
121
+ ```ruby
122
+ Async do
123
+ # Create a semaphore with a limit of 2:
124
+ semaphore = Async::Semaphore.new(2)
125
+
126
+ file.each_line do |line|
127
+ semaphore.async do
128
+ # Only two tasks at most will be allowed to execute concurrently:
129
+ process(line)
130
+ end
131
+ end
132
+ end
133
+ ```
134
+
135
+ ## Waiting for Tasks
136
+
137
+ Waiting for a single task is trivial: simply invoke {ruby Async::Task#wait}. To wait for multiple tasks, you may either {ruby Async::Task#wait} on each in turn, or you may want to use a {ruby Async::Barrier}. You can use {ruby Async::Barrier#async} to create multiple child tasks, and wait for them all to complete using {ruby Async::Barrier#wait}.
138
+
139
+ ```ruby
140
+ barrier = Async::Barrier.new
141
+
142
+ Async do
143
+ jobs.each do |job|
144
+ barrier.async do
145
+ # ... process job ...
146
+ end
147
+ end
148
+
149
+ # Wait for all jobs to complete:
150
+ barrier.wait
151
+ end
152
+ ```
153
+
154
+ ### Waiting for the First N Tasks
155
+
156
+ Occasionally, you may need to just wait for the first task (or first several tasks) to complete. You can pass a block to {ruby Async::Barrier#wait}, which receives each task as it finishes, and then use {ruby Async::Barrier#stop} to stop the remainder:
157
+
158
+ ```ruby
159
+ Async do
160
+ barrier = Async::Barrier.new
161
+
162
+ begin
163
+ jobs.each do |job|
164
+ barrier.async do
165
+ # ... process job ...
166
+ end
167
+ end
168
+
169
+ # Wait for the first two jobs to complete:
170
+ done = []
171
+ barrier.wait do |task|
172
+ done << task.wait
173
+
174
+ # If you don't want to wait for any more tasks you can break:
175
+ break if done.size >= 2
176
+ end
177
+ ensure
178
+ # The remainder of the tasks will be stopped:
179
+ barrier.stop
180
+ end
181
+ end
182
+ ```
183
+
184
+ ### Combining a Barrier with a Semaphore
185
+
186
+ {ruby Async::Barrier} and {ruby Async::Semaphore} are designed to be compatible with each other, and with other tasks that nest `#async` invocations. There are other similar situations where you may want to pass in a parent task, e.g. {ruby Async::IO::Endpoint#bind}.
187
+
188
+ ~~~ ruby
189
+ barrier = Async::Barrier.new
190
+ semaphore = Async::Semaphore.new(2, parent: barrier)
191
+
192
+ jobs.each do |job|
193
+ semaphore.async(parent: barrier) do
194
+ # ... process job ...
195
+ end
196
+ end
197
+
198
+ # Wait until all jobs are done:
199
+ barrier.wait
200
+ ~~~
201
+
202
+ ## Stopping a Task
203
+
204
+ When a task completes execution, it will enter the `complete` state (or the `failed` state if it raises an unhandled exception).
205
+
206
+ There are various situations where you may want to stop a task ({ruby Async::Task#stop}) before it completes. The most common case is shutting down a server. A more complex example is this: you may fan out many requests (10s or 100s), wait for a subset to complete (e.g. the first 5 or all those that complete within a given deadline), and then stop (terminate/cancel) the remaining operations.
207
+
208
+ Using the above program as an example, let's create the tasks and then stop them all without waiting for them to finish.
209
+
210
+ ```ruby
211
+ Async do
212
+ tasks = 3.times.map do |i|
213
+ Async do
214
+ sleep(i)
215
+ puts "Hello World #{i}"
216
+ end
217
+ end
218
+
219
+ # Stop all the above tasks:
220
+ tasks.each(&:stop)
221
+ end
222
+ ```
223
+
224
+ ### Stopping all Tasks held in a Barrier
225
+
226
+ To stop (terminate/cancel) all the tasks held in a barrier:
227
+
228
+ ```ruby
229
+ barrier = Async::Barrier.new
230
+
231
+ Async do
232
+ tasks = 3.times.map do |i|
233
+ barrier.async do
234
+ sleep(i)
235
+ puts "Hello World #{i}"
236
+ end
237
+ end
238
+
239
+ barrier.stop
240
+ end
241
+ ```
242
+
243
+ Unless your tasks all rescue and suppress `StandardError`-derived exceptions, be sure to call {ruby Async::Barrier#stop} in an `ensure` block to stop the remaining tasks:
244
+
245
+ ```ruby
246
+ barrier = Async::Barrier.new
247
+
248
+ Async do
249
+ tasks = 3.times.map do |i|
250
+ barrier.async do
251
+ sleep(i)
252
+ puts "Hello World #{i}"
253
+ end
254
+ end
255
+
256
+ begin
257
+ barrier.wait
258
+ ensure
259
+ barrier.stop
260
+ end
261
+ end
262
+ ```
263
+
264
+ ## Resource Management
265
+
266
+ In order to ensure your resources are cleaned up correctly, make sure you wrap resources appropriately, e.g.:
267
+
268
+ ~~~ ruby
269
+ Async do
270
+ begin
271
+ socket = connect(remote_address) # May raise Async::Stop
272
+
273
+ socket.write(...) # May raise Async::Stop
274
+ socket.read(...) # May raise Async::Stop
275
+ ensure
276
+ socket.close if socket
277
+ end
278
+ end
279
+ ~~~
280
+
281
+ As tasks run synchronously until they yield back to the reactor, you can guarantee this model works correctly. While in theory `IO#autoclose` allows you to automatically close file descriptors when they go out of scope via the GC, it may produce unpredictable behaviour (exhaustion of file descriptors, flushing data at odd times), so it's not recommended.
282
+
283
+ ## Exception Handling
284
+
285
+ {ruby Async::Task} captures and logs exceptions. All unhandled exceptions will cause the enclosing task to enter the `:failed` state. Non-`StandardError` exceptions are re-raised immediately and will cause the reactor to exit. This ensures that exceptions will always be visible and cause the program to fail appropriately.
286
+
287
+ ~~~ ruby
288
+ require 'async'
289
+
290
+ task = Async do
291
+ # Exception will be logged and task will be failed.
292
+ raise "Boom"
293
+ end
294
+
295
+ puts task.status # failed
296
+ puts task.wait # raises RuntimeError: Boom
297
+ ~~~
298
+
299
+ ### Propagating Exceptions
300
+
301
+ If a task has finished due to an exception, calling `Task#wait` will re-raise the exception.
302
+
303
+ ~~~ ruby
304
+ require 'async'
305
+
306
+ Async do
307
+ task = Async do
308
+ raise "Boom"
309
+ end
310
+
311
+ begin
312
+ task.wait # Re-raises above exception.
313
+ rescue
314
+ puts "It went #{$!}!"
315
+ end
316
+ end
317
+ ~~~
318
+
319
+ ## Timeouts
320
+
321
+ You can wrap asynchronous operations in a timeout. This allows you to put an upper bound on how long the enclosed code will run, rather than potentially blocking indefinitely. If the enclosed code hasn't completed by the timeout, it will be interrupted with an {ruby Async::TimeoutError} exception.
322
+
323
+ ~~~ ruby
324
+ require 'async'
325
+
326
+ Async do |task|
327
+ task.with_timeout(1) do
328
+ sleep(100)
329
+ rescue Async::TimeoutError
330
+ puts "I timed out 99 seconds early!"
331
+ end
332
+ end
333
+ ~~~
334
+
335
+ ### Periodic Timers
336
+
337
+ Sometimes you need to do some recurring work in a loop. Often it's best to measure the periodic delay end-to-start, so that your process always takes a break between iterations and doesn't risk spending 100% of its time on the periodic work. In this case, simply call {ruby sleep} between iterations:
338
+
339
+ ~~~ ruby
340
+ require 'async'
341
+
342
+ period = 30
343
+
344
+ Async do |task|
345
+ loop do
346
+ puts Time.now
347
+ # ... process job ...
348
+ sleep(period)
349
+ end
350
+ end
351
+ ~~~
352
+
353
+ If you need a periodic timer that runs start-to-start, you can keep track of the `run_next` time using the monotonic clock:
354
+
355
+ ~~~ ruby
356
+ require 'async'
357
+
358
+ period = 30
359
+
360
+ Async do |task|
361
+ run_next = Async::Clock.now
362
+ loop do
363
+ run_next += period
364
+ puts Time.now
365
+ # ... process job ...
366
+ if (remaining = run_next - Async::Clock.now) > 0
367
+ sleep(remaining)
368
+ end
369
+ end
370
+ end
371
+ ~~~
372
+
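The difference between the two approaches is easiest to see with numbers. The following sketch does no real sleeping; it replays both loops on a simulated clock using plain Ruby arithmetic. The method names `end_to_start` and `start_to_start` and the sample job durations are hypothetical and chosen only for illustration.

```ruby
# Returns the simulated start time of each job when sleeping a fixed period after each job.
def end_to_start(period, job_durations)
  now = 0.0
  job_durations.map do |duration|
    started = now
    now += duration # The job runs.
    now += period   # `sleep(period)` after the job.
    started
  end
end

# Returns the simulated start time of each job when tracking `run_next` as in the example above.
def start_to_start(period, job_durations)
  now = 0.0
  run_next = now
  job_durations.map do |duration|
    run_next += period
    started = now
    now += duration # The job runs.
    remaining = run_next - now
    now += remaining if remaining > 0 # `sleep(remaining)` only if there is time left.
    started
  end
end

durations = [5, 10, 5] # Seconds each job takes.

p end_to_start(30, durations)   # => [0.0, 35.0, 75.0]
p start_to_start(30, durations) # => [0.0, 30.0, 60.0]
```

With end-to-start timing, the job duration is added to every cycle, so start times drift. With start-to-start timing, jobs begin on a fixed 30-second grid as long as each job finishes within the period; a job that overruns simply causes the next one to start immediately.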
373
+ ## Reactor Lifecycle
374
+
375
+ Generally, the reactor's event loop will not exit until all tasks complete. This is informed by {ruby Async::Task#finished?} which checks if the current node has completed execution, which also includes all children. However, there is one exception to this rule: tasks flagged as being `transient` ({ruby Async::Node#transient?}).
376
+
377
+ ### Transient Tasks
378
+
379
+ Tasks which are flagged as `transient` are identical to normal tasks, except for one key difference: they do not keep the reactor alive. They are useful for operations which are not directly related to application concurrency, but are instead an implementation detail of the application. For example, a task which is monitoring and maintaining a connection pool, pruning unused connections or possibly ensuring those connections are periodically checked for activity (ping/pong, etc). If all *other* tasks are completed, and only transient tasks remain at the root of the reactor, the reactor should exit.
380
+
381
+ #### How To Create Transient Tasks
382
+
383
+ Specify the `transient` option when creating a task:
384
+
385
+ ```ruby
386
+ @pruner = Async(transient: true) do
387
+ loop do
388
+ sleep(1)
389
+ prune_connection_pool
390
+ end
391
+ end
392
+ ```
393
+
394
+ Transient tasks are similar to normal tasks, except for the following differences:
395
+
396
+ 1. They are not considered by {ruby Async::Task#finished?}, so they will not keep the reactor alive. Instead, they are stopped (with a {ruby Async::Stop} exception) when all other (non-transient) tasks are finished.
397
+ 2. As soon as a parent task is finished, any transient child tasks will be moved up to be children of the parent's parent. This ensures that they never keep a sub-tree alive.
398
+ 3. Similarly, if you `stop` a task, any transient child tasks will be moved up the tree as above rather than being stopped.
399
+
400
+ Transient tasks are intended for cases where a task is an implementation detail of an object or instance, rather than part of the application's own concurrency. Some examples of transient tasks:
401
+
402
+ - A task which is reading or writing data on behalf of a stateful connection object, e.g. HTTP/2 frame reader, Redis cache invalidation, etc.
403
+ - A task which is monitoring and maintaining a connection pool, pruning unused connections or possibly ensuring those connections are periodically checked for activity (ping/pong, etc).
404
+ - A background worker or batch processing job which is independent of any specific operation, and is lazily created.
405
+ - A cache system which needs periodic expiration / revalidation of data/values.
406
+
407
+ Here is an example that keeps a cache of the current time string since that has only 1-second granularity
408
+ and you could be handling 1000s of requests per second.
409
+ The task doing the updating in the background is an implementation detail, so it is marked as `transient`.
410
+
411
+ ```ruby
412
+ require 'async'
413
+ require 'thread/local' # thread-local gem.
414
+
415
+ class TimeStringCache
416
+ extend Thread::Local # defines `instance` class method that lazy-creates a separate instance per thread
417
+
418
+ def initialize
419
+ @current_time_string = nil
420
+ end
421
+
422
+ def current_time_string
423
+ refresh!
424
+
425
+ return @current_time_string
426
+ end
427
+
428
+ private
429
+
430
+ def refresh!
431
+ @refresh ||= Async(transient: true) do
432
+ loop do
433
+ @current_time_string = Time.now.to_s
434
+ sleep(1)
435
+ end
436
+ ensure
437
+ # When the reactor terminates all tasks, `Async::Stop` will be raised from `sleep` and this code will be invoked. By clearing `@refresh`, we ensure that the task will be recreated if needed again:
438
+ @refresh = nil
439
+ end
440
+ end
441
+ end
442
+
443
+ Async do
444
+ p TimeStringCache.instance.current_time_string
445
+ end
446
+ ```
447
+
448
+ Upon exiting the top level async block, the {ruby @refresh} task will be set to `nil`. Bear in mind, you should not share these resources across threads; doing so would need some form of mutual exclusion.