async 2.27.0 → 2.27.1
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/agent.md +16 -0
- data/context/best-practices.md +188 -0
- data/context/debugging.md +63 -0
- data/context/getting-started.md +177 -0
- data/context/index.yaml +29 -0
- data/context/scheduler.md +109 -0
- data/context/tasks.md +448 -0
- data/context/thread-safety.md +651 -0
- data/lib/async/version.rb +1 -1
- data/readme.md +3 -3
- data.tar.gz.sig +0 -0
- metadata +9 -2
- metadata.gz.sig +0 -0
data/context/thread-safety.md
@@ -0,0 +1,651 @@
# Thread safety

This guide explains thread safety in Ruby, focusing on fibers and threads, common pitfalls, and best practices to avoid problems like data corruption, race conditions, and deadlocks.

## Usage

You can use this guide to understand how to write thread-safe code in Ruby, especially when using fibers and threads. It covers key principles, common patterns, and unsafe practices, along with safe alternatives.

### Agents

When analyzing existing projects, you should check files one by one, looking for problematic patterns and unsafe practices as outlined in this guide. You should report these issues and suggest safe alternatives.

## Key Issues

- **Data corruption is the primary concern** - prevention is absolutely critical.
- **Isolation should be the default** - operations should not share mutable state.
- **Shared mutable state should be avoided**. Prefer pure functions, immutable objects, and dependency injection.
- **Assume that code will be executed concurrently** by multiple fibers, threads and processes.
- **Assume that code may context switch at any time**, but especially during I/O operations.
  - I/O operations include network calls, file I/O, database queries, etc.
  - Other context switch points include `Fiber.yield`, `sleep`, waiting on child processes, DNS queries, and interrupts (signal handling).
- **Fibers and threads are NOT the same thing**; however, they share similar safety requirements.
- **Native extensions (e.g. written in C or Rust) can block the fiber scheduler entirely**.
  - Native code, when implemented correctly, is usually okay, but bugs can exist anywhere, even in mature code.

## Quick Reference

| Unsafe Pattern / Problem                       | Safe Alternative / Solution                            | Note / Rationale                                   |
| :--------------------------------------------- | :----------------------------------------------------- | :------------------------------------------------- |
| `@data \|\|= load_data`                        | `@mutex.synchronize { @data \|\|= load_data }`         | `\|\|=` is not atomic; use double-checked locking. |
| `@shared_array << item`                        | `Thread::Mutex` or `Thread::Queue`                     | Use a mutex, or better, a `Queue` for coordination. |
| `@shared_hash[key] = value`                    | `Thread::Mutex` or `Concurrent::Map`                   | Use a mutex or a concurrent data structure.        |
| `@@class_var`                                  | Dependency injection / Instance state                  | Class vars create spooky shared state.             |
| Class attribute / `class_attribute`            | Constructor arg or method param                        | Pass state explicitly to avoid coupling.           |
| Shared mutable state                           | Immutability / Isolation / Pure functions              | Avoid sharing mutable state if possible.           |
| Memoization with shared Hash                   | `Mutex` or `Concurrent::Map`                           | Hash memoization is not thread-safe.               |
| Lazy init: `@mutex \|\|= Mutex.new`            | Initialize eagerly                                     | Mutex creation must itself be thread-safe.         |
| Shared connection (e.g. DB client)             | Connection pool                                        | Never share non-thread-safe connections.           |
| Array/Hash iteration while mutating            | Synchronize all access with `Mutex` / enumerate a copy | Don’t mutate while enumerating.                    |
| `thread_variable_set(:key, value)` for per-request state | `Fiber[:key] = value` or pass context        | Prefer fiber-scoped storage or explicit context passing. |
| Waiting on state with busy-wait                | `Mutex` + `ConditionVariable`                          | Use proper synchronization.                        |
| "Time of check, time of use" on files          | Atomic file ops / use database / transaction           | Use atomic operations to avoid TOCTOU.             |
| Nested mutex acquisition                       | Minimise lock scope, avoid recursion                   | Design locking to avoid deadlocks.                 |
| C extensions blocking fibers                   | Use thread pool / offload blocking ops                 | Avoid blocking the event loop in async code.       |

## Fibers vs Threads in Ruby

Fibers and threads are both primitives which allow for concurrent execution in Ruby. The main difference is that threads are preemptively scheduled by the operating system, while fibers are cooperatively scheduled by the Ruby interpreter. That makes fibers slightly more predictable in terms of execution order, but they still share many of the same safety concerns.

### Fibers

- **Cooperative multitasking** (usually) within a single thread.
- **No preemption**: a fiber that does not yield keeps running greedily, which may cause latency issues for other fibers.
- **Explicit yield points** including I/O operations, `Fiber.yield`, `sleep`, etc.
- **Lightweight context switching** due to user-space coroutine implementations.
- **Limited parallelism** if blocking `rb_nogvl` operations can be offloaded to a worker pool.

### Threads

- **Preemptive multitasking** with native OS threads, coordinated by the Ruby thread scheduler.
- **Can be interrupted** at any point by the interpreter.
- **Expensive context switching** due to operating system overheads and contention within the Ruby interpreter.
- **Limited parallelism**, where `rb_nogvl` releases the GVL and allows other threads to execute.
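
To make the cooperative model concrete, here is a minimal sketch using plain `Fiber` objects with no fiber scheduler involved. Each fiber runs until it reaches an explicit `Fiber.yield`, so the interleaving is decided entirely by the order of the `resume` calls and is deterministic. The fiber names and the `log` array are illustrative only.

```ruby
log = []

first = Fiber.new do
  log << "first: step 1"
  Fiber.yield # Explicit yield point: control returns to whoever called #resume.
  log << "first: step 2"
end

second = Fiber.new do
  log << "second: step 1"
  Fiber.yield
  log << "second: step 2"
end

# The caller decides the order; nothing can preempt a fiber between yield points.
first.resume
second.resume
first.resume
second.resume

p log
# => ["first: step 1", "second: step 1", "first: step 2", "second: step 2"]
```

With threads, the equivalent interleaving is chosen by the scheduler and can differ from run to run. Even with fibers, any I/O or other yield point hidden inside a method call introduces the same kind of switch.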

## Common patterns with potential issues

The most fundamental issue that underpins all "thread safety issues" is **shared mutable state**. In the presence of multiple execution contexts, such as fibers or threads, shared mutable state creates a combinatorial explosion of possible execution paths, many of which may be undesirable or incorrect. Coordination primitives (like `Mutex`) exist to constrain the combinatorial explosion of possible program states, but they are not a silver bullet and can introduce their own issues like deadlocks, contention, and performance bottlenecks.

Therefore, the best practice is to avoid shared mutable state whenever possible. Isolation, immutability, and pure functions should be the default, and shared mutable state should be the exception, not the rule.

### Shared mutable state

Shared mutable state is problematic and should be avoided. This includes class instance variables, module variables, and any mutable object that is shared across threads or fibers.

```ruby
class CurrencyConverter
  def initialize
    @exchange_rates = {} # Issue: Shared mutable state (if the instance is shared)
  end

  def update_rate(currency, rate)
    # Issue: Multiple threads can modify @exchange_rates concurrently
    @exchange_rates[currency] = rate
  end

  def convert(amount, from_currency, to_currency)
    # Issue: If @exchange_rates is modified between the two lookups, the conversion can use an inconsistent pair of rates
    rate = @exchange_rates[from_currency] / @exchange_rates[to_currency]
    amount * rate
  end
end
```

**Why is this problematic?**: When a single instance is shared, multiple threads or fibers can modify its state concurrently, leading to race conditions and inconsistent data.

#### Better alternatives

- Do not share mutable state across threads or fibers.
- Use immutable objects or pure functions that do not rely on shared mutable state.
- Use locking (`Mutex`) or concurrent data structures (if available) to protect shared mutable state.
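
As a sketch of combining the last two alternatives, the converter below keeps its rates in a frozen `Hash` and replaces the whole `Hash` on every update (copy-on-write). Writers are serialized with a `Mutex`, so no update is lost. Each reader takes a single reference to the current snapshot, so both lookups in `convert` see the same rates. This assumes CRuby, where reading an instance variable is atomic. The `rates` constructor argument and the use of `fetch` are additions for illustration.

```ruby
# A sketch of copy-on-write state: readers always see a complete, frozen snapshot.
class CurrencyConverter
  def initialize(rates = {})
    @exchange_rates = rates.dup.freeze
    @mutex = Mutex.new # Created eagerly, never lazily.
  end

  def update_rate(currency, rate)
    # Serialize writers so that two concurrent updates cannot overwrite each other:
    @mutex.synchronize do
      @exchange_rates = @exchange_rates.merge({currency => rate}).freeze
    end
  end

  def convert(amount, from_currency, to_currency)
    # Read the reference once, so both lookups use the same snapshot:
    rates = @exchange_rates
    amount * (rates.fetch(from_currency) / rates.fetch(to_currency))
  end
end

converter = CurrencyConverter.new({USD: 1.0, GBP: 2.0})
converter.update_rate(:EUR, 4.0)
converter.convert(10, :EUR, :GBP) # => 20.0
```

Writes are more expensive here, since every update copies the `Hash`. In exchange, reads never take a lock and can never observe a half-applied update.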

### Class Variables with shared state

Class variables (`@@variable`) and class attributes (`class_attribute`) represent a design problem because they lack isolation and can lead to unexpected behavior if mutated. As they are shared across the entire inheritance hierarchy, they can cause "spooky action at a distance" where changes in one part of the codebase affect other parts in unexpected ways.

```ruby
class GlobalConfig
  @@settings = {} # Issue: Class variables are shared across inheritance

  def set(key, value)
    @@settings[key] = value
  end

  def get(key)
    @@settings[key]
  end
end

class UserConfig < GlobalConfig
end

GlobalConfig.new.set(:foo, 42)
# Issue: UserConfig inherits from GlobalConfig, so it shares the same @@settings (lack of isolation):
UserConfig.new.get(:foo) # => 42
```

**Why is this problematic?**: Class variables are shared across the entire inheritance hierarchy, and class attributes are inherited by subclasses unless overridden. This creates unnecessary coupling and makes it difficult to reason about state changes, which can lead to unexpected behavior, especially in larger codebases or when using libraries that modify class-level state.

#### Better alternatives

- Inject configuration or state through method parameters or constructor arguments.
- Avoid class-level mutable state entirely, if possible.
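
A minimal sketch of the first alternative. It uses a hypothetical immutable `Config` value object and a hypothetical `Report` consumer. Configuration is passed in explicitly, and "changing" it produces a new object instead of mutating shared state, so no other part of the program is affected.

```ruby
# Hypothetical immutable configuration object: each instance owns a frozen copy.
class Config
  def initialize(settings = {})
    @settings = settings.dup.freeze
  end

  def get(key)
    @settings[key]
  end

  # Return a new Config instead of mutating shared state:
  def with(key, value)
    self.class.new(@settings.merge({key => value}))
  end
end

# Hypothetical consumer: receives its configuration instead of reading a global.
class Report
  def initialize(config)
    @config = config
  end

  def title
    @config.get(:title) || "Untitled"
  end
end

defaults = Config.new({title: "Sales"})
custom = defaults.with(:title, "My Sales")

Report.new(defaults).title # => "Sales"
Report.new(custom).title   # => "My Sales"
```

Unlike `@@settings`, deriving `custom` leaves `defaults` untouched. Tests can also construct any configuration they need without resetting global state.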

### Lazy Initialization

Lazy initialization is a common pattern in Ruby, but the `||=` operator is not atomic and can lead to race conditions.

```ruby
require 'json'

class Loader
  def self.data
    @data ||= JSON.load_file('data.json')
  end
end
```

**Why is this problematic?**: `@data` here is a class instance variable, so it is shared by every thread and fiber. Multiple callers can observe that `@data` is `nil` at the same time and each call `JSON.load_file` concurrently. Each caller receives the object it loaded, but only the last assignment to `@data` survives (although every caller assigns). As a result, different callers can end up holding different objects, which can lead to inconsistent data being used across threads or fibers.

For example, two callers may receive different objects from `Loader.data`, and modifications made through one of them may be lost and never visible to another thread. Operations that are more likely to context switch, such as I/O (here, reading the file), widen this window and exacerbate the issue.

#### Potential fix with `Mutex`

```ruby
require 'json'

class Loader
  @mutex = Mutex.new

  def self.data
    # Double-checked locking pattern:
    return @data if @data

    @mutex.synchronize do
      return @data if @data

      # Now we are sure that @data is nil, we can safely fetch it:
      @data = JSON.load_file('data.json')
    end

    return @data
  end
end
```
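
To check that the double-checked version performs the load only once under contention, here is a self-contained sketch. It swaps the file read for `JSON.parse` on an inline string. It also adds a hypothetical `loads` counter and a short `sleep` to simulate slow I/O. Both are illustrative additions and not part of the pattern itself.

```ruby
require 'json'

class CountingLoader
  @mutex = Mutex.new # Eagerly initialized when the class body runs.
  @loads = 0

  class << self
    attr_reader :loads

    def data
      # Fast path: already initialized, no locking needed.
      return @data if @data

      @mutex.synchronize do
        # Re-check while holding the lock: another thread may have won the race.
        return @data if @data

        @loads += 1
        sleep(0.01) # Simulate slow I/O, which invites a context switch.
        @data = JSON.parse('{"name": "example"}').freeze
      end
    end
  end
end

threads = 10.times.map { Thread.new { CountingLoader.data } }
results = threads.map(&:value)

same_object = results.all? { |r| r.equal?(results.first) }
puts "loads: #{CountingLoader.loads}, same object: #{same_object}" # loads: 1, same object: true
```

The frozen result also prevents callers from mutating the shared value after it has been published.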

In addition, lazy initialization of a `Mutex` (and other synchronization primitives) is **always** a problem and should be avoided. The `Mutex` itself may not yet exist when multiple threads first access it concurrently. Each thread may then create and lock a different mutex instance, which provides no mutual exclusion at all:

```ruby
require 'json'

class Loader
  def self.data
    @mutex ||= Mutex.new # Issue: Not thread-safe

    @mutex.synchronize do
      # Check while holding the (possibly different!) lock:
      return @data if @data

      # Not actually safe: other threads may be holding a different mutex instance.
      @data = JSON.load_file('data.json')
    end

    return @data
  end
end
```

#### Safe if instances are not shared

If each instance is only accessed by a single thread or fiber, memoization can be safe:

```ruby
class Loader
  def things
    # Safe: each instance has its own @things
    @things ||= compute_things
  end
end

def do_something
  loader = Loader.new
  loader.things # Safe: only accessed by this thread/fiber
end
```

### Memoization with `Hash` caches

Like lazy initialization, memoization using `Hash` caches can lead to race conditions if not handled properly.

```ruby
class ExpensiveComputation
  @cache = {}

  def self.compute(key)
    @cache[key] ||= expensive_operation(key) # Issue: Not thread-safe
  end
end
```

**Why is this problematic?**: Multiple threads can see that `@cache[key]` is `nil` simultaneously, leading to multiple calls to `expensive_operation(key)`. This is inefficient, and it can lead to inconsistent results if the operation is not idempotent.

#### Potential fix with `Mutex`

Note that this mutex creates contention on all calls to `compute`, and it is held while `expensive_operation` runs. This can become a performance bottleneck if the operation is expensive and called frequently.

```ruby
class ExpensiveComputation
  @cache = {}
  @mutex = Mutex.new

  def self.compute(key)
    @mutex.synchronize do
      @cache[key] ||= expensive_operation(key)
    end
  end
end
```

#### Potential fix with `Concurrent::Map`

```ruby
require 'concurrent'

class ExpensiveComputation
  @cache = Concurrent::Map.new

  def self.compute(key)
    @cache.compute_if_absent(key) do
      expensive_operation(key)
    end
  end
end
```

You should avoid `Concurrent::Hash` for this purpose. On CRuby it adds no synchronization beyond a plain `Hash`, so it does not make check-then-act sequences like `||=` atomic.
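
If you cannot add a dependency on `concurrent-ruby`, one standard-library-only compromise is sketched below. The lock is held only to read and to publish results, never while the computation runs, so a slow computation for one key does not block other keys. The trade-off is that concurrent callers may compute the same key more than once, so the operation should be idempotent. Even so, the first stored result wins and every caller receives that same object. The `expensive_operation` body is a hypothetical stand-in.

```ruby
class ExpensiveComputation
  @cache = {}
  @mutex = Mutex.new

  def self.compute(key)
    # Fast path: look up an existing result, holding the lock only briefly.
    cached = @mutex.synchronize { @cache[key] }
    return cached if cached

    # Slow path: compute *without* holding the lock, so other keys are not blocked.
    value = expensive_operation(key)

    # First writer wins; a later writer discards its own value and returns the stored one.
    @mutex.synchronize { @cache[key] ||= value }
  end

  # Hypothetical stand-in for a slow, idempotent computation.
  def self.expensive_operation(key)
    sleep(0.01)
    "result for #{key}"
  end
end

threads = 8.times.map { Thread.new { ExpensiveComputation.compute(:report) } }
results = threads.map(&:value)
```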

### Aggregating results with `Array`

Aggregating results from multiple threads or fibers into a shared `Array` instance is generally safe in CRuby, because the GVL prevents `Array#<<` from being interrupted part-way. However, it can lead to issues if you are trying to coordinate completion of multiple threads or fibers.

```ruby
done = []
threads = []

5.times do |i|
  threads << Thread.new do
    # Simulate some work
    sleep(rand(0.1..0.5))
    done << i
  end
end

# Risk: The threads may not be finished, so `done` is likely incomplete!
puts "Done: #{done.inspect}"
```

**Why is this problematic?**: Nothing coordinates the threads with the code that reads `done`, so it may be read before some (or all) items have been added. Waiting for the first item (or any subset) by watching `done` is equally fragile. There is also no real error handling: an exception raised inside a thread is not propagated to the code reading `done`. Even if the threads are waited on in creation order, items are appended in completion order, so `done` may not be ordered.

#### Potential fix with `Thread#join`

Using `Thread#join` ensures that all threads have completed before accessing the results:

```ruby
done = []

threads = 5.times.map do |i|
  Thread.new do
    # Simulate some work
    sleep(rand(0.1..0.5))
    done << i
  end
end

threads.each(&:join) # Wait for all threads to complete
puts "Done: #{done.inspect}" # All five items are present, in completion order (which varies between runs)
```
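
If you need the results in a predictable order, one option that avoids the shared `Array` entirely is `Thread#value`. It waits for the thread, re-raising any exception the thread raised, and returns the block's result. A minimal sketch, with `i * i` standing in for real work:

```ruby
threads = 5.times.map do |i|
  Thread.new do
    sleep(rand(0.01..0.05)) # Simulate some work.
    i * i                   # The block's return value becomes the thread's value.
  end
end

# Thread#value waits like #join and returns each block's result,
# so the results stay in creation order regardless of completion order:
results = threads.map(&:value)
puts "Results: #{results.inspect}" # Results: [0, 1, 4, 9, 16]
```

Each thread now owns its result until it finishes, so there is no shared mutable state left to protect.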

### Shared connections

Sharing network connections, database connections, or other resources across threads or fibers can lead to invalid state or unexpected behavior.

```ruby
client = Database.connect

Thread.new do
  results = client.query("SELECT * FROM users")
end

Thread.new do
  results = client.query("SELECT * FROM products")
end
```

**Why is this problematic?**: If the `client` is not thread-safe and does not handle concurrent queries properly (e.g. by using a connection pool, or explicit multiplexing), the above code is unlikely to work as expected. The queries may interfere with each other on the same connection, leading to inconsistent results or even errors.

#### Potential fix with connection pools

Using a connection pool can help manage shared connections safely, since each thread checks out its own connection for the duration of the block:

```ruby
require 'connection_pool'

pool = ConnectionPool.new(size: 5, timeout: 5) do
  Database.connect
end

Thread.new do
  pool.with do |client|
    results = client.query("SELECT * FROM users")
  end
end

Thread.new do
  pool.with do |client|
    results = client.query("SELECT * FROM products")
  end
end
```
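
To show the core idea behind a pool, here is a deliberately minimal sketch built only on `Thread::Queue`. The hypothetical `TinyPool` is not how the `connection_pool` gem is implemented. Connections are created up front, `pop` blocks until one is free, and `ensure` returns the connection even if the block raises. Unlike the real gem, it has no timeout, lazy creation, or health checking.

```ruby
# Hypothetical minimal pool, for illustration only.
class TinyPool
  def initialize(size, &factory)
    @available = Thread::Queue.new
    size.times { @available << factory.call }
  end

  # Check out a connection for the duration of the block, then return it:
  def with
    connection = @available.pop # Blocks until a connection is free.
    begin
      yield connection
    ensure
      @available << connection # Returned even if the block raises.
    end
  end
end
```

Because a connection is only ever held by one caller between `pop` and the `ensure`, no two threads can use the same connection at the same time.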

### Enumeration of shared mutable state

Enumerating a shared mutable container (e.g. `Array` or `Hash`) can cause consistency issues if the container is modified during enumeration. This can lead to unexpected behavior, such as missing or duplicated elements.

```ruby
class SharedList
  def initialize
    @list = []
  end

  def add(item)
    @list << item
  end

  def each(&block)
    # Issue: Modifications during enumeration can lead to inconsistent state
    @list.each(&block)
  end
end
```

In addition, adding or deleting items from a list while iterating over it can lead to errors or unexpected behaviour.

**Why is this problematic?**: If another thread modifies `@list` while it is being enumerated, items may be missed or duplicated. An error may even be raised if the underlying data structure is modified during iteration (for example, a `Hash` raises when a new key is added during iteration).

#### Potential fix with `Mutex`

To ensure that the enumeration is safe, you can use a `Mutex` to synchronize access to the shared state. Note that the block passed to `each` runs while the lock is held, so it must not call `add` (or any other method that takes the same lock). Otherwise, it fails with a `ThreadError` for recursive locking:

```ruby
class SharedList
  def initialize
    @list = []
    @mutex = Mutex.new
  end

  def add(item)
    @mutex.synchronize do
      @list << item
    end
  end

  def each(&block)
    @mutex.synchronize do
      @list.each(&block)
    end
  end
end
```

#### Potential fix with deferred operations

Alternatively, you can defer operations that modify the shared state until after the enumeration is complete:

```ruby
stale = []
shared_list.each do |item|
  if item.stale?
    stale << item
  end
end

stale.each do |item|
  shared_list.remove(item)
end
```

Or better yet, build a new collection instead of mutating the existing one while enumerating it:

```ruby
fresh = []
shared_list.each do |item|
  fresh << item unless item.stale?
end

shared_list.replace(fresh) # Replace the entire list with a new one
```

Note that items added by other threads between the enumeration and the `replace` call would be lost, unless both steps happen under the same lock.
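
Combining these ideas, here is a sketch of a `SharedList` that enumerates a snapshot. The lock is held only while copying, so the block can safely call `add` or `remove` and writers are never blocked for long. The `remove`, `replace`, and `to_a` methods are assumptions added for illustration (the examples above call `remove` and `replace` without defining them). The trade-off is that `each` does not see items added after the snapshot was taken.

```ruby
class SharedList
  def initialize
    @list = []
    @mutex = Mutex.new
  end

  def add(item)
    @mutex.synchronize { @list << item }
  end

  def remove(item)
    @mutex.synchronize { @list.delete(item) }
  end

  def replace(items)
    @mutex.synchronize { @list = items.dup }
  end

  def to_a
    @mutex.synchronize { @list.dup }
  end

  # Enumerate a snapshot: the lock is released before any user code runs.
  def each(&block)
    to_a.each(&block)
  end
end

list = SharedList.new
(1..6).each { |i| list.add(i) }
list.each { |i| list.remove(i) if i.even? } # Safe: we iterate over a copy.
p list.to_a # => [1, 3, 5]
```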

### Internal Race Conditions

Race conditions occur when state changes in an unpredictable way due to concurrent access. This can happen with shared mutable state, lazy initialization, or any operation that modifies state without proper synchronization, leading to deadlocks or inconsistent data.

```ruby
while system.busy?
  system.wait
end
```

**Why is this problematic?**: Suppose that, between the call to `system.busy?` and `system.wait`, another thread changes the state of `system` so that it is no longer busy, and sends its notification before the current thread starts waiting. The current thread may then wait indefinitely, leading to a deadlock.

#### Potential fix with `Mutex` and `ConditionVariable`

If you are able to modify the state transition logic of the shared resource, you can use a `Mutex` and `ConditionVariable` to ensure that the state is checked and the wait begins atomically. `ConditionVariable#wait` only releases the mutex once the thread is already waiting:

```ruby
class System
  def initialize
    @mutex = Mutex.new
    @condition = ConditionVariable.new
    @usage = 0
  end

  def acquire
    @mutex.synchronize do
      @usage += 1
    end
  end

  def release
    @mutex.synchronize do
      @usage -= 1
      # Wake every waiter (not just one), since they are all waiting for the same state:
      @condition.broadcast if @usage == 0
    end
  end

  def wait_until_free
    @mutex.synchronize do
      # Re-check in a loop, since a wake-up does not guarantee the state is still free:
      while @usage > 0
        @condition.wait(@mutex)
      end
    end
  end
end
```
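
When all you need is to wait for a single event, rather than for a condition on shared state, a `Thread::Queue` can replace the check-then-wait loop entirely. The hand-off is atomic, so there is no window in which a notification can be missed. Here is a minimal sketch, with `sleep` standing in for real work:

```ruby
finished = Thread::Queue.new

worker = Thread.new do
  sleep(0.01)       # Simulate some work.
  finished << :done # Publishing the event wakes up a waiting consumer.
end

# Blocks until the worker pushes; if the push already happened, it returns immediately.
# Either way, the event cannot be missed.
result = finished.pop
worker.join
puts result # done
```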

### External Race Conditions

External resources can also lead to "time of check to time of use" (TOCTOU) issues, where the state of the resource changes between checking its status and using it.

```ruby
if File.exist?('cache.json')
  @data = File.read('cache.json')
else
  @data = fetch_data_from_api
  File.write('cache.json', @data)
end
```

**Why is this problematic?**: If another thread (or process) deletes `cache.json` after the check but before the read, the read will fail. Similarly, a reader may observe a partially written file while another writer is still writing it. Either case can lead to an error or inconsistent state.

This can apply to any external resource, such as files, databases, or network resources. It can be extremely difficult to mitigate if proper synchronization (e.g. database transactions) is not available.

#### Potential fix for external resources

Using content-addressable storage and atomic file operations can help avoid race conditions when accessing shared resources on the filesystem. For example, attempt the read directly and handle the failure instead of checking first. When writing, write to a temporary file and rename it into place. On POSIX filesystems, `rename` is atomic when both paths are on the same filesystem, so readers see either the old state or the complete new file, never a partial write:

```ruby
require 'securerandom'

begin
  @data = File.read('cache.json')
rescue Errno::ENOENT
  @data = fetch_data_from_api

  # Write to a unique temporary file first, then atomically move it into place:
  temporary_path = "cache.json.#{SecureRandom.hex(8)}.tmp"
  File.write(temporary_path, @data)
  File.rename(temporary_path, 'cache.json')
end
```

Modern systems should generally avoid using the filesystem for shared state, and instead use a database or other persistent storage that supports transactions and atomic operations.

### Thread-local storage for "per-request" state

Using actual thread-local storage for "per-request" state can be problematic in Ruby, especially when using fibers. Fibers may share the same thread, which leads to unexpected behavior if thread-local storage is used where "per-request" state is expected.

```ruby
class RequestContext
  def self.current
    Thread.current.thread_variable_get(:request_context) ||
      Thread.current.thread_variable_set(:request_context, Hash.new)
  end
end
```

**Why is this problematic?**: If fibers are used for individual requests, they may share the same thread, so every request handled on that thread sees the same `Thread.current.thread_variable_get(:request_context)`. This can result in data being shared across requests unintentionally, leading to data corruption or unexpected behavior.

In addition, some libraries may use `Thread.current` as a key in a hash or other data structure to store per-request state. This is problematic for the same reason: multiple requests may share the same thread and therefore the same key. It affects both concurrent and sequential requests. For example, if the state is not cleaned up properly between requests, it can leak from one request to the next.

```ruby
class Pool
  def initialize
    @connections = {}
    @mutex = Mutex.new
  end

  def current_connection
    @mutex.synchronize do
      # Issue: All fibers running on this thread share the same connection
      @connections[Thread.current] ||= create_new_connection
    end
  end
end
```
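
The difference between the two kinds of storage is easy to demonstrate, with a plain `Fiber` standing in for a second request on the same thread. This is a minimal sketch; `:request_id` is an illustrative key:

```ruby
outer_thread = Thread.current
Thread.current[:request_id] = "outer"                    # Fiber-local, despite the name.
Thread.current.thread_variable_set(:request_id, "outer") # Truly thread-local.

fiber = Fiber.new do
  [
    Thread.current.equal?(outer_thread),             # true: the fiber runs on the same thread
    Thread.current[:request_id],                     # nil: fiber-local values are not visible here
    Thread.current.thread_variable_get(:request_id), # "outer": shared by every fiber on this thread
  ]
end

same_thread, fiber_local, thread_local = fiber.resume
```

The `thread_variable_get` result is exactly the leak described above: a second "request" running in another fiber on the same thread sees the first request's value.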

#### Use `Thread.current` for per-request state

Despite appearances, `Thread.current[:key]` is actually fiber-local, and thus scoped to the smallest unit of concurrency in Ruby: the fiber. This means it is safe to use `Thread.current[:key]` for per-request state, as long as you are aware that it is actually fiber-local storage and is not inherited by child fibers or threads.

```ruby
Thread.current[:connection] ||= create_new_connection
```

As a counterpoint, it is not a good idea to use fiber-local storage for a cache, since it will never be shared.

#### Use `Fiber[key]` for per-request state

Using `Fiber[key]` (available since Ruby 3.2) can be a better alternative for per-request state, as it is scoped to the fiber and is also inherited (as a copy) by child fibers and threads.

```ruby
Fiber[:user_id] = request.session[:user_id] # Set per-request state

jobs.each do |job|
  Thread.new do
    puts "Processing job for user #{Fiber[:user_id]}"
    # Do something with the job...
  end
end
```
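
A minimal sketch of the inheritance rules, assuming Ruby 3.2 or later. Child fibers and threads start with a copy of the parent's storage, so they can read the value, but an assignment in a child does not leak back to the parent. The `:user_id` key and the value `42` are illustrative.

```ruby
# Requires Ruby 3.2 or later for Fiber[] / Fiber[]=.
Fiber[:user_id] = 42

from_fiber = Fiber.new { Fiber[:user_id] }.resume
from_thread = Thread.new { Fiber[:user_id] }.value

# Children receive a copy, so a child's assignment does not leak back to the parent:
Fiber.new { Fiber[:user_id] = 7 }.resume

puts from_fiber      # 42
puts from_thread     # 42
puts Fiber[:user_id] # 42
```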

#### Use `Fiber.attr_accessor` for per-request state

As a direct alternative to `Thread.current[:key]`, with a slight performance advantage and a readability improvement, you can define an accessor on `Fiber` to store per-request state. Note that `Fiber.attr :name` would only define a reader, so `attr_accessor` is needed for assignment. This state is scoped to the current fiber.

```ruby
Fiber.attr_accessor :my_application_user_id

Fiber.current.my_application_user_id = request.session[:user_id] # Set per-request state
```
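
Because the value is just an instance variable on the current `Fiber` object, a child fiber (a different object) starts without it. A minimal sketch, assuming Ruby 3.1 or later so that `Fiber.current` is available without `require 'fiber'`. The value `42` is illustrative.

```ruby
# Defines Fiber#my_application_user_id and Fiber#my_application_user_id=:
Fiber.attr_accessor :my_application_user_id

Fiber.current.my_application_user_id = 42

# A child fiber is a different Fiber object, so it does not see the value:
from_child = Fiber.new { Fiber.current.my_application_user_id }.resume

puts Fiber.current.my_application_user_id.inspect # 42
puts from_child.inspect                           # nil
```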

This state is not inherited by child fibers (or threads), so its use is limited to the current fiber. The same technique can also be used for threads (e.g. `Thread.attr_accessor`), but this has the same issues as `Thread.current.thread_variable_get/set`, since it is scoped to the thread and not the fiber.

### C extensions that block the scheduler

C extensions can block the Ruby scheduler, but the fiber scheduler is at higher risk than the thread scheduler. Threads are preemptively scheduled, and `rb_nogvl` lets other threads keep running while native code executes. Fibers, by contrast, are not preemptively scheduled and must yield explicitly. If a C extension blocks the fiber scheduler, it can therefore lead to deadlocks or starvation of other fibers.

### Synchronization primitives

Synchronization primitives like `Mutex`, `ConditionVariable`, and `Queue` are essential for managing shared mutable state safely. However, they can introduce complexity and potential deadlocks if not used carefully.

```ruby
class Counter
  def initialize(count = 0)
    @count = count
    @mutex = Mutex.new
  end

  def increment
    @mutex.synchronize do
      @count += 1
    end
  end

  def times
    @mutex.synchronize do
      @count.times do |i|
        yield i
      end
    end
  end
end

counter = Counter.new(1)
counter.times do |i|
  counter.increment # Deadlock: `times` already holds @mutex, so this raises ThreadError (recursive locking)
end
```

In general, `Mutex` does not compose: combining two individually correct, locked operations (such as calling `increment` from within `times`) can deadlock. However, with careful design, it is usually safe to use `Mutex` to protect shared mutable state, as long as you are aware of the potential for deadlocks and contention.

Using recursive mutexes is generally not recommended, as they can lead to complex and hard-to-debug issues. If you find yourself needing recursive locks, it may be a sign that you need to rethink your locking strategy or the design of your code.

#### Potential fix with `Mutex`

As an alternative to the above, reducing the scope of the lock can help avoid deadlocks and contention:

```ruby
class Counter
  # ...

  def times
    count = @mutex.synchronize { @count }

    # Avoid holding the lock while yielding to user code:
    count.times do |i|
      yield i
    end
  end
end
```
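
Filling in the elided parts, here is a self-contained sketch of the reduced-scope `Counter`. The `count` reader is an addition for illustration. Calling `increment` from inside `times` now works, because the lock is released before any user code runs.

```ruby
class Counter
  def initialize(count = 0)
    @count = count
    @mutex = Mutex.new
  end

  def increment
    @mutex.synchronize { @count += 1 }
  end

  def count
    @mutex.synchronize { @count }
  end

  def times
    # Snapshot the count under the lock, then release it before running user code:
    snapshot = @mutex.synchronize { @count }
    snapshot.times { |i| yield i }
  end
end

counter = Counter.new(3)
counter.times { counter.increment } # No recursive locking: the lock is released before yielding.
puts counter.count # 6
```

The loop runs over the snapshot taken at the start (three iterations here), even though the count changes while it runs. Whether that is acceptable depends on what `times` is meant to promise.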

## Best Practices for Concurrency in Ruby

1. **Favor pure, isolated, and immutable objects and functions.**
   The safest and easiest way to write concurrent code is to avoid shared mutable state entirely. Isolated objects and pure functions eliminate the risk of race conditions and make reasoning about code much simpler.

2. **Use per-request (or per-fiber) state correctly.**
   When you need to associate state with a request, job, or fiber, prefer explicit context passing, or use fiber-local variables (e.g. `Fiber[:key]`). Avoid using thread-local storage in fiber-based code, as fibers may share threads and this can lead to subtle bugs.

3. **Use synchronization primitives only when sharing is truly necessary.**
   If you must share mutable state (for performance, memory efficiency, or correctness), protect it with the appropriate synchronization primitives:

   * Prefer high-level concurrent data structures (e.g. `Concurrent::Map`) when possible.
   * If locks are necessary, use fine-grained locking to minimize contention and reduce deadlock risk.
   * Avoid coarse-grained locks except as a last resort, as they can severely limit concurrency and hurt performance.

### Hierarchy of Concurrency Safety

1. **No shared state** (ideal): Isolate state to each thread, fiber, or request; no coordination is needed.

2. **Immutable shared state** (very good): Share only data that does not change after creation (constants, frozen objects, etc.).

3. **Synchronized mutable state** (only when unavoidable): Share mutable state only with robust synchronization.
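
Level 2 can often be reached by freezing shared data once, at boot, before any threads or fibers start. Ruby's `freeze` is shallow, so below is a minimal sketch of a hypothetical recursive `deep_freeze` helper. It only handles `Hash` values, `Array` elements, and leaf objects; string keys are already frozen copies when inserted into a `Hash`.

```ruby
# Hypothetical helper: recursively freeze Hash values, Array elements and leaf objects.
def deep_freeze(object)
  case object
  when Hash
    object.each_value { |value| deep_freeze(value) }
  when Array
    object.each { |value| deep_freeze(value) }
  end
  object.freeze
end

SETTINGS = deep_freeze({
  "timeouts" => { "read" => 5, "write" => 10 },
  "hosts" => ["a.example", "b.example"],
})

begin
  SETTINGS["timeouts"]["read"] = 1
rescue FrozenError => error
  puts "Refused: #{error.class}" # Refused: FrozenError
end
```

On Ruby 3.0 and later, `Ractor.make_shareable` performs a similar deep freeze for supported objects.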

#### When synchronization is needed:

* **Concurrent data structures (e.g. `Concurrent::Map`)**: Provide safe, concurrent access with high performance and minimal contention.

* **Fine-grained locks**: Protect the smallest necessary scope of shared state; avoid holding locks while yielding or running untrusted code.

* **Coarse-grained locks**: Protect large areas of code or many data structures at once; use sparingly as this reduces concurrency.
data/lib/async/version.rb
CHANGED
data/readme.md
CHANGED
@@ -21,16 +21,16 @@ Please see the [project documentation](https://socketry.github.io/async/) for mo
 
 - [Getting Started](https://socketry.github.io/async/guides/getting-started/index) - This guide shows how to add async to your project and run code asynchronously.
 
-- [Asynchronous Tasks](https://socketry.github.io/async/guides/asynchronous-tasks/index) - This guide explains how asynchronous tasks work and how to use them.
-
 - [Scheduler](https://socketry.github.io/async/guides/scheduler/index) - This guide gives an overview of how the scheduler is implemented.
 
-- [
+- [Asynchronous Tasks](https://socketry.github.io/async/guides/tasks/index) - This guide explains how asynchronous tasks work and how to use them.
 
 - [Best Practices](https://socketry.github.io/async/guides/best-practices/index) - This guide gives an overview of best practices for using Async.
 
 - [Debugging](https://socketry.github.io/async/guides/debugging/index) - This guide explains how to debug issues with programs that use Async.
 
+- [Thread safety](https://socketry.github.io/async/guides/thread-safety/index) - This guide explains thread safety in Ruby, focusing on fibers and threads, common pitfalls, and best practices to avoid problems like data corruption, race conditions, and deadlocks.
+
 ## Releases
 
 Please see the [project releases](https://socketry.github.io/async/releases/index) for all releases.
data.tar.gz.sig
CHANGED
Binary file