uringmachine 0.22.0 → 0.22.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 74e4816d1191d862df3ba04d46cc038d04b999c22c5604d9a4eec0d1d3fd047c
4
- data.tar.gz: d857ba559f6c48dfc8d65a1812eb3996c7a65d70d263e016bbb96dbf99e6273c
3
+ metadata.gz: fbf0486feb49686a85ad162dd569ce125b4f168904e6f364e7c4358be234ca78
4
+ data.tar.gz: 27d0e3533c7d41f87b95376258992e37ebfa0d42c32ac0b46e7595939b3f1afa
5
5
  SHA512:
6
- metadata.gz: 662c0f7e07df7f87c759eb3e8001aa91c0682d55c63fc46e0429c5ac577de3e0f89476f93b0deb3e05fb3fba4daa4eaae767141615c9e1af1b35df8966f7d988
7
- data.tar.gz: d22cc49d99ef5772411ebdb8019b6d83eb9e944a83b0a327c638bb5eeaef5661ed3cc34dcb6673a26049e9f7f17cebe76306ca7b847cd4bc0e244c99dfafb210
6
+ metadata.gz: c4bf87080fe32e145e334944cc1871d109f5b11c5c3cb413c9de3f44cf09b9554ac3fabeb54650c18f67ed96d6381a0290ae8e461ff37b9fef3d27351675ff68
7
+ data.tar.gz: a53addafe3007de3ad77f3e42e4ca3b3f90ae372af24b00beff1b5b486fa4c1f5d6c84de22788192b05328c14c7eeec4e602a77623a00c5c2f8de5bdbd8dca7d
data/CHANGELOG.md CHANGED
@@ -1,3 +1,7 @@
1
+ # 0.22.1 2025-12-11
2
+
3
+ - Comment out SIGCLD constant
4
+
1
5
  # 0.22.0 2025-12-10
2
6
 
3
7
  - Fix use of `um_yield` in statx, multishot ops
data/benchmark/README.md CHANGED
@@ -4,25 +4,26 @@ The following benchmarks measure the performance of UringMachine against stock
4
4
  Ruby in a variety of scenarios. For each scenario, we compare three different
5
5
  implementations:
6
6
 
7
- - **Threads**: thread-based concurrency using the stock Ruby I/O and
7
+ - `Threads`: thread-based concurrency using the stock Ruby I/O and
8
8
  synchronization classes.
9
9
 
10
- - **Async FS**: fiber-based concurrency with the
11
- [Async](https://github.com/socketry/async) fiber scheduler, using the stock
12
- Ruby I/O and synchronization classes.
10
+ - `ThreadPool`: thread pool consisting of 10 worker threads, receiving jobs
11
+ through a common queue.
13
12
 
14
- - **UM FS**: fiber-based concurrency with the UringMachine fiber scheduler,
15
- using the stock Ruby I/O and synchronization classes.
13
+ - `Async epoll`: fiber-based concurrency with the
14
+ [Async](https://github.com/socketry/async) fiber scheduler, using an epoll
15
+ selector.
16
16
 
17
- - **UM pure**: fiber-based concurrency using the UringMachine low-level (pure)
18
- API.
17
+ - `Async uring`: fiber-based concurrency with the Async fiber scheduler, using a
18
+ uring selector.
19
19
 
20
- - **UM sqpoll**: the same as **UM pure** with [submission queue
21
- polling](https://unixism.net/loti/tutorial/sq_poll.html).
20
+ - `UM FS`: fiber-based concurrency with the UringMachine fiber scheduler.
21
+
22
+ - `UM`: fiber-based concurrency using the UringMachine low-level API.
22
23
 
23
24
  <img src="./chart.png">
24
25
 
25
- ## Observations:
26
+ ## Observations
26
27
 
27
28
  - We see the stark difference between thread-based and fiber-based concurrency.
28
29
  For I/O-bound workloads, there's really no contest - and that's exactly why
@@ -34,28 +35,37 @@ implementations:
34
35
  C-extension.
35
36
 
36
37
  - The UringMachine low-level API is faster to use in most cases, and its
37
- performance advantage grows with the level of concurrency.
38
-
39
- - SQ polling provides a performance advantage in high-concurrency scenarios,
40
- depending on the context. It remains to be seen how it affects performance in
41
- real-world situations.
38
+ performance advantage grows with the level of concurrency. Interestingly, when
39
+ performing CPU-bound work, it seems slightly slower. This should be
40
+ investigated.
42
41
 
43
42
  - The [pg](https://github.com/ged/ruby-pg) gem supports the use of fiber
44
43
  schedulers, and there too we see a marked performance advantage to using
45
44
  fibers instead of threads.
46
45
 
46
+ According to these benchmarks, for I/O-bound scenarios the different fiber-based
47
+ implementations present an average speedup as follows:
48
+
49
+ |implementation|average factor|
50
+ |--------------|--------------|
51
+ |Async epoll |x2.36 |
52
+ |Async uring |x2.42 |
53
+ |UM FS |x2.85 |
54
+ |UM |x6.20 |
55
+
47
56
  ## 1. I/O - Pipe
48
57
 
49
58
  50 groups, where in each group we create a pipe with a pair of threads/fibers
50
59
  writing/reading 1KB of data to the pipe.
51
60
 
52
61
  ```
53
- C=50x2 user system total real
54
- Threads 2.501885 3.111840 5.613725 ( 5.017991)
55
- Async FS 1.189332 0.526275 1.715607 ( 1.715726)
56
- UM FS 0.715688 0.318851 1.034539 ( 1.034723)
57
- UM pure 0.241029 0.365079 0.606108 ( 0.606308)
58
- UM sqpoll 0.217577 0.634414 0.851991 ( 0.593531)
62
+ C=50x2 user system total real
63
+ Threads 2.105002 2.671980 4.776982 ( 4.272842)
64
+ ThreadPool 4.818014 10.740555 15.558569 ( 7.070236)
65
+ Async epoll 1.118937 0.254803 1.373740 ( 1.374298)
66
+ Async uring 1.363248 0.270063 1.633311 ( 1.633696)
67
+ UM FS 0.746332 0.183006 0.929338 ( 0.929619)
68
+ UM 0.237816 0.328352 0.566168 ( 0.566265)
59
69
  ```
60
70
 
61
71
  ## 2. I/O - Socketpair
@@ -64,12 +74,13 @@ UM sqpoll 0.217577 0.634414 0.851991 ( 0.593531)
64
74
  pair of threads/fibers writing/reading 1KB of data to the sockets.
65
75
 
66
76
  ```
67
- N=50 user system total real
68
- Threads 2.372753 3.612468 5.985221 ( 4.798625)
69
- Async FS 0.516226 0.877822 1.394048 ( 1.394266)
70
- UM FS 0.521360 0.875674 1.397034 ( 1.397327)
71
- UM pure 0.239353 0.642498 0.881851 ( 0.881962)
72
- UM sqpoll 0.220933 1.021997 1.242930 ( 0.976198)
77
+ C=50x2 user system total real
78
+ Threads 2.068122 3.247781 5.315903 ( 4.295488)
79
+ ThreadPool 2.283882 3.461607 5.745489 ( 4.650422)
80
+ Async epoll 0.381400 0.846445 1.227845 ( 1.227983)
81
+ Async uring 0.472526 0.821467 1.293993 ( 1.294166)
82
+ UM FS 0.443023 0.734334 1.177357 ( 1.177576)
83
+ UM 0.116995 0.675997 0.792992 ( 0.793183)
73
84
  ```
74
85
 
75
86
  ## 3. Mutex - CPU-bound
@@ -78,12 +89,12 @@ UM sqpoll 0.220933 1.021997 1.242930 ( 0.976198)
78
89
  threads/fibers locking the mutex and performing a Regexp match.
79
90
 
80
91
  ```
81
- N=20 user system total real
82
- Threads 5.348378 0.021847 5.370225 ( 5.362117)
83
- Async FS 5.519970 0.003964 5.523934 ( 5.524536)
84
- UM FS 5.505282 0.003983 5.509265 ( 5.509840)
85
- UM pure 5.607048 0.002991 5.610039 ( 5.610749)
86
- UM sqpoll 5.437836 5.418316 10.856152 ( 5.443331)
92
+ C=20x10 user system total real
93
+ Threads 5.174998 0.024885 5.199883 ( 5.193211)
94
+ Async epoll 5.309793 0.000949 5.310742 ( 5.311217)
95
+ Async uring 5.341404 0.004860 5.346264 ( 5.346963)
96
+ UM FS 5.363719 0.001976 5.365695 ( 5.366254)
97
+ UM 5.351073 0.005986 5.357059 ( 5.357602)
87
98
  ```
88
99
 
89
100
  ## 4. Mutex - I/O-bound
@@ -93,81 +104,36 @@ start 10 worker threads/fibers locking the mutex and writing 1KB chunks to the
93
104
  file.
94
105
 
95
106
  ```
96
- N=1 user system total real
97
- Threads 0.044103 0.057831 0.101934 ( 0.087204)
98
- Async FS 0.050608 0.084449 0.135057 ( 0.121300)
99
- UM FS 0.030355 0.077069 0.107424 ( 0.108146)
100
- UM pure 0.024489 0.086201 0.110690 ( 0.108023)
101
- UM sqpoll 0.022752 0.225133 0.247885 ( 0.136251)
102
-
103
- N=5 user system total real
104
- Threads 0.214296 0.384078 0.598374 ( 0.467425)
105
- Async FS 0.085820 0.158782 0.244602 ( 0.139766)
106
- UM FS 0.064279 0.147278 0.211557 ( 0.117488)
107
- UM pure 0.036478 0.182950 0.219428 ( 0.119745)
108
- UM sqpoll 0.036929 0.347573 0.384502 ( 0.160814)
109
-
110
- N=10 user system total real
111
- Threads 0.435688 0.752219 1.187907 ( 0.924561)
112
- Async FS 0.126573 0.303704 0.430277 ( 0.234900)
113
- UM FS 0.128427 0.215204 0.343631 ( 0.184074)
114
- UM pure 0.065522 0.359659 0.425181 ( 0.192385)
115
- UM sqpoll 0.076810 0.477429 0.554239 ( 0.210087)
116
-
117
- N=20 user system total real
118
- Threads 0.830763 1.585299 2.416062 ( 1.868194)
119
- Async FS 0.291823 0.644043 0.935866 ( 0.507887)
120
- UM FS 0.226202 0.460401 0.686603 ( 0.362879)
121
- UM pure 0.120524 0.616274 0.736798 ( 0.332182)
122
- UM sqpoll 0.177150 0.849890 1.027040 ( 0.284069)
123
-
124
- N=50 user system total real
125
- Threads 2.124048 4.182537 6.306585 ( 4.878387)
126
- Async FS 0.897134 1.268629 2.165763 ( 1.254624)
127
- UM FS 0.733193 0.971821 1.705014 ( 0.933749)
128
- UM pure 0.226431 1.504441 1.730872 ( 0.760731)
129
- UM sqpoll 0.557310 2.107389 2.664699 ( 0.783992)
130
-
131
- N=100 user system total real
132
- Threads 4.420832 8.628756 13.049588 ( 10.264590)
133
- Async FS 2.557661 2.532998 5.090659 ( 3.179336)
134
- UM FS 2.262136 1.912055 4.174191 ( 2.523789)
135
- UM pure 0.633897 2.793998 3.427895 ( 1.612989)
136
- UM sqpoll 1.119460 4.193703 5.313163 ( 1.525968)
107
+ C=50x10 user system total real
108
+ Threads 2.042649 3.441547 5.484196 ( 4.328783)
109
+ Async epoll 0.810375 0.744084 1.554459 ( 1.554726)
110
+ Async uring 0.854985 1.129260 1.984245 ( 1.140749)
111
+ UM FS 0.686329 0.872376 1.558705 ( 0.845214)
112
+ UM 0.250370 1.323227 1.573597 ( 0.720928)
137
113
  ```
138
114
 
139
- ## 5. Queue
115
+ ## 5. Postgres client
140
116
 
141
- 20 concurrent groups, where in each group we create a queue, start 5 producer
142
- threads/fibers that push items to the queue, and 10 consumer threads/fibers that
143
- pull items from the queue.
117
+ C concurrent threads/fibers, each issuing a SELECT query to a PG database.
144
118
 
145
119
  ```
146
- N=20 user system total real
147
- Threads 2.522270 0.125569 2.647839 ( 2.638276)
148
- Async FS 2.245917 0.044860 2.290777 ( 2.291068)
149
- UM FS 2.235130 0.000958 2.236088 ( 2.236392)
150
- UM pure 2.125827 0.225050 2.350877 ( 2.351347)
151
- UM sqpoll 2.044662 2.460344 4.505006 ( 2.261502)
120
+ C=50 user system total real
121
+ Threads 4.304292 1.358116 5.662408 ( 4.795725)
122
+ Async epoll 2.890160 0.432836 3.322996 ( 3.334350)
123
+ Async uring 2.818439 0.433896 3.252335 ( 3.252799)
124
+ UM FS 2.819371 0.443182 3.262553 ( 3.264606)
152
125
  ```
126
+ ## 6. Queue
153
127
 
154
- ## 6. Postgres client
155
-
156
- C concurrent threads/fiber, each thread issuing SELECT query to a PG database.
128
+ 20 concurrent groups, where in each group we create a queue, start 5 producer
129
+ threads/fibers that push items to the queue, and 10 consumer threads/fibers that
130
+ pull items from the queue.
157
131
 
158
132
  ```
159
- C=10 user system total real
160
- Threads 0.813844 0.358261 1.172105 ( 0.987320)
161
- Async FS 0.545493 0.098608 0.644101 ( 0.644636)
162
- UM FS 0.523503 0.094336 0.617839 ( 0.619250)
163
-
164
- C=20 user system total real
165
- Threads 1.652901 0.714299 2.367200 ( 2.014781)
166
- Async FS 1.136826 0.212991 1.349817 ( 1.350544)
167
- UM FS 1.084873 0.205865 1.290738 ( 1.291865)
168
-
169
- C=50 user system total real
170
- Threads 4.410604 1.804900 6.215504 ( 5.253016)
171
- Async FS 2.918522 0.507981 3.426503 ( 3.427966)
172
- UM FS 2.789549 0.537269 3.326818 ( 3.329802)
133
+ C=20x(5+10) user system total real
134
+ Threads 4.880983 0.207451 5.088434 ( 5.071019)
135
+ Async epoll 4.107208 0.006519 4.113727 ( 4.114227)
136
+ Async uring 4.206283 0.028974 4.235257 ( 4.235705)
137
+ UM FS 4.082394 0.001719 4.084113 ( 4.084522)
138
+ UM 4.099893 0.323569 4.423462 ( 4.424089)
173
139
  ```
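
The new README reports average speedup factors for the fiber-based implementations but does not say how they were derived or which scenarios were averaged. As a rough sketch only, assuming a factor is the `Threads` wall-clock ("real") time divided by each implementation's wall-clock time, the per-implementation factors for a single benchmark could be computed as follows. The `speedups` helper and the `PIPE_REAL` constant are hypothetical names introduced for illustration; the timings are copied from the new "1. I/O - Pipe" table. Because this covers one scenario, it is not expected to reproduce the averaged figures in the README.

```ruby
# Wall-clock ("real") seconds from the new "1. I/O - Pipe" results.
PIPE_REAL = {
  "Threads"     => 4.272842,
  "Async epoll" => 1.374298,
  "Async uring" => 1.633696,
  "UM FS"       => 0.929619,
  "UM"          => 0.566265
}.freeze

# Hypothetical helper: speedup of every implementation relative to a baseline.
# Assumption: factor = baseline real time / implementation real time.
def speedups(real_times, baseline: "Threads")
  base = real_times.fetch(baseline)
  real_times
    .reject { |name, _| name == baseline }
    .transform_values { |seconds| (base / seconds).round(2) }
end

result = speedups(PIPE_REAL)
result.each { |name, factor| puts format("%-12s x%.2f", name, factor) }
```

Averaging such factors over several scenarios would need the same calculation repeated per table, followed by a mean over whichever scenarios are counted as I/O-bound.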
data/benchmark/chart.png CHANGED
Binary file
data/benchmark/common.rb CHANGED
@@ -118,6 +118,8 @@ class UMBenchmark
118
118
  fds = []
119
119
  do_um(machine, fibers, fds)
120
120
  machine.await_fibers(fibers)
121
+ puts "UM:"
122
+ p machine.metrics
121
123
  fds.each { machine.close(it) }
122
124
  end
123
125
 
@@ -128,6 +130,8 @@ class UMBenchmark
128
130
  do_um(machine, fibers, fds)
129
131
  machine.await_fibers(fibers)
130
132
  fds.each { machine.close_async(it) }
133
+ puts "UM sqpoll:"
134
+ p machine.metrics
131
135
  machine.snooze
132
136
  end
133
137
  end
data/ext/um/um_const.c CHANGED
@@ -425,7 +425,7 @@ void um_define_net_constants(VALUE mod) {
425
425
  DEF_CONST_INT(mod, SIGTSTP);
426
426
  DEF_CONST_INT(mod, SIGCONT);
427
427
  DEF_CONST_INT(mod, SIGCHLD);
428
- DEF_CONST_INT(mod, SIGCLD);
428
+ // DEF_CONST_INT(mod, SIGCLD);
429
429
  DEF_CONST_INT(mod, SIGTTIN);
430
430
  DEF_CONST_INT(mod, SIGTTOU);
431
431
  DEF_CONST_INT(mod, SIGIO);
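
The changelog does not give a reason for commenting out `SIGCLD`. For context, `SIGCLD` is a legacy alias for `SIGCHLD` that is not part of POSIX, so it may be missing on some platforms. The following sketch, which uses only Ruby's built-in `Signal.list` and makes no claim about UringMachine's own constants, shows one way to check how the running Ruby exposes the two names.

```ruby
# Sketch: inspect how the running Ruby exposes SIGCHLD and its legacy alias.
signals = Signal.list      # maps signal names (without "SIG") to numbers
chld = signals["CHLD"]
cld  = signals["CLD"]      # nil where the platform does not define SIGCLD

if cld.nil?
  puts "SIGCLD is not defined here; SIGCHLD is #{chld.inspect}"
elsif cld == chld
  puts "SIGCLD is an alias of SIGCHLD (both #{chld})"
else
  puts "SIGCLD (#{cld}) and SIGCHLD (#{chld}) differ on this platform"
end
```

Code that needs the child-status signal can rely on `SIGCHLD` alone, which remains defined in the hunk above.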
data/grant-2025/tasks.md CHANGED
@@ -152,7 +152,7 @@
152
152
  - When finally idle, calls `Process.warmup`
153
153
  - Starts replacing sibling workers with forked workers
154
154
  see also: https://www.youtube.com/watch?v=kAW5O2dkSU8
155
- - [ ] Each worker is single-threaded (except for auxiliary threads)
155
+ - [ ] Each worker is single-threaded (except for worker threads)
156
156
  - [ ] Rack 3.0-compatible
157
157
  see: https://github.com/socketry/protocol-rack
158
158
  - [ ] Rails integration (Railtie)
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  class UringMachine
4
- VERSION = '0.22.0'
4
+ VERSION = '0.22.1'
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: uringmachine
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.22.0
4
+ version: 0.22.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Sharon Rosner