turbopipe-1.2.3-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.



turbopipe/__init__.py ADDED
@@ -0,0 +1,29 @@
+ from typing import Optional, Union
+
+ from moderngl import Buffer
+
+ from turbopipe import _turbopipe
+
+ __all__ = [
+     "pipe",
+     "sync",
+     "close"
+ ]
+
+ def pipe(buffer: Union[Buffer, memoryview], fileno: int) -> None:
+     """Pipe a buffer contents to a file descriptor, fast and threaded"""
+     if isinstance(buffer, Buffer):
+         buffer = memoryview(buffer.mglo)
+     _turbopipe.pipe(buffer, fileno)
+     del buffer
+
+ def sync(buffer: Optional[Union[Buffer, memoryview]]=None) -> None:
+     """Wait for pending operations on a buffer to finish"""
+     if isinstance(buffer, Buffer):
+         buffer = memoryview(buffer.mglo)
+     _turbopipe.sync(buffer)
+     del buffer
+
+ def close() -> None:
+     """Syncs and deletes objects"""
+     _turbopipe.close()
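A rough, stdlib-only illustration of why the wrapper above normalizes its input to a `memoryview`: a view shares memory with the object it wraps, so it can be handed to a write call without an intermediate copy. This sketch does not use turbopipe or ModernGL; `roundtrip` and the `bytearray` frame are hypothetical stand-ins.

```python
import os


def roundtrip(frame: bytearray) -> bytes:
    """Write `frame` through an OS pipe via a memoryview and read it back."""
    view = memoryview(frame)      # no copy: the view aliases `frame`
    frame[0] = 255                # a later mutation is visible through the view
    assert view[0] == 255

    read_fd, write_fd = os.pipe()
    try:
        # os.write accepts any bytes-like object, including a memoryview
        written = os.write(write_fd, view)
        return os.read(read_fd, written)
    finally:
        os.close(read_fd)
        os.close(write_fd)
        view.release()


print(roundtrip(bytearray(16)))
```

The frame here is tiny so a single `os.write` call is enough; larger payloads need a loop that handles partial writes.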
turbopipe-1.2.3.dist-info/METADATA ADDED
@@ -0,0 +1,379 @@
1
+ Metadata-Version: 2.4
2
+ Name: turbopipe
3
+ Version: 1.2.3
4
+ Summary: 🌀 Faster ModernGL Buffers inter-process data transfers for subprocesses
5
+ Author-Email: Tremeschin <29046864+Tremeschin@users.noreply.github.com>
6
+ License-Expression: MIT
7
+ Project-URL: GitHub, https://github.com/BrokenSource/TurboPipe
8
+ Project-URL: Changelog, https://brokensrc.dev/about/changelog
9
+ Project-URL: Funding, https://brokensrc.dev/about/sponsors
10
+ Project-URL: Contact, https://brokensrc.dev/about/contact
11
+ Project-URL: Homepage, https://brokensrc.dev
12
+ Requires-Python: >=3.7
13
+ Requires-Dist: moderngl
14
+ Description-Content-Type: text/markdown
15
+
16
+ <div align="center">
17
+ <a href="https://brokensrc.dev/"><img src="https://raw.githubusercontent.com/BrokenSource/TurboPipe/main/turbopipe/resources/images/turbopipe.png" width="200"></a>
18
+ <h1>TurboPipe</h1>
19
+ Faster <a href="https://github.com/moderngl/moderngl"><b>ModernGL Buffers</b></a> inter-process data transfers for subprocesses
20
+ <br>
21
+ <br>
22
+ <a href="https://pypi.org/project/turbopipe/"><img src="https://img.shields.io/pypi/v/turbopipe?label=PyPI&color=blue"></a>
23
+ <a href="https://pypi.org/project/turbopipe/"><img src="https://img.shields.io/pypi/dw/turbopipe?label=Installs&color=blue"></a>
24
+ <a href="https://github.com/BrokenSource/TurboPipe"><img src="https://img.shields.io/github/v/tag/BrokenSource/TurboPipe?label=GitHub&color=orange"></a>
25
+ <a href="https://github.com/BrokenSource/TurboPipe/stargazers"><img src="https://img.shields.io/github/stars/BrokenSource/TurboPipe?label=Stars&style=flat&color=orange"></a>
26
+ <a href="https://discord.gg/KjqvcYwRHm"><img src="https://img.shields.io/discord/1184696441298485370?label=Discord&style=flat&color=purple"></a>
27
+ </div>
28
+
29
+ <br>
30
+
31
+ # 🔥 Description
32
+
33
+ > TurboPipe speeds up sending raw bytes from `moderngl.Buffer` objects, primarily to an `FFmpeg` subprocess
34
+
35
+ The **optimizations** involved are:
36
+
37
+ - **Zero-copy**: Avoid unnecessary memory copies or allocation (intermediate `buffer.read`)
38
+ - **C++**: The core of TurboPipe is written in C++ for speed, efficiency and low-level control
39
+ - **Threaded**:
40
+     - Doesn't block Python code execution, so the next frame can render in the meantime
41
+     - Decouples the main thread from the I/O thread for performance
42
+ - **Chunks**: Write in chunks of 4096 bytes (RAM page size), so the hardware is happy (Unix)
43
+
44
+ ✅ Don't worry, there's proper **safety** in place. TurboPipe will block Python if a memory address is already queued for writing, and guarantees order of writes per file-descriptor. Just call `.sync()` when done 😉
45
+
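The threading, chunking and ordering points above can be approximated in pure Python. The sketch below is only an analogy under stated assumptions: `ChunkedWriter`, `CHUNK` and `demo` are hypothetical names, a single FIFO queue stands in for whatever the C++ core actually does, and it is not the TurboPipe implementation.

```python
import os
import queue
import threading

CHUNK = 4096  # chunk size quoted in the README


class ChunkedWriter:
    """Hypothetical stand-in: one background thread writing queued buffers."""

    def __init__(self) -> None:
        self._jobs: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            try:
                if job is None:          # sentinel pushed by close()
                    return
                view, fd = job
                offset = 0
                while offset < len(view):
                    # Slicing a memoryview does not copy the underlying bytes
                    offset += os.write(fd, view[offset:offset + CHUNK])
            finally:
                self._jobs.task_done()

    def pipe(self, view: memoryview, fd: int) -> None:
        # Returns immediately; a FIFO queue keeps writes in submission order
        self._jobs.put((view.cast("B"), fd))

    def sync(self) -> None:
        self._jobs.join()                # block until every queued write is done

    def close(self) -> None:
        self.sync()
        self._jobs.put(None)
        self._worker.join()


def demo() -> bytes:
    read_fd, write_fd = os.pipe()
    writer = ChunkedWriter()
    writer.pipe(memoryview(bytearray(b"a" * 5000)), write_fd)
    writer.pipe(memoryview(bytearray(b"b" * 3000)), write_fd)
    writer.close()
    os.close(write_fd)
    received = b""
    while True:
        chunk = os.read(read_fd, 65536)
        if not chunk:
            break
        received += chunk
    os.close(read_fd)
    return received
```

The demo payloads are kept small enough to fit in a typical OS pipe buffer, so `close()` can return before anything is read back.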
46
+ <sub>Also check out [**ShaderFlow**](https://github.com/BrokenSource/ShaderFlow), where **TurboPipe** shines! 😉</sub>
47
+
48
+ <br>
49
+
50
+ # 📦 Installation
51
+
52
+ It couldn't be easier! Just install the [**`turbopipe`**](https://pypi.org/project/turbopipe/) package from PyPI:
53
+
+ ```bash
+ # With pip (https://pip.pypa.io/)
+ pip install turbopipe
+
+ # With Poetry (https://python-poetry.org/)
+ poetry add turbopipe
+
+ # With PDM (https://pdm-project.org/en/latest/)
+ pdm add turbopipe
+
+ # With Rye (https://rye.astral.sh/)
+ rye add turbopipe
+ ```
67
+
68
+ <br>
69
+
70
+ # 🚀 Usage
71
+
72
+ See also the [**Examples**](https://github.com/BrokenSource/TurboPipe/tree/main/examples) folder for comparisons, and [**ShaderFlow**](https://github.com/BrokenSource/ShaderFlow/blob/main/ShaderFlow/Exporting.py)'s usage of it!
73
+
+ ```python
+ import subprocess
+
+ import moderngl
+ import turbopipe
+
+ # Create ModernGL objects and proxy buffers
+ ctx = moderngl.create_standalone_context()
+ width, height, duration, fps, nbuffers = (1920, 1080, 10, 60, 2)
+ buffers = [
+     ctx.buffer(reserve=(width*height*3))
+     for _ in range(nbuffers)
+ ]
+
+ # Create your FBO, Textures, Shaders, etc. (a bare RGB FBO stands in here)
+ fbo = ctx.simple_framebuffer((width, height), components=3)
+
+ # Make sure resolution, pixel format matches!
+ ffmpeg = subprocess.Popen((
+     "ffmpeg",
+     "-f", "rawvideo",
+     "-pix_fmt", "rgb24",
+     "-r", str(fps),
+     "-s", f"{width}x{height}",
+     "-i", "-",
+     "-f", "null",
+     "output.mp4"
+ ), stdin=subprocess.PIPE)
+
+ # Rendering loop of yours
+ for frame in range(duration*fps):
+     buffer = buffers[frame % nbuffers]
+
+     # Wait for queued writes before copying
+     turbopipe.sync(buffer)
+     fbo.read_into(buffer)
+
+     # Doesn't lock the GIL, writes in parallel
+     turbopipe.pipe(buffer, ffmpeg.stdin.fileno())
+
+ # Wait for queued writes, clean memory
+ for buffer in buffers:
+     turbopipe.sync(buffer)
+     buffer.release()
+
+ # Signal stdin stream is done
+ ffmpeg.stdin.close()
+
+ # Wait for encoding to finish
+ ffmpeg.wait()
+
+ # Warn: Albeit rare, only call close when no other data
+ # write is pending, as it might skip a frame or halt
+ turbopipe.close()
+ ```
128
+
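The usage example asks that resolution and pixel format match between the buffer and the FFmpeg arguments. A minimal way to check this up front is sketched below; `BYTES_PER_PIXEL`, `frame_bytes` and `check_buffer` are hypothetical helpers, not part of turbopipe, and the table only covers a few packed 8-bit `-pix_fmt` values.

```python
# Bytes per pixel for a few packed 8-bit FFmpeg pixel formats (assumed subset)
BYTES_PER_PIXEL = {"rgb24": 3, "rgba": 4, "gray": 1}


def frame_bytes(width: int, height: int, pix_fmt: str = "rgb24") -> int:
    """Size in bytes of one raw frame for the given resolution and format."""
    return width * height * BYTES_PER_PIXEL[pix_fmt]


def check_buffer(size: int, width: int, height: int, pix_fmt: str = "rgb24") -> None:
    """Raise early if a buffer cannot hold exactly one raw frame."""
    expected = frame_bytes(width, height, pix_fmt)
    if size != expected:
        raise ValueError(
            f"buffer holds {size} bytes, but {width}x{height} {pix_fmt} "
            f"needs {expected}"
        )


# Same numbers as the example: ctx.buffer(reserve=width*height*3) with rgb24
check_buffer(1920 * 1080 * 3, 1920, 1080, "rgb24")
```

A mismatch would otherwise show up only as a skewed or torn video, since raw video input carries no per-frame size information.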
129
+ <br>
130
+
131
+ # ⭐️ Benchmarks
132
+
133
+ > [!NOTE]
134
+ > **The test conditions are as follows**:
135
+ > - The tests are the average of 3 runs to ensure consistency, with 5 GB of the same data being piped
136
+ > - These aren't tests of render speed; but rather the throughput speed of GPU -> CPU -> RAM -> IPC
137
+ > - All resolutions are wide-screen (16:9) and have 3 components (RGB) with 3 bytes per pixel (SDR)
138
+ > - The data is random noise per buffer, with values between 128 and 135, so multi-buffer runs produce a noise video
139
+ > - Multi-buffer cycles through a list of buffers (e.g. 1, 2, 3, 1, 2, 3... for 3 buffers)
140
+ > - All FFmpeg outputs are discarded with `-f null -` to avoid any disk I/O bottlenecks
141
+ > - The `gain` column is the percentage increase over the standard method
142
+ > - When `x264` is Null, no encoding took place (passthrough)
143
+ > - The test case emojis signify:
144
+ >     - 🐢: Standard `ffmpeg.stdin.write(buffer.read())` on just the main thread, pure Python
145
+ >     - 🚀: Threaded `ffmpeg.stdin.write(buffer.read())` with a queue (similar to turbopipe)
146
+ >     - 🌀: The magic of `turbopipe.pipe(buffer, ffmpeg.stdin.fileno())`
147
+ >
148
+ > Also see [`benchmark.py`](https://github.com/BrokenSource/TurboPipe/blob/main/examples/benchmark.py) for the implementation
149
+
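For reading the tables, the bandwidth and gain columns can be recomputed from the frame rate. The sketch below assumes decimal gigabytes (1 GB = 10^9 bytes) and 3 bytes per pixel, as the notes state; `bandwidth_gb_s` and `gain_percent` are hypothetical helper names, and the published gains were presumably computed from unrounded frame rates, so a recomputation from the rounded fps agrees only approximately.

```python
def bandwidth_gb_s(width: int, height: int, fps: float, bytes_per_pixel: int = 3) -> float:
    """Raw throughput in decimal GB/s for a given resolution and frame rate."""
    return width * height * bytes_per_pixel * fps / 1e9


def gain_percent(baseline_fps: float, fps: float) -> float:
    """Percentage increase of `fps` over the standard-method frame rate."""
    return (fps / baseline_fps - 1) * 100


# 720p, standard method at 882 fps vs turbopipe at 1911 fps
print(round(bandwidth_gb_s(1280, 720, 882), 2))
print(round(gain_percent(882, 1911), 2))
```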
150
+ ✅ Check out benchmarks on a couple of systems below:
151
+
152
+ 📦 TurboPipe v1.0.4:
153
+
154
+ <details>
155
+ <summary><b>Desktop</b> • (AMD Ryzen 9 5900x) • (NVIDIA RTX 3060 12 GB) • (DDR4 2x32 GB 3200 MT/s) • (Arch Linux)</summary>
156
+ <br>
157
+
158
+ <b>Note:</b> I have noted inconsistency across tests, especially at lower resolutions. Some 720p runs might peak at 2900 fps and stay there, while others are limited to 1750 fps. I'm not sure if it's the Linux EEVDF scheduler or the CPU topology that causes this. Nevertheless, results are stable on Windows 11 on the same machine.
159
+
160
+ | 720p | x264 | Buffers | Framerate | Bandwidth | Gain |
161
+ |:----:|:----------|:---------:|----------:|------------:|---------:|
162
+ | 🐢 | Null | 1 | 882 fps | 2.44 GB/s | |
163
+ | 🚀 | Null | 1 | 793 fps | 2.19 GB/s | -10.04% |
164
+ | 🌀 | Null | 1 | 1911 fps | 5.28 GB/s | 116.70% |
165
+ | 🐢 | Null | 4 | 857 fps | 2.37 GB/s | |
166
+ | 🚀 | Null | 4 | 891 fps | 2.47 GB/s | 4.05% |
167
+ | 🌀 | Null | 4 | 2309 fps | 6.38 GB/s | 169.45% |
168
+ | 🐢 | ultrafast | 4 | 714 fps | 1.98 GB/s | |
169
+ | 🚀 | ultrafast | 4 | 670 fps | 1.85 GB/s | -6.10% |
170
+ | 🌀 | ultrafast | 4 | 1093 fps | 3.02 GB/s | 53.13% |
171
+ | 🐢 | slow | 4 | 206 fps | 0.57 GB/s | |
172
+ | 🚀 | slow | 4 | 208 fps | 0.58 GB/s | 1.37% |
173
+ | 🌀 | slow | 4 | 214 fps | 0.59 GB/s | 3.93% |
174
+
175
+ | 1080p | x264 | Buffers | Framerate | Bandwidth | Gain |
176
+ |:-----:|:----------|:---------:|----------:|------------:|--------:|
177
+ | 🐢 | Null | 1 | 410 fps | 2.55 GB/s | |
178
+ | 🚀 | Null | 1 | 399 fps | 2.48 GB/s | -2.60% |
179
+ | 🌀 | Null | 1 | 794 fps | 4.94 GB/s | 93.80% |
180
+ | 🐢 | Null | 4 | 390 fps | 2.43 GB/s | |
181
+ | 🚀 | Null | 4 | 391 fps | 2.43 GB/s | 0.26% |
182
+ | 🌀 | Null | 4 | 756 fps | 4.71 GB/s | 94.01% |
183
+ | 🐢 | ultrafast | 4 | 269 fps | 1.68 GB/s | |
184
+ | 🚀 | ultrafast | 4 | 272 fps | 1.70 GB/s | 1.48% |
185
+ | 🌀 | ultrafast | 4 | 409 fps | 2.55 GB/s | 52.29% |
186
+ | 🐢 | slow | 4 | 115 fps | 0.72 GB/s | |
187
+ | 🚀 | slow | 4 | 118 fps | 0.74 GB/s | 3.40% |
188
+ | 🌀 | slow | 4 | 119 fps | 0.75 GB/s | 4.34% |
189
+
190
+ | 1440p | x264 | Buffers | Framerate | Bandwidth | Gain |
191
+ |:-----:|:----------|:---------:|----------:|------------:|---------:|
192
+ | 🐢 | Null | 1 | 210 fps | 2.33 GB/s | |
193
+ | 🚀 | Null | 1 | 239 fps | 2.64 GB/s | 13.84% |
194
+ | 🌀 | Null | 1 | 534 fps | 5.91 GB/s | 154.32% |
195
+ | 🐢 | Null | 4 | 219 fps | 2.43 GB/s | |
196
+ | 🚀 | Null | 4 | 231 fps | 2.56 GB/s | 5.64% |
197
+ | 🌀 | Null | 4 | 503 fps | 5.56 GB/s | 129.75% |
198
+ | 🐢 | ultrafast | 4 | 141 fps | 1.56 GB/s | |
199
+ | 🚀 | ultrafast | 4 | 150 fps | 1.67 GB/s | 6.92% |
200
+ | 🌀 | ultrafast | 4 | 226 fps | 2.50 GB/s | 60.37% |
201
+ | 🐢 | slow | 4 | 72 fps | 0.80 GB/s | |
202
+ | 🚀 | slow | 4 | 71 fps | 0.79 GB/s | -0.70% |
203
+ | 🌀 | slow | 4 | 75 fps | 0.83 GB/s | 4.60% |
204
+
205
+ | 2160p | x264 | Buffers | Framerate | Bandwidth | Gain |
206
+ |:-----:|:----------|:---------:|----------:|------------:|---------:|
207
+ | 🐢 | Null | 1 | 81 fps | 2.03 GB/s | |
208
+ | 🚀 | Null | 1 | 107 fps | 2.67 GB/s | 32.26% |
209
+ | 🌀 | Null | 1 | 213 fps | 5.31 GB/s | 163.47% |
210
+ | 🐢 | Null | 4 | 87 fps | 2.18 GB/s | |
211
+ | 🚀 | Null | 4 | 109 fps | 2.72 GB/s | 25.43% |
212
+ | 🌀 | Null | 4 | 212 fps | 5.28 GB/s | 143.72% |
213
+ | 🐢 | ultrafast | 4 | 59 fps | 1.48 GB/s | |
214
+ | 🚀 | ultrafast | 4 | 67 fps | 1.68 GB/s | 14.46% |
215
+ | 🌀 | ultrafast | 4 | 95 fps | 2.39 GB/s | 62.66% |
216
+ | 🐢 | slow | 4 | 37 fps | 0.94 GB/s | |
217
+ | 🚀 | slow | 4 | 43 fps | 1.07 GB/s | 16.22% |
218
+ | 🌀 | slow | 4 | 44 fps | 1.11 GB/s | 20.65% |
219
+
220
+ </details>
221
+
222
+ <details>
223
+ <summary><b>Desktop</b> • (AMD Ryzen 9 5900x) • (NVIDIA RTX 3060 12 GB) • (DDR4 2x32 GB 3200 MT/s) • (Windows 11)</summary>
224
+ <br>
225
+
226
+ | 720p | x264 | Buffers | Framerate | Bandwidth | Gain |
227
+ |:----:|:----------|:---------:|----------:|------------:|--------:|
228
+ | 🐢 | Null | 1 | 981 fps | 2.71 GB/s | |
229
+ | 🚀 | Null | 1 | 1145 fps | 3.17 GB/s | 16.74% |
230
+ | 🌀 | Null | 1 | 1504 fps | 4.16 GB/s | 53.38% |
231
+ | 🐢 | Null | 4 | 997 fps | 2.76 GB/s | |
232
+ | 🚀 | Null | 4 | 1117 fps | 3.09 GB/s | 12.08% |
233
+ | 🌀 | Null | 4 | 1467 fps | 4.06 GB/s | 47.14% |
234
+ | 🐢 | ultrafast | 4 | 601 fps | 1.66 GB/s | |
235
+ | 🚀 | ultrafast | 4 | 616 fps | 1.70 GB/s | 2.57% |
236
+ | 🌀 | ultrafast | 4 | 721 fps | 1.99 GB/s | 20.04% |
237
+ | 🐢 | slow | 4 | 206 fps | 0.57 GB/s | |
238
+ | 🚀 | slow | 4 | 206 fps | 0.57 GB/s | 0.39% |
239
+ | 🌀 | slow | 4 | 206 fps | 0.57 GB/s | 0.13% |
240
+
241
+ | 1080p | x264 | Buffers | Framerate | Bandwidth | Gain |
242
+ |:-----:|:----------|:---------:|----------:|------------:|--------:|
243
+ | 🐢 | Null | 1 | 451 fps | 2.81 GB/s | |
244
+ | 🚀 | Null | 1 | 542 fps | 3.38 GB/s | 20.31% |
245
+ | 🌀 | Null | 1 | 711 fps | 4.43 GB/s | 57.86% |
246
+ | 🐢 | Null | 4 | 449 fps | 2.79 GB/s | |
247
+ | 🚀 | Null | 4 | 518 fps | 3.23 GB/s | 15.48% |
248
+ | 🌀 | Null | 4 | 614 fps | 3.82 GB/s | 36.83% |
249
+ | 🐢 | ultrafast | 4 | 262 fps | 1.64 GB/s | |
250
+ | 🚀 | ultrafast | 4 | 266 fps | 1.66 GB/s | 1.57% |
251
+ | 🌀 | ultrafast | 4 | 319 fps | 1.99 GB/s | 21.88% |
252
+ | 🐢 | slow | 4 | 119 fps | 0.74 GB/s | |
253
+ | 🚀 | slow | 4 | 121 fps | 0.76 GB/s | 2.46% |
254
+ | 🌀 | slow | 4 | 121 fps | 0.75 GB/s | 1.90% |
255
+
256
+ | 1440p | x264 | Buffers | Framerate | Bandwidth | Gain |
257
+ |:-----:|:----------|:---------:|----------:|------------:|--------:|
258
+ | 🐢 | Null | 1 | 266 fps | 2.95 GB/s | |
259
+ | 🚀 | Null | 1 | 308 fps | 3.41 GB/s | 15.87% |
260
+ | 🌀 | Null | 1 | 402 fps | 4.45 GB/s | 51.22% |
261
+ | 🐢 | Null | 4 | 276 fps | 3.06 GB/s | |
262
+ | 🚀 | Null | 4 | 307 fps | 3.40 GB/s | 11.32% |
263
+ | 🌀 | Null | 4 | 427 fps | 4.73 GB/s | 54.86% |
264
+ | 🐢 | ultrafast | 4 | 152 fps | 1.68 GB/s | |
265
+ | 🚀 | ultrafast | 4 | 156 fps | 1.73 GB/s | 3.02% |
266
+ | 🌀 | ultrafast | 4 | 181 fps | 2.01 GB/s | 19.36% |
267
+ | 🐢 | slow | 4 | 77 fps | 0.86 GB/s | |
268
+ | 🚀 | slow | 4 | 79 fps | 0.88 GB/s | 3.27% |
269
+ | 🌀 | slow | 4 | 80 fps | 0.89 GB/s | 4.86% |
270
+
271
+ | 2160p | x264 | Buffers | Framerate | Bandwidth | Gain |
272
+ |:-----:|:----------|:---------:|----------:|------------:|--------:|
273
+ | 🐢 | Null | 1 | 134 fps | 3.35 GB/s | |
274
+ | 🚀 | Null | 1 | 152 fps | 3.81 GB/s | 14.15% |
275
+ | 🌀 | Null | 1 | 221 fps | 5.52 GB/s | 65.44% |
276
+ | 🐢 | Null | 4 | 135 fps | 3.36 GB/s | |
277
+ | 🚀 | Null | 4 | 151 fps | 3.76 GB/s | 11.89% |
278
+ | 🌀 | Null | 4 | 220 fps | 5.49 GB/s | 63.34% |
279
+ | 🐢 | ultrafast | 4 | 66 fps | 1.65 GB/s | |
280
+ | 🚀 | ultrafast | 4 | 70 fps | 1.75 GB/s | 6.44% |
281
+ | 🌀 | ultrafast | 4 | 82 fps | 2.04 GB/s | 24.31% |
282
+ | 🐢 | slow | 4 | 40 fps | 1.01 GB/s | |
283
+ | 🚀 | slow | 4 | 43 fps | 1.09 GB/s | 9.54% |
284
+ | 🌀 | slow | 4 | 44 fps | 1.10 GB/s | 10.15% |
285
+
286
+ </details>
287
+
288
+ <details>
289
+ <summary><b>Laptop</b> • (Intel Core i7 11800H) • (NVIDIA RTX 3070) • (DDR4 2x16 GB 3200 MT/s) • (Windows 11)</summary>
290
+ <br>
291
+
292
+ <b>Note:</b> Must select NVIDIA GPU on their Control Panel instead of Intel iGPU
293
+
294
+ | 720p | x264 | Buffers | Framerate | Bandwidth | Gain |
295
+ |:----:|:----------|:---------:|----------:|------------:|--------:|
296
+ | 🐢 | Null | 1 | 786 fps | 2.17 GB/s | |
297
+ | 🚀 | Null | 1 | 903 fps | 2.50 GB/s | 14.91% |
298
+ | 🌀 | Null | 1 | 1366 fps | 3.78 GB/s | 73.90% |
299
+ | 🐢 | Null | 4 | 739 fps | 2.04 GB/s | |
300
+ | 🚀 | Null | 4 | 855 fps | 2.37 GB/s | 15.78% |
301
+ | 🌀 | Null | 4 | 1240 fps | 3.43 GB/s | 67.91% |
302
+ | 🐢 | ultrafast | 4 | 484 fps | 1.34 GB/s | |
303
+ | 🚀 | ultrafast | 4 | 503 fps | 1.39 GB/s | 4.10% |
304
+ | 🌀 | ultrafast | 4 | 577 fps | 1.60 GB/s | 19.37% |
305
+ | 🐢 | slow | 4 | 143 fps | 0.40 GB/s | |
306
+ | 🚀 | slow | 4 | 145 fps | 0.40 GB/s | 1.78% |
307
+ | 🌀 | slow | 4 | 151 fps | 0.42 GB/s | 5.76% |
308
+
309
+ | 1080p | x264 | Buffers | Framerate | Bandwidth | Gain |
310
+ |:-----:|:----------|:---------:|----------:|------------:|--------:|
311
+ | 🐢 | Null | 1 | 358 fps | 2.23 GB/s | |
312
+ | 🚀 | Null | 1 | 427 fps | 2.66 GB/s | 19.45% |
313
+ | 🌀 | Null | 1 | 566 fps | 3.53 GB/s | 58.31% |
314
+ | 🐢 | Null | 4 | 343 fps | 2.14 GB/s | |
315
+ | 🚀 | Null | 4 | 404 fps | 2.51 GB/s | 17.86% |
316
+ | 🌀 | Null | 4 | 465 fps | 2.89 GB/s | 35.62% |
317
+ | 🐢 | ultrafast | 4 | 191 fps | 1.19 GB/s | |
318
+ | 🚀 | ultrafast | 4 | 207 fps | 1.29 GB/s | 8.89% |
319
+ | 🌀 | ultrafast | 4 | 234 fps | 1.46 GB/s | 22.77% |
320
+ | 🐢 | slow | 4 | 62 fps | 0.39 GB/s | |
321
+ | 🚀 | slow | 4 | 67 fps | 0.42 GB/s | 8.40% |
322
+ | 🌀 | slow | 4 | 74 fps | 0.47 GB/s | 20.89% |
323
+
324
+ | 1440p | x264 | Buffers | Framerate | Bandwidth | Gain |
325
+ |:-----:|:----------|:---------:|----------:|------------:|--------:|
326
+ | 🐢 | Null | 1 | 180 fps | 1.99 GB/s | |
327
+ | 🚀 | Null | 1 | 216 fps | 2.40 GB/s | 20.34% |
328
+ | 🌀 | Null | 1 | 264 fps | 2.92 GB/s | 46.74% |
329
+ | 🐢 | Null | 4 | 178 fps | 1.97 GB/s | |
330
+ | 🚀 | Null | 4 | 211 fps | 2.34 GB/s | 19.07% |
331
+ | 🌀 | Null | 4 | 250 fps | 2.77 GB/s | 40.48% |
332
+ | 🐢 | ultrafast | 4 | 98 fps | 1.09 GB/s | |
333
+ | 🚀 | ultrafast | 4 | 110 fps | 1.23 GB/s | 13.18% |
334
+ | 🌀 | ultrafast | 4 | 121 fps | 1.35 GB/s | 24.15% |
335
+ | 🐢 | slow | 4 | 40 fps | 0.45 GB/s | |
336
+ | 🚀 | slow | 4 | 41 fps | 0.46 GB/s | 4.90% |
337
+ | 🌀 | slow | 4 | 43 fps | 0.48 GB/s | 7.89% |
338
+
339
+ | 2160p | x264 | Buffers | Framerate | Bandwidth | Gain |
340
+ |:-----:|:----------|:---------:|----------:|------------:|--------:|
341
+ | 🐢 | Null | 1 | 79 fps | 1.98 GB/s | |
342
+ | 🚀 | Null | 1 | 95 fps | 2.37 GB/s | 20.52% |
343
+ | 🌀 | Null | 1 | 104 fps | 2.60 GB/s | 32.15% |
344
+ | 🐢 | Null | 4 | 80 fps | 2.00 GB/s | |
345
+ | 🚀 | Null | 4 | 94 fps | 2.35 GB/s | 17.82% |
346
+ | 🌀 | Null | 4 | 108 fps | 2.70 GB/s | 35.40% |
347
+ | 🐢 | ultrafast | 4 | 41 fps | 1.04 GB/s | |
348
+ | 🚀 | ultrafast | 4 | 48 fps | 1.20 GB/s | 17.67% |
349
+ | 🌀 | ultrafast | 4 | 52 fps | 1.30 GB/s | 27.49% |
350
+ | 🐢 | slow | 4 | 17 fps | 0.43 GB/s | |
351
+ | 🚀 | slow | 4 | 19 fps | 0.48 GB/s | 13.16% |
352
+ | 🌀 | slow | 4 | 19 fps | 0.48 GB/s | 13.78% |
353
+
354
+ </details>
355
+ <br>
356
+
357
+ <div align="justify">
358
+
359
+ # 🌀 Conclusion
360
+
361
+ TurboPipe significantly increases the speed of feeding data to FFmpeg, especially at higher resolutions. However, if there's little CPU compute available, or the video is too hard to encode (`slow` preset), the gains over the other methods are insignificant, as encoding becomes the bottleneck. Multi-buffering didn't prove to have an advantage: debugging shows that the TurboPipe C++ thread is often starved of data to write (most likely because the file stream is buffered by the OS), and context switching or cache misses might be the cause of the slowdown.
362
+
363
+ The theory supports the threaded method being faster, as writing to a file descriptor is a blocking operation for Python, but a syscall under the hood that doesn't necessarily lock the GIL, just the thread. TurboPipe speeds this up even further by avoiding an unnecessary copy of the buffer data and writing directly to the file descriptor on a C++ thread. Linux shows better performance than Windows on the same system after the optimizations, but Windows wins on the standard method.
364
+
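The point about blocking writes not holding the GIL can be checked with the standard library alone. In this sketch (hypothetical names `write_all` and `drain_while_writing`, no turbopipe involved), a writer thread pushes more data than an OS pipe can buffer, so its `os.write` calls block; the main thread can only finish draining the pipe if the GIL is released during those blocked writes.

```python
import os
import threading


def write_all(fd: int, view: memoryview) -> None:
    """Write the whole view, handling partial writes, then close the fd."""
    offset = 0
    while offset < len(view):
        offset += os.write(fd, view[offset:])
    os.close(fd)


def drain_while_writing(size: int = 1 << 20) -> int:
    """Return how many bytes the main thread read while a thread was writing."""
    read_fd, write_fd = os.pipe()
    payload = memoryview(bytes(size))
    writer = threading.Thread(target=write_all, args=(write_fd, payload))
    writer.start()

    received = 0
    while True:
        chunk = os.read(read_fd, 65536)
        if not chunk:            # writer closed its end: end of stream
            break
        received += len(chunk)

    writer.join()
    os.close(read_fd)
    return received


print(drain_while_writing())
```

This mirrors the 🚀 case only loosely: it shows the two threads overlapping, not how much faster the overlap makes a real export.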
365
+ Interestingly, due to either Linux's scheduler on AMD Ryzen CPUs or their operating philosophy, it was experimentally seen that Ryzen's frenetic thread switching slightly degrades single-thread performance compared to Windows on the same system (Linux 🚀 = Windows 🐢). This can be _"fixed"_ by prepending the command with `taskset --cpu 0,2` (not recommended at all). It can also be due to the topology of the tested CPUs having more than one Core Complex Die (CCD). Intel CPUs seem to stick to the same thread for longer, which makes the Python threaded method often slightly faster.
366
+
367
+ ### Personal experience
368
+
369
+ On realistic loads, like [**ShaderFlow**](https://github.com/BrokenSource/ShaderFlow)'s default lightweight shader export, TurboPipe increases rendering speed from 1080p260 to 1080p360 on my system, with CPU usage in the mid 80%s rather than the low 60%s. For [**DepthFlow**](https://github.com/BrokenSource/ShaderFlow)'s default depth video export, no gains are seen, as the CPU is almost saturated encoding at 1080p130.
370
+
371
+ </div>
372
+
373
+ <br>
374
+
375
+ # 📚 Future work
376
+
377
+ - Disable/investigate performance degradation on Windows iGPUs
378
+ - Maybe use `mmap` instead of chunked writes on Linux
379
+ - Split the code into a libturbopipe? Not sure where it would be useful 😅
turbopipe-1.2.3.dist-info/RECORD ADDED
@@ -0,0 +1,5 @@
1
+ turbopipe/__init__.py,sha256=98UPmwKfHOuC6tZ-CX4nywLMETmkhZkME9D4mvvX8T8,745
2
+ turbopipe/_turbopipe.cpython-39-aarch64-linux-gnu.so,sha256=kKrCn3uY8YbniexZKJRwac0B_NN7HmO7gEpiIujAl6w,79592
3
+ turbopipe-1.2.3.dist-info/METADATA,sha256=MdldXjJ_g4d1H7thMGPxOrXbTYj3LiWFE3KaMJTACUw,21140
4
+ turbopipe-1.2.3.dist-info/WHEEL,sha256=P45OBan0HDQweRANQNyWFx9B-h69Wjgd18am9H84DoA,135
5
+ turbopipe-1.2.3.dist-info/RECORD,,
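Each entry above follows the `path,sha256=<digest>,<size>` layout used by wheel `RECORD` files, where the digest is URL-safe base64 with the trailing `=` padding removed. The sketch below shows how such an entry could be checked against file contents; `record_hash` and `verify_entry` are hypothetical helper names, and no file from this wheel is actually read here.

```python
import base64
import csv
import hashlib


def record_hash(data: bytes) -> str:
    """Hash `data` the way RECORD entries encode it."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=")
    return "sha256=" + encoded.decode("ascii")


def verify_entry(line: str, data: bytes) -> bool:
    """Check one RECORD line (path,hash,size) against the file's bytes."""
    path, expected_hash, size = next(csv.reader([line]))
    if not expected_hash:        # RECORD lists itself with empty hash and size
        return True
    return expected_hash == record_hash(data) and int(size) == len(data)


# An empty file, a common case for bare __init__.py modules
print(record_hash(b""))
```

To check the entries listed above, the same function would be fed the bytes of each extracted file, for example the 745 bytes of `turbopipe/__init__.py`.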
turbopipe-1.2.3.dist-info/WHEEL ADDED
@@ -0,0 +1,6 @@
1
+ Wheel-Version: 1.0
2
+ Generator: meson
3
+ Root-Is-Purelib: false
4
+ Tag: cp39-cp39-manylinux_2_17_aarch64
5
+ Tag: cp39-cp39-manylinux2014_aarch64
6
+
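The two `Tag:` lines above correspond to the compressed platform tag in the wheel filename, where `.` separates alternatives within each of the python, ABI and platform fields. A minimal sketch of that expansion follows; `expand_tags` is a hypothetical helper that assumes the conventional `name-version[-build]-python-abi-platform.whl` layout.

```python
from itertools import product
from typing import List


def expand_tags(wheel_filename: str) -> List[str]:
    """Expand a wheel filename's compressed tag set into individual tags."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    stem = wheel_filename[: -len(".whl")]
    # The last three dash-separated fields are python, abi and platform tags
    python_tags, abi_tags, platform_tags = stem.split("-")[-3:]
    return [
        "-".join(combo)
        for combo in product(
            python_tags.split("."),
            abi_tags.split("."),
            platform_tags.split("."),
        )
    ]


name = "turbopipe-1.2.3-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl"
for tag in expand_tags(name):
    print("Tag:", tag)
```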