pyvista-zstd 0.2.0__tar.gz
@@ -0,0 +1,21 @@
The MIT License

Copyright (c) 2017-2025 The PyVista Developers

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

@@ -0,0 +1,340 @@
Metadata-Version: 2.4
Name: pyvista-zstd
Version: 0.2.0
Summary: VTK zstandard compression library.
Author-email: The PyVista developers <info@pyvista.org>
Description-Content-Type: text/x-rst
License-Expression: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
License-File: LICENSE
Requires-Dist: pyvista>=0.45.0
Requires-Dist: tqdm>=4.0.0
Requires-Dist: vtk>=9.4.0
Requires-Dist: zstandard>=0.24.0
Requires-Dist: numpydoc==1.10.0 ; extra == "docs"
Requires-Dist: sphinx-book-theme==1.2.0 ; extra == "docs"
Requires-Dist: sphinx-copybutton==0.5.2 ; extra == "docs"
Requires-Dist: sphinx-gallery==0.20.0 ; extra == "docs"
Requires-Dist: sphinx-notfound-page==1.1.0 ; extra == "docs"
Requires-Dist: sphinx==9.1.0 ; extra == "docs"
Requires-Dist: pytest-cov==7.1.0 ; extra == "tests"
Requires-Dist: pytest>=6.2.0 ; extra == "tests"
Project-URL: Home, https://github.com/pyvista/pyvista-zstd
Provides-Extra: docs
Provides-Extra: tests

pyvista-zstd
============

|pypi| |ci| |mit|

.. |pypi| image:: https://img.shields.io/pypi/v/pyvista-zstd.svg?logo=python&logoColor=white
   :target: https://pypi.org/project/pyvista-zstd/
.. |ci| image:: https://github.com/pyvista/pyvista-zstd/actions/workflows/ci_cd.yml/badge.svg
   :target: https://github.com/pyvista/pyvista-zstd/actions/workflows/ci_cd.yml
.. |mit| image:: https://img.shields.io/badge/License-MIT-yellow.svg
   :target: https://opensource.org/license/mit/


Seamlessly compress VTK datasets using `Zstandard <https://github.com/facebook/zstd>`_.

**Read in VTK datasets 37x faster, write 14x faster, all while using 28% less
space versus VTK’s modern XML format.**

.. figure:: https://github.com/pyvista/pyvista-zstd/raw/main/doc/images/speed-up.png
   :alt: Read/Write Speedup and Compression Ratios

   Read/Write Speedup and Compression Ratios


+------------------------------+-------------------+-------------------+----------------------+
| File Type / Method           | Write Speed       | Compression Ratio | Notes                |
+==============================+===================+===================+======================+
| Legacy VTK (.vtk)            | 465 MB/s          | 0.88              | Significant overhead |
+------------------------------+-------------------+-------------------+----------------------+
| VTK XML, none                | 256 MB/s          | 0.70              | Significant overhead |
+------------------------------+-------------------+-------------------+----------------------+
| VTK XML, zlib                | 105 MB/s          | 2.52              | VTK Default          |
+------------------------------+-------------------+-------------------+----------------------+
| VTK XML, lz4                 | 401 MB/s          | 1.47              |                      |
+------------------------------+-------------------+-------------------+----------------------+
| VTK XML, lzma                | 9.93 MB/s         | 3.10              |                      |
+------------------------------+-------------------+-------------------+----------------------+
| VTK HDF (.vtkhdf), lvl0      | 1733 MB/s         | 0.93              | No compression       |
+------------------------------+-------------------+-------------------+----------------------+
| VTK HDF (.vtkhdf), lvl4      | 137 MB/s          | 2.37              | Default compression  |
+------------------------------+-------------------+-------------------+----------------------+
| pyvista-zstd (.pv), lvl3     | 711 MB/s          | 3.02              | Threads = 0          |
+------------------------------+-------------------+-------------------+----------------------+
| **pyvista-zstd (.pv), lvl3** | **1845 MB/s**     | **3.02**          | **Threads = 4**      |
+------------------------------+-------------------+-------------------+----------------------+
| pyvista-zstd (.pv), lvl22    | 15.8 MB/s         | 3.79              | All threads (-1)     |
+------------------------------+-------------------+-------------------+----------------------+


Usage
~~~~~

Install with:

.. code:: bash

   pip install pyvista-zstd

Compatible with all VTK dataset types. Uses
`PyVista <https://docs.pyvista.org/>`__ under the hood.

.. code:: py

   import pyvista as pv
   import pyvista_zstd

   # create and write out
   ds = pv.Sphere()
   pyvista_zstd.write(ds, "dataset.pv")

   # read in and show these are identical
   ds_in = pyvista_zstd.read("dataset.pv")
   assert ds == ds_in

**Alternative VTK example**

.. code:: py

   import vtk
   import pyvista_zstd

   # create dataset using VTK source
   sphere_source = vtk.vtkSphereSource()
   sphere_source.SetRadius(1.0)
   sphere_source.SetThetaResolution(32)
   sphere_source.SetPhiResolution(32)
   sphere_source.Update()

   vtk_ds = sphere_source.GetOutput()

   # write out and read back
   pyvista_zstd.write(vtk_ds, "sphere.pv")
   ds_in = pyvista_zstd.read("sphere.pv")


PyVista Integration
~~~~~~~~~~~~~~~~~~~

When ``pyvista-zstd`` is installed, it automatically registers with PyVista's
reader registry. This means ``pv.read()`` handles ``.pv`` files
directly:

.. code:: py

   import pyvista as pv

   mesh = pv.read("dataset.pv")

No additional imports are needed. This works via PyVista's ``pyvista.readers``
entry point group, so the registration happens at install time.
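
As a rough illustration of how entry-point discovery works in general, the
standard library's ``importlib.metadata`` can list whatever is registered under
the ``pyvista.readers`` group named above. This is only a sketch, not PyVista's
actual lookup code: the helper name ``list_reader_entry_points`` is made up for
this example, and the result is an empty list when no plugin is installed.

.. code:: py

   from importlib.metadata import entry_points


   def list_reader_entry_points(group="pyvista.readers"):
       """Return sorted (name, value) pairs for every entry point in ``group``."""
       # Selecting by ``group=`` is supported on Python 3.10+, the oldest
       # version listed in this package's classifiers.
       return sorted((ep.name, ep.value) for ep in entry_points(group=group))


   # Hypothetical usage: print each registered reader, if any.
   for name, value in list_reader_entry_points():
       print(f"{name} -> {value}")

Entry points are recorded in the package metadata when the distribution is
installed, which is why no explicit import of ``pyvista_zstd`` is required.
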


Rationale
~~~~~~~~~

VTK’s XML writer is flexible and supports `most
datasets <https://docs.vtk.org/en/latest/vtk_file_formats/vtkxml_file_format.html>`__,
but its compression is limited to a single thread, it offers only a small set
of compression algorithms, and the XML format adds significant overhead.

To demonstrate this, the following example writes out a single file
without compression. This example requires ``pyvista>=0.47.0`` for the
``compression`` parameter.

.. code:: pycon

   >>> import numpy as np
   >>> import pyvista as pv
   >>> ugrid = pv.ImageData(dimensions=(200, 200, 200)).to_tetrahedra()
   >>> ugrid["pdata"] = np.random.random(ugrid.n_points)
   >>> ugrid["cdata"] = np.random.random(ugrid.n_cells)
   >>> nbytes = (
   ...     ugrid.points.nbytes
   ...     + ugrid.cell_connectivity.nbytes
   ...     + ugrid.offset.nbytes
   ...     + ugrid.celltypes.nbytes
   ...     + ugrid["pdata"].nbytes
   ...     + ugrid["cdata"].nbytes
   ... )
   >>> print(f"Size in memory: {nbytes / 1024**2:.2f} MB")

   Size in memory: 1993.89 MB

.. code:: pycon

   >>> # Save using VTK XML format

   >>> from pathlib import Path
   >>> import time
   >>> tmp_path = Path("/tmp/ds.vtu")
   >>> tstart = time.time()
   >>> ugrid.save(tmp_path, compression=None)
   >>> print(f"Written without compression in {time.time() - tstart:.2f} seconds")
   >>> nbytes_disk = tmp_path.stat().st_size
   >>> print(f" File size: {nbytes_disk / 1024**2:.2f} MB")
   >>> print(f" Compression Ratio: {nbytes / nbytes_disk}")
   >>> print()

   Written without compression in 7.93 seconds
   File size: 2858.94 MB
   Compression Ratio: 0.6974239255525742

This amounts to around a 43% overhead using VTK’s XML writer. Using the
default compression we can get the file size down to 791 MB, but it
takes 19 seconds to compress.

.. code:: pycon

   >>> tstart = time.time()
   >>> ugrid.save(tmp_path, compression='zlib')  # default
   >>> print(f"Compressed in {time.time() - tstart:.2f} seconds")
   >>> nbytes_disk = tmp_path.stat().st_size
   >>> print(f" File size: {nbytes_disk / 1024**2:.2f} MB")
   >>> print(f" Compression Ratio: {nbytes / nbytes_disk}")
   >>> print()

   Compressed in 18.83 seconds
   File size: 791.05 MB
   Compression Ratio: 2.5205590295735663

Clearly there’s room for improvement here, as this amounts to a
compression rate of 105.89 MB/s.
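
The overhead and rate quoted above follow directly from the printed sizes and
timings. A small sketch of that arithmetic, with the numbers copied from the
output above (the variable names are only illustrative):

.. code:: py

   # Figures copied from the output shown above.
   size_in_memory_mb = 1993.89  # "Size in memory"
   xml_raw_mb = 2858.94         # VTK XML written with compression=None
   zlib_seconds = 18.83         # VTK XML written with compression='zlib'

   # Overhead of the uncompressed XML file relative to the in-memory size.
   xml_overhead = xml_raw_mb / size_in_memory_mb - 1.0

   # Compression rate: uncompressed megabytes processed per second.
   zlib_rate_mb_s = size_in_memory_mb / zlib_seconds

   print(f"XML overhead: {xml_overhead:.0%}")
   print(f"zlib rate: {zlib_rate_mb_s:.2f} MB/s")

Note that the rate is measured against the in-memory size rather than the size
of the file on disk, so it describes how quickly the dataset itself is consumed.
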

VTK Compression with Zstandard: pyvista-zstd
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This library, ``pyvista-zstd``, writes out VTK datasets with minimal overhead
and uses `Zstandard <https://github.com/facebook/zstd>`__ for
compression. Moreover, it’s been implemented with multi-threading
support for both read and write operations.

Let’s compress that file again but this time using ``pyvista-zstd``:

.. code:: pycon

   >>> import pyvista_zstd
   >>> tmp_path = Path("/tmp/ds.pv")
   >>> tstart = time.time()
   >>> pyvista_zstd.write(ugrid, tmp_path)
   >>> print(f"Compressed pyvista_zstd in {time.time() - tstart:.2f} seconds")
   >>> nbytes_disk = tmp_path.stat().st_size
   >>> print(f" File size: {nbytes_disk / 1024**2:.2f} MB")
   >>> print(f" Compression Ratio: {nbytes / nbytes_disk}")

   Compressed pyvista_zstd in 0.92 seconds
   Threads: -1
   File size: 660.41 MB
   Compression Ratio: 3.019175309922273

This gives us a write performance of 2167 MB/s using the default number
of threads and compression level, resulting in a 20x speedup in write
performance versus VTK’s XML writer with its default compression. This
speedup is most noticeable for larger files:

.. figure:: https://github.com/pyvista/pyvista-zstd/raw/main/doc/images/synthetic-fig3.png
   :alt: Speedup versus VTK’s XML

   Speedup versus VTK’s XML

Even with multi-threading disabled, we can still achieve excellent
performance:

.. code:: pycon

   >>> tstart = time.time()
   >>> pyvista_zstd.write(ugrid, tmp_path, n_threads=0)
   >>> print(f"Compressed pyvista_zstd in {time.time() - tstart:.2f} seconds")
   >>> nbytes_disk = tmp_path.stat().st_size
   >>> print(f" File size: {nbytes_disk / 1024**2:.2f} MB")
   >>> print(f" Compression Ratio: {nbytes / nbytes_disk}")

   Compressed pyvista_zstd in 2.91 seconds
   Threads: 0
   File size: 660.47 MB
   Compression Ratio: 3.0188911592355683

This amounts to a single-core compression rate of 685.18 MB/s, which is
in agreement with Zstandard’s
`benchmarks <https://github.com/facebook/zstd#benchmarks>`__.
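
These write rates and the quoted speedup can be reproduced from the timings
printed above. The sketch below only redoes that arithmetic using the rounded
values shown in the output, so the last digit can differ slightly from the
figures in the text; the variable names are illustrative.

.. code:: py

   size_in_memory_mb = 1993.89  # "Size in memory" from the earlier output
   zlib_seconds = 18.83         # VTK XML with zlib compression
   zstd_mt_seconds = 0.92       # pyvista_zstd.write with default threads
   zstd_st_seconds = 2.91       # pyvista_zstd.write with n_threads=0

   # Write rates in MB/s, measured against the in-memory size.
   mt_rate = size_in_memory_mb / zstd_mt_seconds
   st_rate = size_in_memory_mb / zstd_st_seconds

   # Speedup versus VTK XML with zlib, and the gain from multi-threading.
   speedup_vs_zlib = zlib_seconds / zstd_mt_seconds
   threading_gain = zstd_st_seconds / zstd_mt_seconds

   print(f"multi-threaded: {mt_rate:.0f} MB/s, single-threaded: {st_rate:.0f} MB/s")
   print(f"speedup vs zlib: {speedup_vs_zlib:.1f}x, threading gain: {threading_gain:.1f}x")

On these numbers, multi-threading accounts for roughly a 3x gain, while the
remainder of the speedup over VTK XML comes from the format and the codec.
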

Note that the benefit of threading drops off rapidly past 8 threads,
though part of this is due to the mix of performance and efficiency cores on
the CPU used for benchmarking (see below).

.. figure:: https://github.com/pyvista/pyvista-zstd/raw/main/doc/images/pyvista-zstd-single-ds-fig3.png
   :alt: Read/Write Speed versus Number of Threads

   Read/Write Speed versus Number of Threads

--------------

Reading in the dataset is also fast. Comparing with VTK’s XML reader
using defaults:

.. code:: pycon

   >>> # Read VTK XML

   >>> print(f"Read VTK XML:")
   >>> timeit pv.read("/tmp/ds.vtu")
   6.22 s ± 9.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

   >>> # Read zstd

   >>> print(f"Read zstd:")
   >>> timeit pyvista_zstd.read("/tmp/ds.pv")
   563 ms ± 7.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This is an 11x speedup for this dataset versus VTK’s XML, and it’s still
fast even with multi-threading disabled:

.. code:: pycon

   >>> timeit pyvista_zstd.read("/tmp/ds.pv", n_threads=0)
   1.11 s ± 4.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This amounts to 1796 MB/s for a single core, which is also in agreement
with Zstandard’s
`benchmarks <https://github.com/facebook/zstd#benchmarks>`__.
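
The ``timeit`` lines above are IPython magics. Outside IPython, a comparable
measurement can be sketched with the standard library's ``timeit`` module. The
helper ``best_rate_mb_s`` and the in-memory stand-in workload below are
assumptions made for illustration and are not part of ``pyvista-zstd``; in
practice the callable would wrap a call such as ``pyvista_zstd.read(...)``.
Unlike the mean reported above, this helper reports the best run.

.. code:: py

   import timeit


   def best_rate_mb_s(func, size_mb, repeat=7):
       """Run ``func`` ``repeat`` times; return (best_seconds, MB/s)."""
       # number=1 mirrors the "7 runs, 1 loop each" layout shown above.
       times = timeit.repeat(func, number=1, repeat=repeat)
       best = min(times)
       return best, size_mb / best


   # Stand-in workload: copy a 16 MB buffer that is already in memory.
   payload = bytes(16 * 1024**2)
   seconds, rate = best_rate_mb_s(lambda: bytearray(payload), size_mb=16)
   print(f"best of 7: {seconds:.4f} s, {rate:.0f} MB/s")

   # The single-core read rate quoted above, from the printed mean time.
   single_core_read_rate = 1993.89 / 1.11
   print(f"single-core read: {single_core_read_rate:.0f} MB/s")
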

Additionally, you can control Zstandard’s compression level by setting
``level=``. A quick benchmark for this dataset indicates the defaults
give a reasonable performance versus size tradeoff:

.. figure:: https://github.com/pyvista/pyvista-zstd/raw/main/doc/images/pyvista-zstd-single-ds-fig4.png
   :alt: Read/Write Speed versus Compression Level

   Read/Write Speed versus Compression Level

Note that both ``pyvista-zstd`` and VTK’s XML default compression give
relatively constant compression ratios for this dataset across varying
file sizes:

.. figure:: https://github.com/pyvista/pyvista-zstd/raw/main/doc/images/synthetic-fig4.png
   :alt: Compression Ratio versus VTK’s XML

   Compression Ratio versus VTK’s XML

These benchmarks were performed on an ``i9-14900KF`` running the Linux
kernel ``6.12.41`` using ``zstandard==0.24.0`` from PyPI. Storage was a
2TB Samsung 990 Pro without LUKS mounted at ``/tmp``.

Additional Information
~~~~~~~~~~~~~~~~~~~~~~

The ``benchmarks/`` directory contains additional benchmarks using many
datasets, including all applicable datasets in ``pyvista.examples`` (see
`PyVista Dataset
Gallery <https://docs.pyvista.org/api/examples/dataset_gallery#dataset-gallery>`__).