sseflags-0.1.tar.gz
- sseflags-0.1/LICENSE +21 -0
- sseflags-0.1/PKG-INFO +187 -0
- sseflags-0.1/README.md +163 -0
- sseflags-0.1/pyproject.toml +44 -0
- sseflags-0.1/setup.cfg +4 -0
- sseflags-0.1/setup.py +27 -0
- sseflags-0.1/sseflags/__init__.py +72 -0
- sseflags-0.1/sseflags/_lib.pyx +54 -0
- sseflags-0.1/sseflags/benchmark.py +121 -0
- sseflags-0.1/sseflags.egg-info/PKG-INFO +187 -0
- sseflags-0.1/sseflags.egg-info/SOURCES.txt +11 -0
- sseflags-0.1/sseflags.egg-info/dependency_links.txt +1 -0
- sseflags-0.1/sseflags.egg-info/top_level.txt +1 -0
sseflags-0.1/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Mikhail Ryazanov
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
sseflags-0.1/PKG-INFO
ADDED
|
@@ -0,0 +1,187 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: sseflags
|
|
3
|
+
Version: 0.1
|
|
4
|
+
Summary: Python package for accessing DAZ and FTZ flags
|
|
5
|
+
Author: Mikhail Ryazanov
|
|
6
|
+
License-Expression: MIT
|
|
7
|
+
Project-URL: homepage, https://github.com/MikhailRyazanov/SSEflags
|
|
8
|
+
Project-URL: documentation, https://github.com/MikhailRyazanov/SSEflags/README.md
|
|
9
|
+
Project-URL: github, https://github.com/MikhailRyazanov/SSEflags.git
|
|
10
|
+
Project-URL: issues, https://github.com/MikhailRyazanov/SSEflags/issues
|
|
11
|
+
Keywords: SSE,AVX,subnormal,denormal
|
|
12
|
+
Classifier: Development Status :: 4 - Beta
|
|
13
|
+
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
|
14
|
+
Classifier: Programming Language :: Python :: 3
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.14
|
|
20
|
+
Requires-Python: >=3.10
|
|
21
|
+
Description-Content-Type: text/markdown
|
|
22
|
+
License-File: LICENSE
|
|
23
|
+
Dynamic: license-file
|
|
24
|
+
|
|
25
|
+
# SSE flags
|
|
26
|
+
|
|
27
|
+
NumPy for x86 platforms (IA-32 and AMD64 architectures) uses SSE and/or AVX for
|
|
28
|
+
floating-point calculations. Unfortunately, on Intel CPUs, they work very
|
|
29
|
+
slowly with
|
|
30
|
+
[subnormal (denormal) numbers](https://en.wikipedia.org/wiki/Subnormal_number).
|
|
31
|
+
To avoid such performance degradation, if somewhat worse floating-point
|
|
32
|
+
accuracy in extreme cases can be tolerated, the
|
|
33
|
+
[DAZ (denormals-are-zero) and FTZ (flush-to-zero) CPU flags](https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2025-2/set-the-ftz-and-daz-flags.html)
|
|
34
|
+
were introduced to treat input and/or output subnormal numbers as zeros. This
|
|
35
|
+
module provides access to these CPU flags from Python.
|
|
36
|
+
|
|
37
|
+
To test the effect on your system, use ``sseflags.benchmark.run()`` or run
|
|
38
|
+
```
|
|
39
|
+
python3 -m sseflags.benchmark
|
|
40
|
+
```
|
|
41
|
+
in the command line. Example output on Intel i9-12900K (subnormal numbers are
|
|
42
|
+
very slow):
|
|
43
|
+
```
|
|
44
|
+
Times in milliseconds:
|
|
45
|
+
default 1.979
|
|
46
|
+
========================
|
|
47
|
+
FTZ off FTZ on
|
|
48
|
+
------------------------
|
|
49
|
+
DAZ off 1.993 2.037
|
|
50
|
+
DAZ on 6.669 0.037
|
|
51
|
+
========================
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
AMD CPUs do not show performance degradation on subnormal numbers in the 64-bit
|
|
55
|
+
mode, and thus enabling DAZ/FTZ can only decrease the accuracy slightly.
|
|
56
|
+
Example benchmarks on AMD Ryzen 7 6800U (negligible degradation for subnormal
|
|
57
|
+
numbers; notice that times are in *micro*seconds):
|
|
58
|
+
```
|
|
59
|
+
Times in microseconds:
|
|
60
|
+
default 16.834
|
|
61
|
+
========================
|
|
62
|
+
FTZ off FTZ on
|
|
63
|
+
------------------------
|
|
64
|
+
DAZ off 16.829 15.383
|
|
65
|
+
DAZ on 15.353 14.500
|
|
66
|
+
========================
|
|
67
|
+
```
|
|
68
|
+
Nevertheless, DAZ/FTZ might be useful in 32-bit Python (same CPU, noticeable
|
|
69
|
+
difference):
|
|
70
|
+
```
|
|
71
|
+
Times in milliseconds:
|
|
72
|
+
default 0.229
|
|
73
|
+
========================
|
|
74
|
+
FTZ off FTZ on
|
|
75
|
+
------------------------
|
|
76
|
+
DAZ off 0.225 0.131
|
|
77
|
+
DAZ on 0.224 0.131
|
|
78
|
+
========================
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
On other architectures, or if the underlying Cython extension is not built, the
|
|
82
|
+
module only reports that it has no effect.
|
|
83
|
+
|
|
84
|
+
|
|
85
|
+
## ``sseflags`` module
|
|
86
|
+
|
|
87
|
+
```
|
|
88
|
+
get_flags()
|
|
89
|
+
Query current states of the DAZ and FTZ flags, see set_flags() for details.
|
|
90
|
+
Can be used for restoring the default behavior:
|
|
91
|
+
|
|
92
|
+
flags = get_flags() # remember the original flag states
|
|
93
|
+
set_flags(daz=True, ftz=True) # enable DAZ and FTZ
|
|
94
|
+
... # do some calculations
|
|
95
|
+
set_flags(**flags) # restore the original flag states
|
|
96
|
+
|
|
97
|
+
Returns
|
|
98
|
+
-------
|
|
99
|
+
flags : dict
|
|
100
|
+
dictionary with the keys 'daz' and 'ftz', values of which represent the
|
|
101
|
+
corresponding flag state: True for set, False for cleared, None if not
|
|
102
|
+
implemented
|
|
103
|
+
|
|
104
|
+
|
|
105
|
+
set_flags(daz=None, ftz=None, verbose=False)
|
|
106
|
+
Set the DAZ (denormals-are-zero) and/or FTZ (flush-to-zero) CPU flags for
|
|
107
|
+
SSE and AVX floating-point calculations, which can be useful for Intel CPUs
|
|
108
|
+
that work very slowly with subnormal (denormal) numbers.
|
|
109
|
+
|
|
110
|
+
On unsupported architectures, or if the underlying Cython extension was not
|
|
111
|
+
built, this function only reports that it has no effect. The availability
|
|
112
|
+
can be checked by calling set_flags() without arguments.
|
|
113
|
+
|
|
114
|
+
Parameters
|
|
115
|
+
----------
|
|
116
|
+
daz : bool or None, optional
|
|
117
|
+
True to set, False to clear the DAZ flag; None (default) to leave
|
|
118
|
+
unchanged
|
|
119
|
+
|
|
120
|
+
ftz : bool or None, optional
|
|
121
|
+
True to set, False to clear the FTZ flag; None (default) to leave
|
|
122
|
+
unchanged
|
|
123
|
+
|
|
124
|
+
verbose : bool, optional
|
|
125
|
+
pass True to print a warning if the operation is not implemented
|
|
126
|
+
|
|
127
|
+
Returns
|
|
128
|
+
-------
|
|
129
|
+
implemented : bool
|
|
130
|
+
True if this operation is implemented, False if not
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
### ``sseflags.benchmark`` submodule
|
|
134
|
+
|
|
135
|
+
```
|
|
136
|
+
run(repeat=100, min_t=1.0, verbose=True)
|
|
137
|
+
Run benchmarks with all possible combinations of the DAZ and FTZ flags to
|
|
138
|
+
check their effect on NumPy performance (see run_flags() for details).
|
|
139
|
+
|
|
140
|
+
Parameters
|
|
141
|
+
----------
|
|
142
|
+
repeat : int, optional
|
|
143
|
+
number of iterations in a batch
|
|
144
|
+
|
|
145
|
+
min_t : float, optional
|
|
146
|
+
minimal amount of time in seconds to benchmark each combination
|
|
147
|
+
|
|
148
|
+
verbose : bool, optional
|
|
149
|
+
pass False to suppress the progress report
|
|
150
|
+
|
|
151
|
+
|
|
152
|
+
run_flags(flags, repeat=100, min_t=1.0)
|
|
153
|
+
Set the DAZ and FTZ flags to given states and run a benchmark of NumPy
|
|
154
|
+
matrix multiplication. Each iteration involves multiplication of normal
|
|
155
|
+
numbers that would produce subnormal numbers and multiplication of
|
|
156
|
+
subnormal numbers by normal numbers, which also would produce subnormal
|
|
157
|
+
numbers.
|
|
158
|
+
|
|
159
|
+
The test is designed for clear demonstration of performance degradation (if
|
|
160
|
+
it is present); the effect for real-world data is usually less severe.
|
|
161
|
+
|
|
162
|
+
Parameters
|
|
163
|
+
----------
|
|
164
|
+
flags : dict
|
|
165
|
+
dictionary with arguments passed to sseflags.set_flags()
|
|
166
|
+
|
|
167
|
+
repeat : int, optional
|
|
168
|
+
number of iterations in a batch
|
|
169
|
+
|
|
170
|
+
min_t : float, optional
|
|
171
|
+
batches are repeated until this amount of seconds passes
|
|
172
|
+
|
|
173
|
+
Returns
|
|
174
|
+
-------
|
|
175
|
+
time : float
|
|
176
|
+
average time per iteration in seconds
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
## Installation
|
|
180
|
+
|
|
181
|
+
Compiled wheels for Linux, macOS and Windows can be installed
|
|
182
|
+
[from PyPI](https://pypi.org/project/sseflags).
|
|
183
|
+
They use [“Stable ABI”](https://docs.python.org/3/c-api/stable.html#stable-abi)
|
|
184
|
+
that should be compatible with all Python versions ⩾3.10. For portability, a
|
|
185
|
+
“universal wheel” is also available. It does not contain the Cython extension,
|
|
186
|
+
and thus has no effect on computations, but can be installed on unsupported
|
|
187
|
+
systems.
|
sseflags-0.1/README.md
ADDED
|
@@ -0,0 +1,163 @@
|
|
|
1
|
+
# SSE flags
|
|
2
|
+
|
|
3
|
+
NumPy for x86 platforms (IA-32 and AMD64 architectures) uses SSE and/or AVX for
|
|
4
|
+
floating-point calculations. Unfortunately, on Intel CPUs, they work very
|
|
5
|
+
slowly with
|
|
6
|
+
[subnormal (denormal) numbers](https://en.wikipedia.org/wiki/Subnormal_number).
|
|
7
|
+
To avoid such performance degradation, if somewhat worse floating-point
|
|
8
|
+
accuracy in extreme cases can be tolerated, the
|
|
9
|
+
[DAZ (denormals-are-zero) and FTZ (flush-to-zero) CPU flags](https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2025-2/set-the-ftz-and-daz-flags.html)
|
|
10
|
+
were introduced to treat input and/or output subnormal numbers as zeros. This
|
|
11
|
+
module provides access to these CPU flags from Python.
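As a rough illustration of what "subnormal" means here, the following sketch uses only the Python standard library and does not call `sseflags`. It assumes IEEE 754 binary64 floats and the usual default CPU state with DAZ and FTZ cleared; with either flag set, the tiny values below could be treated as zero instead. The helper name `is_subnormal` is hypothetical.

```python
import math
import sys

smallest_normal = sys.float_info.min          # 2**-1022 for binary64
smallest_subnormal = math.ldexp(1.0, -1074)   # smallest positive binary64 value


def is_subnormal(x):
    """True for nonzero floats whose magnitude is below the normal range."""
    return x != 0.0 and abs(x) < smallest_normal


# Dividing the smallest normal number by 4 leaves the normal range, but with
# DAZ/FTZ cleared the result underflows gradually instead of becoming zero.
tiny = smallest_normal / 4

print(is_subnormal(smallest_normal))  # a normal number
print(is_subnormal(tiny))             # a subnormal number (flags cleared)
```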
|
|
12
|
+
|
|
13
|
+
To test the effect on your system, use ``sseflags.benchmark.run()`` or run
|
|
14
|
+
```
|
|
15
|
+
python3 -m sseflags.benchmark
|
|
16
|
+
```
|
|
17
|
+
in the command line. Example output on Intel i9-12900K (subnormal numbers are
|
|
18
|
+
very slow):
|
|
19
|
+
```
|
|
20
|
+
Times in milliseconds:
|
|
21
|
+
default 1.979
|
|
22
|
+
========================
|
|
23
|
+
FTZ off FTZ on
|
|
24
|
+
------------------------
|
|
25
|
+
DAZ off 1.993 2.037
|
|
26
|
+
DAZ on 6.669 0.037
|
|
27
|
+
========================
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
AMD CPUs do not show performance degradation on subnormal numbers in the 64-bit
|
|
31
|
+
mode, and thus enabling DAZ/FTZ can only decrease the accuracy slightly.
|
|
32
|
+
Example benchmarks on AMD Ryzen 7 6800U (negligible degradation for subnormal
|
|
33
|
+
numbers; notice that times are in *micro*seconds):
|
|
34
|
+
```
|
|
35
|
+
Times in microseconds:
|
|
36
|
+
default 16.834
|
|
37
|
+
========================
|
|
38
|
+
FTZ off FTZ on
|
|
39
|
+
------------------------
|
|
40
|
+
DAZ off 16.829 15.383
|
|
41
|
+
DAZ on 15.353 14.500
|
|
42
|
+
========================
|
|
43
|
+
```
|
|
44
|
+
Nevertheless, DAZ/FTZ might be useful in 32-bit Python (same CPU, noticeable
|
|
45
|
+
difference):
|
|
46
|
+
```
|
|
47
|
+
Times in milliseconds:
|
|
48
|
+
default 0.229
|
|
49
|
+
========================
|
|
50
|
+
FTZ off FTZ on
|
|
51
|
+
------------------------
|
|
52
|
+
DAZ off 0.225 0.131
|
|
53
|
+
DAZ on 0.224 0.131
|
|
54
|
+
========================
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
On other architectures, or if the underlying Cython extension is not built, the
|
|
58
|
+
module only reports that it has no effect.
|
|
59
|
+
|
|
60
|
+
|
|
61
|
+
## ``sseflags`` module
|
|
62
|
+
|
|
63
|
+
```
|
|
64
|
+
get_flags()
|
|
65
|
+
Query current states of the DAZ and FTZ flags, see set_flags() for details.
|
|
66
|
+
Can be used for restoring the default behavior:
|
|
67
|
+
|
|
68
|
+
flags = get_flags() # remember the original flag states
|
|
69
|
+
set_flags(daz=True, ftz=True) # enable DAZ and FTZ
|
|
70
|
+
... # do some calculations
|
|
71
|
+
set_flags(**flags) # restore the original flag states
|
|
72
|
+
|
|
73
|
+
Returns
|
|
74
|
+
-------
|
|
75
|
+
flags : dict
|
|
76
|
+
dictionary with the keys 'daz' and 'ftz', values of which represent the
|
|
77
|
+
corresponding flag state: True for set, False for cleared, None if not
|
|
78
|
+
implemented
|
|
79
|
+
|
|
80
|
+
|
|
81
|
+
set_flags(daz=None, ftz=None, verbose=False)
|
|
82
|
+
Set the DAZ (denormals-are-zero) and/or FTZ (flush-to-zero) CPU flags for
|
|
83
|
+
SSE and AVX floating-point calculations, which can be useful for Intel CPUs
|
|
84
|
+
that work very slowly with subnormal (denormal) numbers.
|
|
85
|
+
|
|
86
|
+
On unsupported architectures, or if the underlying Cython extension was not
|
|
87
|
+
built, this function only reports that it has no effect. The availability
|
|
88
|
+
can be checked by calling set_flags() without arguments.
|
|
89
|
+
|
|
90
|
+
Parameters
|
|
91
|
+
----------
|
|
92
|
+
daz : bool or None, optional
|
|
93
|
+
True to set, False to clear the DAZ flag; None (default) to leave
|
|
94
|
+
unchanged
|
|
95
|
+
|
|
96
|
+
ftz : bool or None, optional
|
|
97
|
+
True to set, False to clear the FTZ flag; None (default) to leave
|
|
98
|
+
unchanged
|
|
99
|
+
|
|
100
|
+
verbose : bool, optional
|
|
101
|
+
pass True to print a warning if the operation is not implemented
|
|
102
|
+
|
|
103
|
+
Returns
|
|
104
|
+
-------
|
|
105
|
+
implemented : bool
|
|
106
|
+
True if this operation is implemented, False if not
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
### ``sseflags.benchmark`` submodule
|
|
110
|
+
|
|
111
|
+
```
|
|
112
|
+
run(repeat=100, min_t=1.0, verbose=True)
|
|
113
|
+
Run benchmarks with all possible combinations of the DAZ and FTZ flags to
|
|
114
|
+
check their effect on NumPy performance (see run_flags() for details).
|
|
115
|
+
|
|
116
|
+
Parameters
|
|
117
|
+
----------
|
|
118
|
+
repeat : int, optional
|
|
119
|
+
number of iterations in a batch
|
|
120
|
+
|
|
121
|
+
min_t : float, optional
|
|
122
|
+
minimal amount of time in seconds to benchmark each combination
|
|
123
|
+
|
|
124
|
+
verbose : bool, optional
|
|
125
|
+
pass False to suppress the progress report
|
|
126
|
+
|
|
127
|
+
|
|
128
|
+
run_flags(flags, repeat=100, min_t=1.0)
|
|
129
|
+
Set the DAZ and FTZ flags to given states and run a benchmark of NumPy
|
|
130
|
+
matrix multiplication. Each iteration involves multiplication of normal
|
|
131
|
+
numbers that would produce subnormal numbers and multiplication of
|
|
132
|
+
subnormal numbers by normal numbers, which also would produce subnormal
|
|
133
|
+
numbers.
|
|
134
|
+
|
|
135
|
+
The test is designed for clear demonstration of performance degradation (if
|
|
136
|
+
it is present); the effect for real-world data is usually less severe.
|
|
137
|
+
|
|
138
|
+
Parameters
|
|
139
|
+
----------
|
|
140
|
+
flags : dict
|
|
141
|
+
dictionary with arguments passed to sseflags.set_flags()
|
|
142
|
+
|
|
143
|
+
repeat : int, optional
|
|
144
|
+
number of iterations in a batch
|
|
145
|
+
|
|
146
|
+
min_t : float, optional
|
|
147
|
+
batches are repeated until this amount of seconds passes
|
|
148
|
+
|
|
149
|
+
Returns
|
|
150
|
+
-------
|
|
151
|
+
time : float
|
|
152
|
+
average time per iteration in seconds
|
|
153
|
+
```
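The batching scheme described for `run_flags()` (batches of `repeat` iterations, repeated until `min_t` seconds have passed, then averaged) can be sketched in isolation. This is only an illustration under stated assumptions: `time_per_iteration` and `workload` are hypothetical names, and `time.perf_counter()` is used here as a swapped-in clock rather than the one the package uses.

```python
from itertools import count
from time import perf_counter


def time_per_iteration(workload, repeat=100, min_t=1.0):
    """Average seconds per call of workload(), measured in whole batches."""
    t0 = perf_counter()
    for batches in count(1):
        for _ in range(repeat):
            workload()
        if perf_counter() > t0 + min_t:  # stop only at a batch boundary
            break
    return (perf_counter() - t0) / (batches * repeat)


# Example: a cheap stand-in workload and a short time budget.
avg = time_per_iteration(lambda: sum(range(1000)), repeat=10, min_t=0.05)
print(f'{avg * 1e6:.3f} microseconds per iteration')
```

Checking the clock only once per batch keeps the timing overhead small relative to the work being measured.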
|
|
154
|
+
|
|
155
|
+
## Installation
|
|
156
|
+
|
|
157
|
+
Compiled wheels for Linux, macOS and Windows can be installed
|
|
158
|
+
[from PyPI](https://pypi.org/project/sseflags).
|
|
159
|
+
They use [“Stable ABI”](https://docs.python.org/3/c-api/stable.html#stable-abi)
|
|
160
|
+
that should be compatible with all Python versions ⩾3.10. For portability, a
|
|
161
|
+
“universal wheel” is also available. It does not contain the Cython extension,
|
|
162
|
+
and thus has no effect on computations, but can be installed on unsupported
|
|
163
|
+
systems.
|
sseflags-0.1/pyproject.toml
ADDED
|
@@ -0,0 +1,44 @@
+[project]
+name = "sseflags"
+description = "Python package for accessing DAZ and FTZ flags"
+readme = "README.md"
+requires-python = ">= 3.10"
+license = "MIT"
+authors = [{name = "Mikhail Ryazanov"}]
+keywords = ["SSE", "AVX", "subnormal", "denormal"]
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "Topic :: Software Development :: Libraries :: Python Modules",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.10",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Programming Language :: Python :: 3.14",
+]
+dynamic = ["version"]  # __version__ from sseflags/__init__.py
+
+[project.urls]
+homepage = "https://github.com/MikhailRyazanov/SSEflags"
+documentation = "https://github.com/MikhailRyazanov/SSEflags/README.md"
+github = "https://github.com/MikhailRyazanov/SSEflags.git"
+issues = "https://github.com/MikhailRyazanov/SSEflags/issues"
+
+[build-system]
+requires = ["setuptools >= 77.0", "cython >= 3"]
+build-backend = "setuptools.build_meta"
+
+[tool.setuptools.packages.find]
+include = ["sseflags"]
+
+[tool.cibuildwheel]
+environment.FORCE_COLOR = 1
+environment.PIP_PROGRESS_BAR = "off"
+test-requires = "abi3audit"
+test-command = "python -m abi3audit -v --summary {wheel}"
+
+[tool.cibuildwheel.linux]
+archs = ["i686", "x86_64"]
+
+[tool.cibuildwheel.windows]
+archs = ["x86", "AMD64"]
|
sseflags-0.1/setup.cfg
ADDED
sseflags-0.1/setup.py
ADDED
|
@@ -0,0 +1,27 @@
+import os
+from pathlib import Path
+import sys
+from setuptools import setup, Extension
+
+sys.path.insert(0, '')  # include CWD (missing in build isolation)
+from sseflags import __version__
+
+if sys.platform == 'win32':  # for MSVC
+    extra_compile_args = ['/Os']
+else:  # for GCC and Clang
+    extra_compile_args = ['-Os', '-g0']
+ext_modules = [
+    # ("Path" below is a workaround for Setuptools bug on Windows,
+    # see https://github.com/pypa/setuptools/issues/5093)
+    Extension('sseflags._lib', [Path('sseflags/_lib.pyx')],
+              extra_compile_args=extra_compile_args,
+              define_macros=[('Py_LIMITED_API', 0x030A0000)],  # 0x0A = 10
+              py_limited_api=True)
+]
+# "sseflags=none python -m build --wheel" to build a "universal" wheel
+# (...-none-any.whl) without Cython extension
+if os.environ.get('sseflags') == 'none':
+    ext_modules = None
+
+setup(version=__version__, ext_modules=ext_modules,
+      options={'bdist_wheel': {'py_limited_api': 'cp310'}})
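The value given for `Py_LIMITED_API` uses the same layout as CPython's `PY_VERSION_HEX`: the top byte is the major version and the next byte is the minor version. A minimal sketch of that decoding, with `decode_limited_api` as a hypothetical helper name that is not part of the package:

```python
def decode_limited_api(hexversion):
    """Split a PY_VERSION_HEX-style value into (major, minor)."""
    major = (hexversion >> 24) & 0xFF
    minor = (hexversion >> 16) & 0xFF
    return major, minor


major, minor = decode_limited_api(0x030A0000)
print(f'cp{major}{minor}')  # cp310
```

This shows why `0x030A0000` pairs with the `cp310` wheel tag: both name Python 3.10 as the oldest supported version.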
|
sseflags-0.1/sseflags/__init__.py
ADDED
|
@@ -0,0 +1,72 @@
+try:
+    from ._lib import _get_daz, _get_ftz, _set_daz, _set_ftz
+    _ext = True
+except ImportError:
+    _ext = False
+
+__version__ = '0.1'
+
+
+def get_flags():
+    """
+    Query current states of the DAZ and FTZ flags, see set_flags() for details.
+    Can be used for restoring the default behavior:
+
+        flags = get_flags()  # remember the original flag states
+        set_flags(daz=True, ftz=True)  # enable DAZ and FTZ
+        ...  # do some calculations
+        set_flags(**flags)  # restore the original flag states
+
+    Returns
+    -------
+    flags : dict
+        dictionary with the keys 'daz' and 'ftz', values of which represent the
+        corresponding flag state: True for set, False for cleared, None if not
+        implemented
+    """
+    flags = {'daz': None, 'ftz': None}
+    if _ext:
+        flags['daz'] = _get_daz()
+        flags['ftz'] = _get_ftz()
+    return flags
+
+
+def set_flags(daz=None, ftz=None, verbose=False):
+    """
+    Set the DAZ (denormals-are-zero) and/or FTZ (flush-to-zero) CPU flags for
+    SSE and AVX floating-point calculations, which can be useful for Intel CPUs
+    that work very slowly with subnormal (denormal) numbers.
+
+    On unsupported architectures, or if the underlying Cython extension was not
+    built, this function only reports that it has no effect. The availability
+    can be checked by calling set_flags() without arguments.
+
+    Parameters
+    ----------
+    daz : bool or None, optional
+        True to set, False to clear the DAZ flag; None (default) to leave
+        unchanged
+
+    ftz : bool or None, optional
+        True to set, False to clear the FTZ flag; None (default) to leave
+        unchanged
+
+    verbose : bool, optional
+        pass True to print a warning if the operation is not implemented
+
+    Returns
+    -------
+    implemented : bool
+        True if this operation is implemented, False if not
+    """
+    if _ext:
+        if daz is not None:
+            _set_daz(daz)
+        if ftz is not None:
+            _set_ftz(ftz)
+        return True
+
+    if verbose:
+        print('Cannot change the DAZ/FTZ flags: the extension was not '
+              'compiled or is not needed for this CPU.')
+    return False
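The save/enable/restore pattern from the `get_flags()` docstring could be wrapped in a context manager so that the original states are restored even if the calculation raises. The sketch below is an assumption-laden illustration, not part of the package: `flags_enabled` is a hypothetical name, and the getter and setter are passed in as parameters so that the example runs without the compiled extension, using in-memory stand-ins that mimic the documented behavior. In real use one would presumably pass `sseflags.get_flags` and `sseflags.set_flags`.

```python
from contextlib import contextmanager


@contextmanager
def flags_enabled(get_flags, set_flags, daz=True, ftz=True):
    """Temporarily set DAZ/FTZ, restoring the previous states on exit."""
    saved = get_flags()              # e.g. {'daz': False, 'ftz': False}
    set_flags(daz=daz, ftz=ftz)
    try:
        yield
    finally:
        set_flags(**saved)           # None values leave a flag unchanged


# In-memory stand-ins (hypothetical) that imitate get_flags()/set_flags().
_state = {'daz': False, 'ftz': False}


def fake_get_flags():
    return dict(_state)


def fake_set_flags(daz=None, ftz=None, verbose=False):
    if daz is not None:
        _state['daz'] = daz
    if ftz is not None:
        _state['ftz'] = ftz
    return True


with flags_enabled(fake_get_flags, fake_set_flags):
    inside = fake_get_flags()        # both flags are set here

after = fake_get_flags()             # the original states are back
```

Because `get_flags()` returns `None` for both keys when the extension is missing, and `set_flags()` treats `None` as "leave unchanged", the restore step is harmless on unsupported systems.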
|
sseflags-0.1/sseflags/_lib.pyx
ADDED
|
@@ -0,0 +1,54 @@
+# cython: language_level=3
+# (disable unneeded features to reduce compiled size)
+# cython: always_allow_keywords=False, auto_pickle=False, binding=False
+
+from libcpp cimport bool
+
+
+cdef extern from *:
+    r""" // C code for Cython
+    #include <stdbool.h>
+    #include <pmmintrin.h>
+    #include <xmmintrin.h>
+
+    void c_set_daz(bool on) {
+        _MM_SET_DENORMALS_ZERO_MODE(on ? _MM_DENORMALS_ZERO_ON
+                                       : _MM_DENORMALS_ZERO_OFF);
+    }
+
+    void c_set_ftz(bool on) {
+        _MM_SET_FLUSH_ZERO_MODE((on ? _MM_FLUSH_ZERO_ON
+                                    : _MM_FLUSH_ZERO_OFF));
+    }
+
+    bool c_get_daz(void) {
+        return _MM_GET_DENORMALS_ZERO_MODE();
+    }
+
+    bool c_get_ftz(void) {
+        return _MM_GET_FLUSH_ZERO_MODE();
+    }
+    """
+    # Cython declarations (not exported)
+    void c_set_daz(bool on) nogil
+    void c_set_ftz(bool on) nogil
+    bool c_get_daz() nogil
+    bool c_get_ftz() nogil
+
+
+# Python wrappers for the C functions above
+
+cpdef void _set_daz(bool on) noexcept nogil:
+    c_set_daz(on)
+
+
+cpdef void _set_ftz(bool on) noexcept nogil:
+    c_set_ftz(on)
+
+
+cpdef bool _get_daz() noexcept nogil:
+    return c_get_daz()
+
+
+cpdef bool _get_ftz() noexcept nogil:
+    return c_get_ftz()
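The intrinsic macros used above read and modify two bits of the SSE control and status register (MXCSR): FTZ is bit 15 (mask `0x8000`) and DAZ is bit 6 (mask `0x0040`). The sketch below imitates that bit manipulation on a plain Python integer, purely for illustration; it does not touch the CPU, the function names are hypothetical, and `0x1F80` is assumed as the typical default register value with all exceptions masked and both flags cleared.

```python
FTZ_MASK = 0x8000  # MXCSR bit 15: flush-to-zero
DAZ_MASK = 0x0040  # MXCSR bit 6: denormals-are-zero


def with_flag(mxcsr, mask, on):
    """Return the register value with the bits in mask set or cleared."""
    return (mxcsr | mask) if on else (mxcsr & ~mask)


def decode(mxcsr):
    """Report both flags in the same shape as get_flags() returns."""
    return {'daz': bool(mxcsr & DAZ_MASK), 'ftz': bool(mxcsr & FTZ_MASK)}


mxcsr = 0x1F80                            # assumed default: flags cleared
mxcsr = with_flag(mxcsr, DAZ_MASK, True)  # like _set_daz(True)
mxcsr = with_flag(mxcsr, FTZ_MASK, True)  # like _set_ftz(True)
print(hex(mxcsr), decode(mxcsr))
```

Since the C getters return the masked register value converted to `bool`, any nonzero masked value reads back as `True`, which the `bool(...)` calls in `decode` mirror.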
|
sseflags-0.1/sseflags/benchmark.py
ADDED
|
@@ -0,0 +1,121 @@
+from itertools import count
+from sys import float_info
+from time import time
+
+import numpy as np
+
+from . import set_flags, get_flags
+
+
+def run(repeat=100, min_t=1.0, verbose=True):
+    """
+    Run benchmarks with all possible combinations of the DAZ and FTZ flags to
+    check their effect on NumPy performance (see run_flags() for details).
+
+    Parameters
+    ----------
+    repeat : int, optional
+        number of iterations in a batch
+
+    min_t : float, optional
+        minimal amount of time in seconds to benchmark each combination
+
+    verbose : bool, optional
+        pass False to suppress the progress report
+    """
+    def vprint(*args):
+        if verbose:
+            print(*args)
+
+    flags = get_flags()
+    vprint(f'Default: {flags}.')
+
+    if not set_flags():
+        print('Setting DAZ/FTZ is not implemented.')
+        return
+
+    res = {}
+    for daz, ftz in [(None, None),
+                     (False, False), (False, True),
+                     (True, False), (True, True)]:
+        res[daz, ftz] = run_flags({'daz': daz, 'ftz': ftz},
+                                  repeat=repeat, min_t=min_t)
+        vprint(f'Done {get_flags()}.')
+
+    set_flags(**flags)
+    vprint(f'Restored: {get_flags()}.\n')
+
+    if max(res.values()) < 100e-6:
+        prefix = 'micro'
+        factor = 1e6
+    else:
+        prefix = 'milli'
+        factor = 1e3
+
+    def fmt(daz, ftz):
+        return f'{res[(daz, ftz)] * factor:6.3f}'
+
+    print(f'Times in {prefix}seconds:')
+    print(f'default {fmt(None, None)}')
+    print('=' * 24)
+    print(' FTZ off FTZ on')
+    print('-' * 24)
+    print(f'DAZ off {fmt(False, False)} {fmt(False, True)}')
+    print(f'DAZ on {fmt(True, False)} {fmt(True, True)}')
+    print('=' * 24)
+
+
+def run_flags(flags, repeat=100, min_t=1.0):
+    """
+    Set the DAZ and FTZ flags to given states and run a benchmark of NumPy
+    matrix multiplication. Each iteration involves multiplication of normal
+    numbers that would produce subnormal numbers and multiplication of
+    subnormal numbers by normal numbers, which also would produce subnormal
+    numbers.
+
+    The test is designed for clear demonstration of performance degradation (if
+    it is present); the effect for real-world data is usually less severe.
+
+    Parameters
+    ----------
+    flags : dict
+        dictionary with arguments passed to sseflags.set_flags()
+
+    repeat : int, optional
+        number of iterations in a batch
+
+    min_t : float, optional
+        batches are repeated until this amount of seconds passes
+
+    Returns
+    -------
+    time : float
+        average time per iteration in seconds
+    """
+    if None not in flags.values():
+        # to ensure that subnormal test data can be created
+        set_flags(daz=False, ftz=False)
+
+    # Python "float" (= NumPy "float64" = C "double" = IEEE 754 "binary64")
+    # numbers below 2**(float_info.min_exp - 1) are subnormal and span
+    # float_info.mant_dig binary orders of magnitude, thus:
+    # A consists of normal elements, whose products would be subnormal,
+    # B consists of subnormal elements
+    x = np.arange(float_info.mant_dig)
+    A = 2.0**(float_info.min_exp / 2 - (x + x[:, None]) / 2)
+    B = 2.0**(float_info.min_exp - (x + x[:, None]) / 2)
+    C = np.ones_like(B)
+
+    set_flags(**flags)
+    t0 = time()
+    for i in count(1):
+        for _ in range(repeat):
+            A.dot(A)  # sum(normal * normal = subnormal) = subnormal
+            B.dot(C)  # sum(subnormal * 1.0 = subnormal) = subnormal
+        if time() > t0 + min_t:
+            break
+    return (time() - t0) / (i * repeat)
+
+
+if __name__ == '__main__':
+    run()
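The exponent arithmetic behind the test matrices can be checked without NumPy. The sketch below is only an illustration under stated assumptions (IEEE 754 binary64 floats, DAZ and FTZ cleared); the names are hypothetical, and `math.ldexp` is used to build exact powers of two with magnitudes comparable to the elements of `A` and `B`.

```python
import math
from sys import float_info

smallest_normal = float_info.min    # 2**(min_exp - 1), i.e. 2**-1022


def is_subnormal(x):
    """True for nonzero floats whose magnitude is below the normal range."""
    return x != 0.0 and abs(x) < smallest_normal


# A normal number close to the square root of the underflow threshold,
# comparable in magnitude to the elements of A.
a = math.ldexp(1.0, float_info.min_exp // 2 - 1)   # 2**-512

# A subnormal number, comparable to most elements of B.
b = math.ldexp(1.0, float_info.min_exp - 10)       # 2**-1031

product = a * a                                    # 2**-1024

print(is_subnormal(a), is_subnormal(product), is_subnormal(b * 1.0))
```

The first product corresponds to the `A.dot(A)` case (normal inputs, subnormal result, which FTZ affects), and `b * 1.0` to the `B.dot(C)` case (subnormal input, which DAZ affects).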
|
sseflags-0.1/sseflags.egg-info/PKG-INFO
ADDED
|
@@ -0,0 +1,187 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: sseflags
|
|
3
|
+
Version: 0.1
|
|
4
|
+
Summary: Python package for accessing DAZ and FTZ flags
|
|
5
|
+
Author: Mikhail Ryazanov
|
|
6
|
+
License-Expression: MIT
|
|
7
|
+
Project-URL: homepage, https://github.com/MikhailRyazanov/SSEflags
|
|
8
|
+
Project-URL: documentation, https://github.com/MikhailRyazanov/SSEflags/README.md
|
|
9
|
+
Project-URL: github, https://github.com/MikhailRyazanov/SSEflags.git
|
|
10
|
+
Project-URL: issues, https://github.com/MikhailRyazanov/SSEflags/issues
|
|
11
|
+
Keywords: SSE,AVX,subnormal,denormal
|
|
12
|
+
Classifier: Development Status :: 4 - Beta
|
|
13
|
+
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
|
14
|
+
Classifier: Programming Language :: Python :: 3
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.14
|
|
20
|
+
Requires-Python: >=3.10
|
|
21
|
+
Description-Content-Type: text/markdown
|
|
22
|
+
License-File: LICENSE
|
|
23
|
+
Dynamic: license-file
|
|
24
|
+
|
|
25
|
+
# SSE flags
|
|
26
|
+
|
|
27
|
+
NumPy for x86 platforms (IA-32 and AMD64 architectures) uses SSE and/or AVX for
|
|
28
|
+
floating-point calculations. Unfortunately, on Intel CPUs, they work very
|
|
29
|
+
slowly with
|
|
30
|
+
[subnormal (denormal) numbers](https://en.wikipedia.org/wiki/Subnormal_number).
|
|
31
|
+
To avoid such performance degradation, if somewhat worse floating-point
|
|
32
|
+
accuracy in extreme cases can be tolerated, the
|
|
33
|
+
[DAZ (denormals-are-zero) and FTZ (flush-to-zero) CPU flags](https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2025-2/set-the-ftz-and-daz-flags.html)
|
|
34
|
+
were introduced to treat input and/or output subnormal numbers as zeros. This
|
|
35
|
+
module provides access to these CPU flags from Python.
|
|
36
|
+
|
|
37
|
+
To test the effect on your system, use ``sseflags.benchmark.run()`` or run
|
|
38
|
+
```
|
|
39
|
+
python3 -m sseflags.benchmark
|
|
40
|
+
```
|
|
41
|
+
in the command line. Example output on Intel i9-12900K (subnormal numbers are
|
|
42
|
+
very slow):
|
|
43
|
+
```
|
|
44
|
+
Times in milliseconds:
|
|
45
|
+
default 1.979
|
|
46
|
+
========================
|
|
47
|
+
FTZ off FTZ on
|
|
48
|
+
------------------------
|
|
49
|
+
DAZ off 1.993 2.037
|
|
50
|
+
DAZ on 6.669 0.037
|
|
51
|
+
========================
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
AMD CPUs do not show performance degradation on subnormal numbers in the 64-bit
|
|
55
|
+
mode, and thus enabling DAZ/FTZ can only decrease the accuracy slightly.
|
|
56
|
+
Example benchmarks on AMD Ryzen 7 6800U (negligible degradation for subnormal
|
|
57
|
+
numbers; notice that times are in *micro*seconds):
|
|
58
|
+
```
|
|
59
|
+
Times in microseconds:
default   16.834
========================
         FTZ off  FTZ on
------------------------
DAZ off   16.829  15.383
DAZ on    15.353  14.500
========================
|
|
67
|
+
```
|
|
68
|
+
Nevertheless, DAZ/FTZ might be useful in 32-bit Python (same CPU, noticeable
|
|
69
|
+
difference):
|
|
70
|
+
```
|
|
71
|
+
Times in milliseconds:
default    0.229
========================
         FTZ off  FTZ on
------------------------
DAZ off    0.225   0.131
DAZ on     0.224   0.131
========================
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
On other architectures, or if the underlying Cython extension is not built, the
|
|
82
|
+
module only reports that it has no effect.
|
|
83
|
+
|
|
84
|
+
|
|
85
|
+
## ``sseflags`` module
|
|
86
|
+
|
|
87
|
+
```
|
|
88
|
+
get_flags()
|
|
89
|
+
Query the current states of the DAZ and FTZ flags; see set_flags() for details.
|
|
90
|
+
Can be used for restoring the default behavior:
|
|
91
|
+
|
|
92
|
+
flags = get_flags() # remember the original flag states
|
|
93
|
+
set_flags(daz=True, ftz=True) # enable DAZ and FTZ
|
|
94
|
+
... # do some calculations
|
|
95
|
+
set_flags(**flags) # restore the original flag states
|
|
96
|
+
|
|
97
|
+
Returns
|
|
98
|
+
-------
|
|
99
|
+
flags : dict
|
|
100
|
+
dictionary with the keys 'daz' and 'ftz', values of which represent the
|
|
101
|
+
corresponding flag state: True for set, False for cleared, None if not
|
|
102
|
+
implemented
|
|
103
|
+
|
|
104
|
+
|
|
105
|
+
set_flags(daz=None, ftz=None, verbose=False)
|
|
106
|
+
Set the DAZ (denormals-are-zero) and/or FTZ (flush-to-zero) CPU flags for
|
|
107
|
+
SSE and AVX floating-point calculations, which can be useful for Intel CPUs
|
|
108
|
+
that work very slowly with subnormal (denormal) numbers.
|
|
109
|
+
|
|
110
|
+
On unsupported architectures, or if the underlying Cython extension was not
|
|
111
|
+
built, this function only reports that it has no effect. The availability
|
|
112
|
+
can be checked by calling set_flags() without arguments.
|
|
113
|
+
|
|
114
|
+
Parameters
|
|
115
|
+
----------
|
|
116
|
+
daz : bool or None, optional
|
|
117
|
+
True to set, False to clear the DAZ flag; None (default) to leave
|
|
118
|
+
unchanged
|
|
119
|
+
|
|
120
|
+
ftz : bool or None, optional
|
|
121
|
+
True to set, False to clear the FTZ flag; None (default) to leave
|
|
122
|
+
unchanged
|
|
123
|
+
|
|
124
|
+
verbose : bool, optional
|
|
125
|
+
pass True to print a warning if the operation is not implemented
|
|
126
|
+
|
|
127
|
+
Returns
|
|
128
|
+
-------
|
|
129
|
+
implemented : bool
|
|
130
|
+
True if this operation is implemented, False if not
|
|
131
|
+
```
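
The save/restore pattern from the `get_flags()` description can also be wrapped in a context manager, so that the original states are restored even if the calculation raises. The sketch below is only an illustration and not part of the package: the name `temporary_flags` is hypothetical, and the two functions are passed in as arguments so that the helper does not depend on the extension being installed.

```python
from contextlib import contextmanager

@contextmanager
def temporary_flags(get_flags, set_flags, daz=None, ftz=None):
    """Apply DAZ/FTZ states inside a with-block, then restore the saved ones.

    get_flags and set_flags are expected to behave like sseflags.get_flags
    and sseflags.set_flags (hypothetical helper, not provided by the package).
    """
    saved = get_flags()              # remember the original flag states
    set_flags(daz=daz, ftz=ftz)      # None leaves the corresponding flag unchanged
    try:
        yield
    finally:
        set_flags(**saved)           # restore even if the calculation raised
```

With the package installed, this would be used as `with temporary_flags(sseflags.get_flags, sseflags.set_flags, daz=True, ftz=True): ...`.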
|
|
132
|
+
|
|
133
|
+
### ``sseflags.benchmark`` submodule
|
|
134
|
+
|
|
135
|
+
```
|
|
136
|
+
run(repeat=100, min_t=1.0, verbose=True)
|
|
137
|
+
Run benchmarks with all possible combinations of the DAZ and FTZ flags to
|
|
138
|
+
check their effect on NumPy performance (see run_flags() for details).
|
|
139
|
+
|
|
140
|
+
Parameters
|
|
141
|
+
----------
|
|
142
|
+
repeat : int, optional
|
|
143
|
+
number of iterations in a batch
|
|
144
|
+
|
|
145
|
+
min_t : float, optional
|
|
146
|
+
minimum time in seconds to benchmark each combination
|
|
147
|
+
|
|
148
|
+
verbose : bool, optional
|
|
149
|
+
pass False to suppress the progress report
|
|
150
|
+
|
|
151
|
+
|
|
152
|
+
run_flags(flags, repeat=100, min_t=1.0)
|
|
153
|
+
Set the DAZ and FTZ flags to given states and run a benchmark of NumPy
|
|
154
|
+
matrix multiplication. Each iteration involves multiplication of normal
|
|
155
|
+
numbers that would produce subnormal numbers and multiplication of
|
|
156
|
+
subnormal numbers by normal numbers, which also would produce subnormal
|
|
157
|
+
numbers.
|
|
158
|
+
|
|
159
|
+
The test is designed to demonstrate performance degradation clearly (if
|
|
160
|
+
it is present); the effect for real-world data is usually less severe.
|
|
161
|
+
|
|
162
|
+
Parameters
|
|
163
|
+
----------
|
|
164
|
+
flags : dict
|
|
165
|
+
dictionary with arguments passed to sseflags.set_flags()
|
|
166
|
+
|
|
167
|
+
repeat : int, optional
|
|
168
|
+
number of iterations in a batch
|
|
169
|
+
|
|
170
|
+
min_t : float, optional
|
|
171
|
+
batches are repeated until this many seconds have passed
|
|
172
|
+
|
|
173
|
+
Returns
|
|
174
|
+
-------
|
|
175
|
+
time : float
|
|
176
|
+
average time per iteration in seconds
|
|
177
|
+
```
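
The `repeat` and `min_t` parameters describe a common timing scheme: run the workload in batches of `repeat` iterations, keep running batches until at least `min_t` seconds have passed, and report the average time per iteration. The following is a rough sketch of that scheme under those assumptions. It is not the package's actual implementation, and `time_batches` is a hypothetical name:

```python
import time

def time_batches(func, repeat=100, min_t=1.0):
    """Return the average time per call of func in seconds, measured over
    whole batches of `repeat` calls until at least `min_t` seconds elapse."""
    iterations = 0
    start = time.perf_counter()
    while True:
        for _ in range(repeat):      # one batch
            func()
        iterations += repeat
        elapsed = time.perf_counter() - start
        if elapsed >= min_t:         # enough time has been accumulated
            return elapsed / iterations

# Example: time a trivial workload for about 50 ms
print(time_batches(lambda: sum(range(1000)), repeat=10, min_t=0.05))
```

Checking the clock only once per batch keeps the timing overhead small compared with the workload itself.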
|
|
178
|
+
|
|
179
|
+
## Installation
|
|
180
|
+
|
|
181
|
+
Compiled wheels for Linux, macOS and Windows can be installed
|
|
182
|
+
[from PyPI](https://pypi.org/project/sseflags).
|
|
183
|
+
They use the [“Stable ABI”](https://docs.python.org/3/c-api/stable.html#stable-abi)
|
|
184
|
+
that should be compatible with all Python versions ⩾3.10. For portability, a
|
|
185
|
+
“universal wheel” is also available. It does not contain the Cython extension,
|
|
186
|
+
and thus has no effect on computations, but can be installed on unsupported
|
|
187
|
+
systems.
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
sseflags
|