sseflags-0.1.tar.gz

sseflags-0.1/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Mikhail Ryazanov
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
sseflags-0.1/PKG-INFO ADDED
@@ -0,0 +1,187 @@
+ Metadata-Version: 2.4
+ Name: sseflags
+ Version: 0.1
+ Summary: Python package for accessing DAZ and FTZ flags
+ Author: Mikhail Ryazanov
+ License-Expression: MIT
+ Project-URL: homepage, https://github.com/MikhailRyazanov/SSEflags
+ Project-URL: documentation, https://github.com/MikhailRyazanov/SSEflags/README.md
+ Project-URL: github, https://github.com/MikhailRyazanov/SSEflags.git
+ Project-URL: issues, https://github.com/MikhailRyazanov/SSEflags/issues
+ Keywords: SSE,AVX,subnormal,denormal
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Programming Language :: Python :: 3.14
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Dynamic: license-file
+
+ # SSE flags
+
+ NumPy for x86 platforms (IA-32 and AMD64 architectures) uses SSE and/or AVX for
+ floating-point calculations. Unfortunately, on Intel CPUs, they work very
+ slowly with
+ [subnormal (denormal) numbers](https://en.wikipedia.org/wiki/Subnormal_number).
+ To avoid such performance degradation, if somewhat worse floating-point
+ accuracy in extreme cases can be tolerated, the
+ [DAZ (denormals-are-zero) and FTZ (flush-to-zero) CPU flags](https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2025-2/set-the-ftz-and-daz-flags.html)
+ were introduced to treat input and/or output subnormal numbers as zeros. This
+ module provides access to these CPU flags from Python.
+
+ To test the effect on your system, use ``sseflags.benchmark.run()`` or run
+ ```
+ python3 -m sseflags.benchmark
+ ```
+ from the command line. Example output on Intel i9-12900K (subnormal numbers are
+ very slow):
+ ```
+ Times in milliseconds:
+ default    1.979
+ ========================
+          FTZ off  FTZ on
+ ------------------------
+ DAZ off    1.993   2.037
+ DAZ on     6.669   0.037
+ ========================
+ ```
+
+ AMD CPUs do not show performance degradation on subnormal numbers in 64-bit
+ mode, and thus enabling DAZ/FTZ can only decrease the accuracy slightly.
+ Example benchmarks on AMD Ryzen 7 6800U (negligible degradation for subnormal
+ numbers; note that times are in *micro*seconds):
+ ```
+ Times in microseconds:
+ default   16.834
+ ========================
+          FTZ off  FTZ on
+ ------------------------
+ DAZ off   16.829  15.383
+ DAZ on    15.353  14.500
+ ========================
+ ```
+ Nevertheless, DAZ/FTZ might be useful in 32-bit Python (same CPU, noticeable
+ difference):
+ ```
+ Times in milliseconds:
+ default    0.229
+ ========================
+          FTZ off  FTZ on
+ ------------------------
+ DAZ off    0.225   0.131
+ DAZ on     0.224   0.131
+ ========================
+ ```
+
+ On other architectures, or if the underlying Cython extension is not built, the
+ module only reports that it has no effect.
+
+
+ ## ``sseflags`` module
+
+ ```
+ get_flags()
+     Query current states of the DAZ and FTZ flags, see set_flags() for details.
+     Can be used for restoring the default behavior:
+
+         flags = get_flags()            # remember the original flag states
+         set_flags(daz=True, ftz=True)  # enable DAZ and FTZ
+         ...                            # do some calculations
+         set_flags(**flags)             # restore the original flag states
+
+     Returns
+     -------
+     flags : dict
+         dictionary with the keys 'daz' and 'ftz', values of which represent the
+         corresponding flag state: True for set, False for cleared, None if not
+         implemented
+
+
+ set_flags(daz=None, ftz=None, verbose=False)
+     Set the DAZ (denormals-are-zero) and/or FTZ (flush-to-zero) CPU flags for
+     SSE and AVX floating-point calculations, which can be useful for Intel CPUs
+     that work very slowly with subnormal (denormal) numbers.
+
+     On unsupported architectures, or if the underlying Cython extension was not
+     built, this function only reports that it has no effect. The availability
+     can be checked by calling set_flags() without arguments.
+
+     Parameters
+     ----------
+     daz : bool or None, optional
+         True to set, False to clear the DAZ flag; None (default) to leave
+         unchanged
+
+     ftz : bool or None, optional
+         True to set, False to clear the FTZ flag; None (default) to leave
+         unchanged
+
+     verbose : bool, optional
+         pass True to print a warning if the operation is not implemented
+
+     Returns
+     -------
+     implemented : bool
+         True if this operation is implemented, False if not
+ ```
+
+ ### ``sseflags.benchmark`` submodule
+
+ ```
+ run(repeat=100, min_t=1.0, verbose=True)
+     Run benchmarks with all possible combinations of the DAZ and FTZ flags to
+     check their effect on NumPy performance (see run_flags() for details).
+
+     Parameters
+     ----------
+     repeat : int, optional
+         number of iterations in a batch
+
+     min_t : float, optional
+         minimal amount of time in seconds to benchmark each combination
+
+     verbose : bool, optional
+         pass False to suppress the progress report
+
+
+ run_flags(flags, repeat=100, min_t=1.0)
+     Set the DAZ and FTZ flags to given states and run a benchmark of NumPy
+     matrix multiplication. Each iteration involves multiplication of normal
+     numbers that would produce subnormal numbers and multiplication of
+     subnormal numbers by normal numbers, which also would produce subnormal
+     numbers.
+
+     The test is designed for clear demonstration of performance degradation (if
+     it is present); the effect for real-world data is usually less severe.
+
+     Parameters
+     ----------
+     flags : dict
+         dictionary with arguments passed to sseflags.set_flags()
+
+     repeat : int, optional
+         number of iterations in a batch
+
+     min_t : float, optional
+         batches are repeated until this amount of seconds passes
+
+     Returns
+     -------
+     time : float
+         average time per iteration in seconds
+ ```
+
+ ## Installation
+
+ Compiled wheels for Linux, macOS and Windows can be installed
+ [from PyPI](https://pypi.org/project/sseflags).
+ They use the [“Stable ABI”](https://docs.python.org/3/c-api/stable.html#stable-abi),
+ which should be compatible with all Python versions ⩾3.10. For portability, a
+ “universal wheel” is also available. It does not contain the Cython extension,
+ and thus has no effect on computations, but can be installed on unsupported
+ systems.
sseflags-0.1/README.md ADDED
@@ -0,0 +1,163 @@
+ # SSE flags
+
+ NumPy for x86 platforms (IA-32 and AMD64 architectures) uses SSE and/or AVX for
+ floating-point calculations. Unfortunately, on Intel CPUs, they work very
+ slowly with
+ [subnormal (denormal) numbers](https://en.wikipedia.org/wiki/Subnormal_number).
+ To avoid such performance degradation, if somewhat worse floating-point
+ accuracy in extreme cases can be tolerated, the
+ [DAZ (denormals-are-zero) and FTZ (flush-to-zero) CPU flags](https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2025-2/set-the-ftz-and-daz-flags.html)
+ were introduced to treat input and/or output subnormal numbers as zeros. This
+ module provides access to these CPU flags from Python.
+
+ To test the effect on your system, use ``sseflags.benchmark.run()`` or run
+ ```
+ python3 -m sseflags.benchmark
+ ```
+ from the command line. Example output on Intel i9-12900K (subnormal numbers are
+ very slow):
+ ```
+ Times in milliseconds:
+ default    1.979
+ ========================
+          FTZ off  FTZ on
+ ------------------------
+ DAZ off    1.993   2.037
+ DAZ on     6.669   0.037
+ ========================
+ ```
+
+ AMD CPUs do not show performance degradation on subnormal numbers in 64-bit
+ mode, and thus enabling DAZ/FTZ can only decrease the accuracy slightly.
+ Example benchmarks on AMD Ryzen 7 6800U (negligible degradation for subnormal
+ numbers; note that times are in *micro*seconds):
+ ```
+ Times in microseconds:
+ default   16.834
+ ========================
+          FTZ off  FTZ on
+ ------------------------
+ DAZ off   16.829  15.383
+ DAZ on    15.353  14.500
+ ========================
+ ```
+ Nevertheless, DAZ/FTZ might be useful in 32-bit Python (same CPU, noticeable
+ difference):
+ ```
+ Times in milliseconds:
+ default    0.229
+ ========================
+          FTZ off  FTZ on
+ ------------------------
+ DAZ off    0.225   0.131
+ DAZ on     0.224   0.131
+ ========================
+ ```
+
+ On other architectures, or if the underlying Cython extension is not built, the
+ module only reports that it has no effect.
+
+
+ ## ``sseflags`` module
+
+ ```
+ get_flags()
+     Query current states of the DAZ and FTZ flags, see set_flags() for details.
+     Can be used for restoring the default behavior:
+
+         flags = get_flags()            # remember the original flag states
+         set_flags(daz=True, ftz=True)  # enable DAZ and FTZ
+         ...                            # do some calculations
+         set_flags(**flags)             # restore the original flag states
+
+     Returns
+     -------
+     flags : dict
+         dictionary with the keys 'daz' and 'ftz', values of which represent the
+         corresponding flag state: True for set, False for cleared, None if not
+         implemented
+
+
+ set_flags(daz=None, ftz=None, verbose=False)
+     Set the DAZ (denormals-are-zero) and/or FTZ (flush-to-zero) CPU flags for
+     SSE and AVX floating-point calculations, which can be useful for Intel CPUs
+     that work very slowly with subnormal (denormal) numbers.
+
+     On unsupported architectures, or if the underlying Cython extension was not
+     built, this function only reports that it has no effect. The availability
+     can be checked by calling set_flags() without arguments.
+
+     Parameters
+     ----------
+     daz : bool or None, optional
+         True to set, False to clear the DAZ flag; None (default) to leave
+         unchanged
+
+     ftz : bool or None, optional
+         True to set, False to clear the FTZ flag; None (default) to leave
+         unchanged
+
+     verbose : bool, optional
+         pass True to print a warning if the operation is not implemented
+
+     Returns
+     -------
+     implemented : bool
+         True if this operation is implemented, False if not
+ ```
+
+ ### ``sseflags.benchmark`` submodule
+
+ ```
+ run(repeat=100, min_t=1.0, verbose=True)
+     Run benchmarks with all possible combinations of the DAZ and FTZ flags to
+     check their effect on NumPy performance (see run_flags() for details).
+
+     Parameters
+     ----------
+     repeat : int, optional
+         number of iterations in a batch
+
+     min_t : float, optional
+         minimal amount of time in seconds to benchmark each combination
+
+     verbose : bool, optional
+         pass False to suppress the progress report
+
+
+ run_flags(flags, repeat=100, min_t=1.0)
+     Set the DAZ and FTZ flags to given states and run a benchmark of NumPy
+     matrix multiplication. Each iteration involves multiplication of normal
+     numbers that would produce subnormal numbers and multiplication of
+     subnormal numbers by normal numbers, which also would produce subnormal
+     numbers.
+
+     The test is designed for clear demonstration of performance degradation (if
+     it is present); the effect for real-world data is usually less severe.
+
+     Parameters
+     ----------
+     flags : dict
+         dictionary with arguments passed to sseflags.set_flags()
+
+     repeat : int, optional
+         number of iterations in a batch
+
+     min_t : float, optional
+         batches are repeated until this amount of seconds passes
+
+     Returns
+     -------
+     time : float
+         average time per iteration in seconds
+ ```
+
+ ## Installation
+
+ Compiled wheels for Linux, macOS and Windows can be installed
+ [from PyPI](https://pypi.org/project/sseflags).
+ They use the [“Stable ABI”](https://docs.python.org/3/c-api/stable.html#stable-abi),
+ which should be compatible with all Python versions ⩾3.10. For portability, a
+ “universal wheel” is also available. It does not contain the Cython extension,
+ and thus has no effect on computations, but can be installed on unsupported
+ systems.
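The three benchmark tables in the README can be condensed into speedup factors. The sketch below only reuses the times printed there (default run versus the run with both DAZ and FTZ on); the dictionary layout and the `speedup` helper are illustrative and not part of `sseflags`.

```python
# Times copied from the README tables: (default, DAZ on + FTZ on),
# in the units that the benchmark printed for each system.
README_TIMES = {
    'Intel i9-12900K (ms)': (1.979, 0.037),
    'AMD Ryzen 7 6800U, 64-bit (us)': (16.834, 14.500),
    'AMD Ryzen 7 6800U, 32-bit (ms)': (0.229, 0.131),
}


def speedup(default, flagged):
    """Return how many times faster the flagged run is than the default."""
    return default / flagged


for system, (default, flagged) in README_TIMES.items():
    print(f'{system}: {speedup(default, flagged):.2f}x')
```

For the quoted numbers this gives a factor of about 53 on the Intel CPU, against about 1.2 (64-bit) and 1.7 (32-bit) on the AMD CPU, which matches the README's qualitative description.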
sseflags-0.1/pyproject.toml ADDED
@@ -0,0 +1,44 @@
+ [project]
+ name = "sseflags"
+ description = "Python package for accessing DAZ and FTZ flags"
+ readme = "README.md"
+ requires-python = ">= 3.10"
+ license = "MIT"
+ authors = [{name = "Mikhail Ryazanov"}]
+ keywords = ["SSE", "AVX", "subnormal", "denormal"]
+ classifiers = [
+     "Development Status :: 4 - Beta",
+     "Topic :: Software Development :: Libraries :: Python Modules",
+     "Programming Language :: Python :: 3",
+     "Programming Language :: Python :: 3.10",
+     "Programming Language :: Python :: 3.11",
+     "Programming Language :: Python :: 3.12",
+     "Programming Language :: Python :: 3.13",
+     "Programming Language :: Python :: 3.14",
+ ]
+ dynamic = ["version"]  # __version__ from sseflags/__init__.py
+
+ [project.urls]
+ homepage = "https://github.com/MikhailRyazanov/SSEflags"
+ documentation = "https://github.com/MikhailRyazanov/SSEflags/README.md"
+ github = "https://github.com/MikhailRyazanov/SSEflags.git"
+ issues = "https://github.com/MikhailRyazanov/SSEflags/issues"
+
+ [build-system]
+ requires = ["setuptools >= 77.0", "cython >= 3"]
+ build-backend = "setuptools.build_meta"
+
+ [tool.setuptools.packages.find]
+ include = ["sseflags"]
+
+ [tool.cibuildwheel]
+ environment.FORCE_COLOR = 1
+ environment.PIP_PROGRESS_BAR = "off"
+ test-requires = "abi3audit"
+ test-command = "python -m abi3audit -v --summary {wheel}"
+
+ [tool.cibuildwheel.linux]
+ archs = ["i686", "x86_64"]
+
+ [tool.cibuildwheel.windows]
+ archs = ["x86", "AMD64"]
sseflags-0.1/setup.cfg ADDED
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
sseflags-0.1/setup.py ADDED
@@ -0,0 +1,27 @@
+ import os
+ from pathlib import Path
+ import sys
+ from setuptools import setup, Extension
+
+ sys.path.insert(0, '')  # include CWD (missing in build isolation)
+ from sseflags import __version__
+
+ if sys.platform == 'win32':  # for MSVC
+     extra_compile_args = ['/Os']
+ else:  # for GCC and Clang
+     extra_compile_args = ['-Os', '-g0']
+ ext_modules = [
+     # ("Path" below is a workaround for Setuptools bug on Windows,
+     #  see https://github.com/pypa/setuptools/issues/5093)
+     Extension('sseflags._lib', [Path('sseflags/_lib.pyx')],
+               extra_compile_args=extra_compile_args,
+               # 0x030A0000 = Python 3.10 (0x0A = 10)
+               define_macros=[('Py_LIMITED_API', 0x030A0000)],
+               py_limited_api=True)
+ ]
+ # "sseflags=none python -m build --wheel" to build a "universal" wheel
+ # (...-none-any.whl) without Cython extension
+ if os.environ.get('sseflags') == 'none':
+     ext_modules = None
+
+ setup(version=__version__, ext_modules=ext_modules,
+       options={'bdist_wheel': {'py_limited_api': 'cp310'}})
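The `Py_LIMITED_API` value and the `cp310` wheel tag in `setup.py` encode the same minimum Python version. As a small sketch of how the two relate, the helpers below unpack a value laid out like CPython's `PY_VERSION_HEX` (major, minor and micro in the three highest bytes); the function names `decode_limited_api` and `wheel_tag` are made up for this illustration.

```python
def decode_limited_api(hexversion):
    """Split a Py_LIMITED_API / PY_VERSION_HEX value into (major, minor, micro)."""
    major = (hexversion >> 24) & 0xFF
    minor = (hexversion >> 16) & 0xFF
    micro = (hexversion >> 8) & 0xFF
    return major, minor, micro


def wheel_tag(hexversion):
    """Build the matching 'cpXY' tag, as used for the bdist_wheel option."""
    major, minor, _ = decode_limited_api(hexversion)
    return f'cp{major}{minor}'


print(decode_limited_api(0x030A0000), wheel_tag(0x030A0000))
```

With the value from `setup.py`, the decoded version is 3.10 and the tag is `cp310`, consistent with `requires-python = ">= 3.10"`.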
sseflags-0.1/sseflags/__init__.py ADDED
@@ -0,0 +1,72 @@
+ try:
+     from ._lib import _get_daz, _get_ftz, _set_daz, _set_ftz
+     _ext = True
+ except ImportError:
+     _ext = False
+
+ __version__ = '0.1'
+
+
+ def get_flags():
+     """
+     Query current states of the DAZ and FTZ flags, see set_flags() for details.
+     Can be used for restoring the default behavior:
+
+         flags = get_flags()            # remember the original flag states
+         set_flags(daz=True, ftz=True)  # enable DAZ and FTZ
+         ...                            # do some calculations
+         set_flags(**flags)             # restore the original flag states
+
+     Returns
+     -------
+     flags : dict
+         dictionary with the keys 'daz' and 'ftz', values of which represent the
+         corresponding flag state: True for set, False for cleared, None if not
+         implemented
+     """
+     flags = {'daz': None, 'ftz': None}
+     if _ext:
+         flags['daz'] = _get_daz()
+         flags['ftz'] = _get_ftz()
+     return flags
+
+
+ def set_flags(daz=None, ftz=None, verbose=False):
+     """
+     Set the DAZ (denormals-are-zero) and/or FTZ (flush-to-zero) CPU flags for
+     SSE and AVX floating-point calculations, which can be useful for Intel CPUs
+     that work very slowly with subnormal (denormal) numbers.
+
+     On unsupported architectures, or if the underlying Cython extension was not
+     built, this function only reports that it has no effect. The availability
+     can be checked by calling set_flags() without arguments.
+
+     Parameters
+     ----------
+     daz : bool or None, optional
+         True to set, False to clear the DAZ flag; None (default) to leave
+         unchanged
+
+     ftz : bool or None, optional
+         True to set, False to clear the FTZ flag; None (default) to leave
+         unchanged
+
+     verbose : bool, optional
+         pass True to print a warning if the operation is not implemented
+
+     Returns
+     -------
+     implemented : bool
+         True if this operation is implemented, False if not
+     """
+     if _ext:
+         if daz is not None:
+             _set_daz(daz)
+         if ftz is not None:
+             _set_ftz(ftz)
+         return True
+
+     if verbose:
+         print('Cannot change the DAZ/FTZ flags: the extension was not '
+               'compiled or is not needed for this CPU.')
+     return False
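The save/restore idiom from the `get_flags()` docstring can be wrapped in a context manager so that the original flag states are restored even if the calculation raises. This is only a sketch: `temporary_flags` is not part of the package, and the `fake_*` functions are stand-ins that mimic the documented behaviour so that the example runs without the compiled extension.

```python
from contextlib import contextmanager


@contextmanager
def temporary_flags(get_flags, set_flags, **new_flags):
    """Apply new_flags inside the with-block, then restore the saved states.

    get_flags and set_flags are expected to behave like the functions of
    the same names in sseflags; passing them in keeps this sketch
    independent of whether the package is installed.
    """
    saved = get_flags()         # e.g. {'daz': False, 'ftz': False}
    set_flags(**new_flags)
    try:
        yield saved
    finally:
        set_flags(**saved)      # None values leave the flags unchanged


# Stand-ins that mimic the documented behaviour, for demonstration only.
_state = {'daz': False, 'ftz': False}


def fake_get_flags():
    return dict(_state)


def fake_set_flags(daz=None, ftz=None, verbose=False):
    if daz is not None:
        _state['daz'] = daz
    if ftz is not None:
        _state['ftz'] = ftz
    return True


with temporary_flags(fake_get_flags, fake_set_flags, daz=True, ftz=True):
    inside = fake_get_flags()   # flag states seen by the calculation
after = fake_get_flags()        # flag states after leaving the block
print(inside, after)
```

With the real package, the same helper would presumably be called as `temporary_flags(sseflags.get_flags, sseflags.set_flags, daz=True, ftz=True)`. When the extension is missing, `get_flags()` returns `None` for both keys, so the restoring call changes nothing.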
sseflags-0.1/sseflags/_lib.pyx ADDED
@@ -0,0 +1,54 @@
+ # cython: language_level=3
+ # (disable unneeded features to reduce compiled size)
+ # cython: always_allow_keywords=False, auto_pickle=False, binding=False
+
+ from libcpp cimport bool
+
+
+ cdef extern from *:
+     r"""  // C code for Cython
+     #include <stdbool.h>
+     #include <pmmintrin.h>
+     #include <xmmintrin.h>
+
+     void c_set_daz(bool on) {
+         _MM_SET_DENORMALS_ZERO_MODE(on ? _MM_DENORMALS_ZERO_ON
+                                        : _MM_DENORMALS_ZERO_OFF);
+     }
+
+     void c_set_ftz(bool on) {
+         _MM_SET_FLUSH_ZERO_MODE((on ? _MM_FLUSH_ZERO_ON
+                                     : _MM_FLUSH_ZERO_OFF));
+     }
+
+     bool c_get_daz(void) {
+         return _MM_GET_DENORMALS_ZERO_MODE();
+     }
+
+     bool c_get_ftz(void) {
+         return _MM_GET_FLUSH_ZERO_MODE();
+     }
+     """
+     # Cython declarations (not exported)
+     void c_set_daz(bool on) nogil
+     void c_set_ftz(bool on) nogil
+     bool c_get_daz() nogil
+     bool c_get_ftz() nogil
+
+
+ # Python wrappers for the C functions above
+
+ cpdef void _set_daz(bool on) noexcept nogil:
+     c_set_daz(on)
+
+
+ cpdef void _set_ftz(bool on) noexcept nogil:
+     c_set_ftz(on)
+
+
+ cpdef bool _get_daz() noexcept nogil:
+     return c_get_daz()
+
+
+ cpdef bool _get_ftz() noexcept nogil:
+     return c_get_ftz()
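The intrinsic macros used in `_lib.pyx` read and write two bits of the MXCSR control register: DAZ is bit 6 (mask `0x0040`) and FTZ is bit 15 (mask `0x8000`). The pure-Python sketch below only illustrates that bit arithmetic; it does not touch the CPU, and the names `decode_mxcsr` and `update_mxcsr` are invented for this example.

```python
DAZ_MASK = 0x0040        # MXCSR bit 6, the value of _MM_DENORMALS_ZERO_ON
FTZ_MASK = 0x8000        # MXCSR bit 15, the value of _MM_FLUSH_ZERO_ON
MXCSR_DEFAULT = 0x1F80   # reset value: all exceptions masked, both flags off


def decode_mxcsr(mxcsr):
    """Return the flag states in the same form as sseflags.get_flags()."""
    return {'daz': bool(mxcsr & DAZ_MASK), 'ftz': bool(mxcsr & FTZ_MASK)}


def update_mxcsr(mxcsr, daz=None, ftz=None):
    """Set or clear the flag bits; None leaves the corresponding bit as is."""
    for value, mask in ((daz, DAZ_MASK), (ftz, FTZ_MASK)):
        if value is not None:
            mxcsr = (mxcsr | mask) if value else (mxcsr & ~mask)
    return mxcsr


both_on = update_mxcsr(MXCSR_DEFAULT, daz=True, ftz=True)
print(hex(both_on), decode_mxcsr(both_on))
```

The getters in the C code return the masked register value (zero or the mask itself), which the `bool` return type then reduces to true or false, as `decode_mxcsr` does here with `bool(...)`.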
sseflags-0.1/sseflags/benchmark.py ADDED
@@ -0,0 +1,121 @@
+ from itertools import count
+ from sys import float_info
+ from time import time
+
+ import numpy as np
+
+ from . import set_flags, get_flags
+
+
+ def run(repeat=100, min_t=1.0, verbose=True):
+     """
+     Run benchmarks with all possible combinations of the DAZ and FTZ flags to
+     check their effect on NumPy performance (see run_flags() for details).
+
+     Parameters
+     ----------
+     repeat : int, optional
+         number of iterations in a batch
+
+     min_t : float, optional
+         minimal amount of time in seconds to benchmark each combination
+
+     verbose : bool, optional
+         pass False to suppress the progress report
+     """
+     def vprint(*args):
+         if verbose:
+             print(*args)
+
+     flags = get_flags()
+     vprint(f'Default: {flags}.')
+
+     if not set_flags():
+         print('Setting DAZ/FTZ is not implemented.')
+         return
+
+     res = {}
+     for daz, ftz in [(None, None),
+                      (False, False), (False, True),
+                      (True, False), (True, True)]:
+         res[daz, ftz] = run_flags({'daz': daz, 'ftz': ftz},
+                                   repeat=repeat, min_t=min_t)
+         vprint(f'Done {get_flags()}.')
+
+     set_flags(**flags)
+     vprint(f'Restored: {get_flags()}.\n')
+
+     if max(res.values()) < 100e-6:
+         prefix = 'micro'
+         factor = 1e6
+     else:
+         prefix = 'milli'
+         factor = 1e3
+
+     def fmt(daz, ftz):
+         return f'{res[(daz, ftz)] * factor:6.3f}'
+
+     # (column padding chosen to match the 24-character rules)
+     print(f'Times in {prefix}seconds:')
+     print(f'default   {fmt(None, None)}')
+     print('=' * 24)
+     print('         FTZ off  FTZ on')
+     print('-' * 24)
+     print(f'DAZ off   {fmt(False, False)}  {fmt(False, True)}')
+     print(f'DAZ on    {fmt(True, False)}  {fmt(True, True)}')
+     print('=' * 24)
+
+
+ def run_flags(flags, repeat=100, min_t=1.0):
+     """
+     Set the DAZ and FTZ flags to given states and run a benchmark of NumPy
+     matrix multiplication. Each iteration involves multiplication of normal
+     numbers that would produce subnormal numbers and multiplication of
+     subnormal numbers by normal numbers, which also would produce subnormal
+     numbers.
+
+     The test is designed for clear demonstration of performance degradation (if
+     it is present); the effect for real-world data is usually less severe.
+
+     Parameters
+     ----------
+     flags : dict
+         dictionary with arguments passed to sseflags.set_flags()
+
+     repeat : int, optional
+         number of iterations in a batch
+
+     min_t : float, optional
+         batches are repeated until this amount of seconds passes
+
+     Returns
+     -------
+     time : float
+         average time per iteration in seconds
+     """
+     if None not in flags.values():
+         # to ensure that subnormal test data can be created
+         set_flags(daz=False, ftz=False)
+
+     # Python "float" (= NumPy "float64" = C "double" = IEEE 754 "binary64")
+     # numbers below 2**(float_info.min_exp - 1) are subnormal and span
+     # float_info.mant_dig - 1 binary orders of magnitude, thus:
+     #   A consists of normal elements, whose products are mostly subnormal,
+     #   B consists of subnormal elements (except a few of the largest ones)
+     x = np.arange(float_info.mant_dig)
+     A = 2.0**(float_info.min_exp / 2 - (x + x[:, None]) / 2)
+     B = 2.0**(float_info.min_exp - (x + x[:, None]) / 2)
+     C = np.ones_like(B)
+
+     set_flags(**flags)
+     t0 = time()
+     for i in count(1):
+         for _ in range(repeat):
+             A.dot(A)  # sum(normal * normal = subnormal) = subnormal
+             B.dot(C)  # sum(subnormal * 1.0 = subnormal) = subnormal
+         if time() > t0 + min_t:
+             break
+     return (time() - t0) / (i * repeat)
+
+
+ if __name__ == '__main__':
+     run()
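The test matrices in `run_flags()` are built around the boundary between normal and subnormal numbers. As a minimal sketch of where that boundary lies for IEEE 754 binary64 (the usual Python `float`), the snippet below derives it from `sys.float_info`; the helper `is_subnormal` is hypothetical and not part of the package, and the results assume that DAZ/FTZ are off in the running process.

```python
import math
import sys

info = sys.float_info
# min_exp is defined so that radix**(min_exp - 1) is the smallest normal float.
smallest_normal = math.ldexp(1.0, info.min_exp - 1)
# Subnormals extend mant_dig - 1 binary orders of magnitude below that.
smallest_subnormal = math.ldexp(1.0, info.min_exp - info.mant_dig)


def is_subnormal(x):
    """True for nonzero values below the smallest normal magnitude."""
    return x != 0.0 and abs(x) < smallest_normal


# A normal number whose square falls into the subnormal range,
# similar in spirit to the elements of matrix A in the benchmark.
a = math.ldexp(1.0, info.min_exp // 2 - 1)
print(smallest_normal == info.min, is_subnormal(a), is_subnormal(a * a))
```

Multiplying two normal numbers of this size is exactly the kind of operation that the FTZ flag affects, while operations that start from already subnormal inputs (like matrix B) are the ones affected by DAZ.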
sseflags-0.1/sseflags.egg-info/PKG-INFO ADDED
@@ -0,0 +1,187 @@
+ Metadata-Version: 2.4
+ Name: sseflags
+ Version: 0.1
+ Summary: Python package for accessing DAZ and FTZ flags
+ Author: Mikhail Ryazanov
+ License-Expression: MIT
+ Project-URL: homepage, https://github.com/MikhailRyazanov/SSEflags
+ Project-URL: documentation, https://github.com/MikhailRyazanov/SSEflags/README.md
+ Project-URL: github, https://github.com/MikhailRyazanov/SSEflags.git
+ Project-URL: issues, https://github.com/MikhailRyazanov/SSEflags/issues
+ Keywords: SSE,AVX,subnormal,denormal
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Programming Language :: Python :: 3.14
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Dynamic: license-file
+
+ # SSE flags
+
+ NumPy for x86 platforms (IA-32 and AMD64 architectures) uses SSE and/or AVX for
+ floating-point calculations. Unfortunately, on Intel CPUs, they work very
+ slowly with
+ [subnormal (denormal) numbers](https://en.wikipedia.org/wiki/Subnormal_number).
+ To avoid such performance degradation, if somewhat worse floating-point
+ accuracy in extreme cases can be tolerated, the
+ [DAZ (denormals-are-zero) and FTZ (flush-to-zero) CPU flags](https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2025-2/set-the-ftz-and-daz-flags.html)
+ were introduced to treat input and/or output subnormal numbers as zeros. This
+ module provides access to these CPU flags from Python.
+
+ To test the effect on your system, use ``sseflags.benchmark.run()`` or run
+ ```
+ python3 -m sseflags.benchmark
+ ```
+ from the command line. Example output on Intel i9-12900K (subnormal numbers are
+ very slow):
+ ```
+ Times in milliseconds:
+ default    1.979
+ ========================
+          FTZ off  FTZ on
+ ------------------------
+ DAZ off    1.993   2.037
+ DAZ on     6.669   0.037
+ ========================
+ ```
+
+ AMD CPUs do not show performance degradation on subnormal numbers in 64-bit
+ mode, and thus enabling DAZ/FTZ can only decrease the accuracy slightly.
+ Example benchmarks on AMD Ryzen 7 6800U (negligible degradation for subnormal
+ numbers; note that times are in *micro*seconds):
+ ```
+ Times in microseconds:
+ default   16.834
+ ========================
+          FTZ off  FTZ on
+ ------------------------
+ DAZ off   16.829  15.383
+ DAZ on    15.353  14.500
+ ========================
+ ```
+ Nevertheless, DAZ/FTZ might be useful in 32-bit Python (same CPU, noticeable
+ difference):
+ ```
+ Times in milliseconds:
+ default    0.229
+ ========================
+          FTZ off  FTZ on
+ ------------------------
+ DAZ off    0.225   0.131
+ DAZ on     0.224   0.131
+ ========================
+ ```
+
+ On other architectures, or if the underlying Cython extension is not built, the
+ module only reports that it has no effect.
+
+
+ ## ``sseflags`` module
+
+ ```
+ get_flags()
+     Query current states of the DAZ and FTZ flags, see set_flags() for details.
+     Can be used for restoring the default behavior:
+
+         flags = get_flags()            # remember the original flag states
+         set_flags(daz=True, ftz=True)  # enable DAZ and FTZ
+         ...                            # do some calculations
+         set_flags(**flags)             # restore the original flag states
+
+     Returns
+     -------
+     flags : dict
+         dictionary with the keys 'daz' and 'ftz', values of which represent the
+         corresponding flag state: True for set, False for cleared, None if not
+         implemented
+
+
+ set_flags(daz=None, ftz=None, verbose=False)
+     Set the DAZ (denormals-are-zero) and/or FTZ (flush-to-zero) CPU flags for
+     SSE and AVX floating-point calculations, which can be useful for Intel CPUs
+     that work very slowly with subnormal (denormal) numbers.
+
+     On unsupported architectures, or if the underlying Cython extension was not
+     built, this function only reports that it has no effect. The availability
+     can be checked by calling set_flags() without arguments.
+
+     Parameters
+     ----------
+     daz : bool or None, optional
+         True to set, False to clear the DAZ flag; None (default) to leave
+         unchanged
+
+     ftz : bool or None, optional
+         True to set, False to clear the FTZ flag; None (default) to leave
+         unchanged
+
+     verbose : bool, optional
+         pass True to print a warning if the operation is not implemented
+
+     Returns
+     -------
+     implemented : bool
+         True if this operation is implemented, False if not
+ ```
+
+ ### ``sseflags.benchmark`` submodule
+
+ ```
+ run(repeat=100, min_t=1.0, verbose=True)
+     Run benchmarks with all possible combinations of the DAZ and FTZ flags to
+     check their effect on NumPy performance (see run_flags() for details).
+
+     Parameters
+     ----------
+     repeat : int, optional
+         number of iterations in a batch
+
+     min_t : float, optional
+         minimal amount of time in seconds to benchmark each combination
+
+     verbose : bool, optional
+         pass False to suppress the progress report
+
+
+ run_flags(flags, repeat=100, min_t=1.0)
+     Set the DAZ and FTZ flags to given states and run a benchmark of NumPy
+     matrix multiplication. Each iteration involves multiplication of normal
+     numbers that would produce subnormal numbers and multiplication of
+     subnormal numbers by normal numbers, which also would produce subnormal
+     numbers.
+
+     The test is designed for clear demonstration of performance degradation (if
+     it is present); the effect for real-world data is usually less severe.
+
+     Parameters
+     ----------
+     flags : dict
+         dictionary with arguments passed to sseflags.set_flags()
+
+     repeat : int, optional
+         number of iterations in a batch
+
+     min_t : float, optional
+         batches are repeated until this amount of seconds passes
+
+     Returns
+     -------
+     time : float
+         average time per iteration in seconds
+ ```
+
+ ## Installation
+
+ Compiled wheels for Linux, macOS and Windows can be installed
+ [from PyPI](https://pypi.org/project/sseflags).
+ They use the [“Stable ABI”](https://docs.python.org/3/c-api/stable.html#stable-abi),
+ which should be compatible with all Python versions ⩾3.10. For portability, a
+ “universal wheel” is also available. It does not contain the Cython extension,
+ and thus has no effect on computations, but can be installed on unsupported
+ systems.
sseflags-0.1/sseflags.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,11 @@
+ LICENSE
+ README.md
+ pyproject.toml
+ setup.py
+ sseflags/__init__.py
+ sseflags/_lib.pyx
+ sseflags/benchmark.py
+ sseflags.egg-info/PKG-INFO
+ sseflags.egg-info/SOURCES.txt
+ sseflags.egg-info/dependency_links.txt
+ sseflags.egg-info/top_level.txt
sseflags-0.1/sseflags.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
+ sseflags