erfi-pytorch 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- erfi_pytorch-0.1.0/LICENSE +21 -0
- erfi_pytorch-0.1.0/MANIFEST.in +4 -0
- erfi_pytorch-0.1.0/PKG-INFO +128 -0
- erfi_pytorch-0.1.0/README.md +95 -0
- erfi_pytorch-0.1.0/docs/faddeeva.md +451 -0
- erfi_pytorch-0.1.0/pyproject.toml +49 -0
- erfi_pytorch-0.1.0/setup.cfg +4 -0
- erfi_pytorch-0.1.0/src/erfi_pytorch/__init__.py +31 -0
- erfi_pytorch-0.1.0/src/erfi_pytorch/_coefficients.py +110 -0
- erfi_pytorch-0.1.0/src/erfi_pytorch/_coefficients_float64.py +106 -0
- erfi_pytorch-0.1.0/src/erfi_pytorch/_dispatch.py +43 -0
- erfi_pytorch-0.1.0/src/erfi_pytorch/_torch_impl.py +115 -0
- erfi_pytorch-0.1.0/src/erfi_pytorch/_triton.py +104 -0
- erfi_pytorch-0.1.0/src/erfi_pytorch/licenses/FADDEEVA-MIT.txt +19 -0
- erfi_pytorch-0.1.0/src/erfi_pytorch.egg-info/PKG-INFO +128 -0
- erfi_pytorch-0.1.0/src/erfi_pytorch.egg-info/SOURCES.txt +23 -0
- erfi_pytorch-0.1.0/src/erfi_pytorch.egg-info/dependency_links.txt +1 -0
- erfi_pytorch-0.1.0/src/erfi_pytorch.egg-info/requires.txt +14 -0
- erfi_pytorch-0.1.0/src/erfi_pytorch.egg-info/top_level.txt +1 -0
- erfi_pytorch-0.1.0/tests/test_erfi.py +228 -0
- erfi_pytorch-0.1.0/third_party/faddeeva/Faddeeva.cc +2517 -0
- erfi_pytorch-0.1.0/third_party/faddeeva/Faddeeva.hh +62 -0
- erfi_pytorch-0.1.0/third_party/faddeeva/LICENSE +19 -0
- erfi_pytorch-0.1.0/third_party/faddeeva/README.md +15 -0
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 erfi-pytorch contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,128 @@
Metadata-Version: 2.4
Name: erfi-pytorch
Version: 0.1.0
Summary: GPU-accelerated imaginary error function for real PyTorch tensors
Author: erfi-pytorch contributors
License-Expression: MIT
Project-URL: Homepage, https://github.com/ZhichaoZhu/erfi_pytorch
Project-URL: Repository, https://github.com/ZhichaoZhu/erfi_pytorch.git
Project-URL: Issues, https://github.com/ZhichaoZhu/erfi_pytorch/issues
Project-URL: Documentation, https://github.com/ZhichaoZhu/erfi_pytorch/blob/main/docs/faddeeva.md
Keywords: pytorch,cuda,triton,special-functions,erfi
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: third_party/faddeeva/LICENSE
Requires-Dist: torch>=2.7
Provides-Extra: test
Requires-Dist: mpmath>=1.3; extra == "test"
Requires-Dist: pytest>=8; extra == "test"
Requires-Dist: scipy>=1.11; extra == "test"
Provides-Extra: benchmark
Requires-Dist: scipy>=1.11; extra == "benchmark"
Provides-Extra: triton
Requires-Dist: triton>=3.3; platform_system == "Linux" and extra == "triton"
Dynamic: license-file

# erfi-pytorch

`erfi-pytorch` provides a forward-only imaginary error function for real
PyTorch tensors:

```python
import torch
from erfi_pytorch import erfi

x = torch.linspace(-4, 4, 1_000_000, device="cuda")
y = erfi(x)
```

The package supports `torch.float32` and `torch.float64` and preserves tensor
shape, dtype, and device. Its pure-PyTorch graph is compatible with
`torch.compile(fullgraph=True, backend="eager")`. Inductor compilation depends
on a working platform compiler or Triton installation and is validated
separately on supported Linux CUDA environments.

## Backends

- **Pure PyTorch:** portable CPU and CUDA implementation.
- **Triton:** fused path for contiguous NVIDIA CUDA tensors with at least
  65,536 elements, when Triton is available.

Windows and systems without Triton automatically use the pure PyTorch path.
No CUDA toolkit or native compiler is required.

## Installation

```bash
pip install erfi-pytorch
```

For development and reference tests:

```bash
pip install -e ".[test]"
```

On Linux, install the optional Triton dependency if it is not already
provided by your PyTorch installation:

```bash
pip install -e ".[test,triton]"
```

## Numerical method

For real `x`, the implementation uses

```text
erfi(x) = exp(x^2) Im(w(x)),
```

where `w` is the Faddeeva function. `Im(w(x))` is evaluated with a Taylor
polynomial near zero and a 100-interval table of low-degree polynomial
approximations elsewhere. Near floating-point overflow, the final magnitude
is reconstructed in the log domain so representable results are not lost to
premature overflow in `exp(x^2)`.

The polynomial coefficients originate from Steven G. Johnson's
MIT-licensed Faddeeva implementation. The original license notice is retained
in
[`third_party/faddeeva`](https://github.com/ZhichaoZhu/erfi_pytorch/tree/main/third_party/faddeeva).

The detailed implementation notes are in
[`docs/faddeeva.md`](https://github.com/ZhichaoZhu/erfi_pytorch/blob/main/docs/faddeeva.md).

## License

This project is released under the MIT License. The vendored Faddeeva sources
and material derived from them retain the original Copyright (c) 2012
Massachusetts Institute of Technology attribution and MIT license notice.

## Limitations

- Inputs must be real `torch.float32` or `torch.float64` tensors.
- This release is forward-only. `requires_grad=True` raises an error.
- Triton acceleration currently targets NVIDIA CUDA.
- Windows uses the pure-PyTorch CUDA backend because upstream Triton support
  is not generally available there.

## Benchmark

```bash
python benchmarks/benchmark_erfi.py --dtype float32
```

The benchmark covers powers of two from `2^10` through `2^24` and reports
eager PyTorch, compiled PyTorch, eager dispatch, and compiled dispatch.
Before timing, it compares the operator against `scipy.special.erfi` and
reports maximum absolute error, maximum and mean relative error, and infinity
mismatches. Use `--precision-elements` to change the comparison sample count
or `--skip-precision` to run timing only.
@@ -0,0 +1,95 @@
# erfi-pytorch

`erfi-pytorch` provides a forward-only imaginary error function for real
PyTorch tensors:

```python
import torch
from erfi_pytorch import erfi

x = torch.linspace(-4, 4, 1_000_000, device="cuda")
y = erfi(x)
```

The package supports `torch.float32` and `torch.float64` and preserves tensor
shape, dtype, and device. Its pure-PyTorch graph is compatible with
`torch.compile(fullgraph=True, backend="eager")`. Inductor compilation depends
on a working platform compiler or Triton installation and is validated
separately on supported Linux CUDA environments.

## Backends

- **Pure PyTorch:** portable CPU and CUDA implementation.
- **Triton:** fused path for contiguous NVIDIA CUDA tensors with at least
  65,536 elements, when Triton is available.

Windows and systems without Triton automatically use the pure PyTorch path.
No CUDA toolkit or native compiler is required.
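
The routing rule above can be summarized as a small predicate. This is a minimal sketch, not the package's `_dispatch.py`: the function `choose_backend`, its parameters, and the constant name `TRITON_MIN_ELEMENTS` are hypothetical. Only the conditions themselves (CUDA tensor, contiguous, at least 65,536 elements, Triton available) come from the list above.

```python
# Hypothetical sketch of the documented backend rule; all names are illustrative.
TRITON_MIN_ELEMENTS = 65_536  # threshold stated in the Backends list


def choose_backend(is_cuda: bool, is_contiguous: bool, numel: int,
                   triton_available: bool) -> str:
    """Return "triton" only when every documented condition holds."""
    if (triton_available and is_cuda and is_contiguous
            and numel >= TRITON_MIN_ELEMENTS):
        return "triton"
    return "pytorch"  # portable fallback: CPU, Windows, small or strided tensors


print(choose_backend(True, True, 1_000_000, True))   # large contiguous CUDA tensor
print(choose_backend(True, True, 1_000_000, False))  # Triton not installed
```

Keeping the rule in one pure function makes the fallback behavior easy to test without a GPU.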

## Installation

```bash
pip install erfi-pytorch
```

For development and reference tests:

```bash
pip install -e ".[test]"
```

On Linux, install the optional Triton dependency if it is not already
provided by your PyTorch installation:

```bash
pip install -e ".[test,triton]"
```

## Numerical method

For real `x`, the implementation uses

```text
erfi(x) = exp(x^2) Im(w(x)),
```

where `w` is the Faddeeva function. `Im(w(x))` is evaluated with a Taylor
polynomial near zero and a 100-interval table of low-degree polynomial
approximations elsewhere. Near floating-point overflow, the final magnitude
is reconstructed in the log domain so representable results are not lost to
premature overflow in `exp(x^2)`.
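
The log-domain step can be illustrated in plain Python. This is a sketch under assumptions, not the package's kernel: `w_im_large` is a hypothetical stand-in that uses only the first two terms of the large-argument expansion `Im(w(x)) ~ (1 + 1/(2 x^2)) / (sqrt(pi) x)`, whereas the package evaluates `Im(w(x))` from its polynomial table.

```python
import math


def w_im_large(x: float) -> float:
    """Two-term large-|x| expansion of Im(w(x)); a stand-in for the table."""
    return (1.0 + 0.5 / (x * x)) / (math.sqrt(math.pi) * x)


def erfi_log_domain(x: float) -> float:
    """erfi(x) = sign(x) * exp(x^2 + log(w_im(|x|))), without forming exp(x^2)."""
    ax = abs(x)
    log_mag = ax * ax + math.log(w_im_large(ax))
    try:
        mag = math.exp(log_mag)
    except OverflowError:  # the result itself exceeds the float64 range
        mag = math.inf
    return math.copysign(mag, x)


x = 26.7                   # x*x = 712.89, above log(DBL_MAX) ~ 709.78
print(erfi_log_domain(x))  # finite: x*x + log(w_im(x)) is about 709.03
```

For this input `math.exp(x * x)` alone raises `OverflowError`, while the combined exponent stays below the float64 limit, so the representable result survives.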

The polynomial coefficients originate from Steven G. Johnson's
MIT-licensed Faddeeva implementation. The original license notice is retained
in
[`third_party/faddeeva`](https://github.com/ZhichaoZhu/erfi_pytorch/tree/main/third_party/faddeeva).

The detailed implementation notes are in
[`docs/faddeeva.md`](https://github.com/ZhichaoZhu/erfi_pytorch/blob/main/docs/faddeeva.md).

## License

This project is released under the MIT License. The vendored Faddeeva sources
and material derived from them retain the original Copyright (c) 2012
Massachusetts Institute of Technology attribution and MIT license notice.

## Limitations

- Inputs must be real `torch.float32` or `torch.float64` tensors.
- This release is forward-only. `requires_grad=True` raises an error.
- Triton acceleration currently targets NVIDIA CUDA.
- Windows uses the pure-PyTorch CUDA backend because upstream Triton support
  is not generally available there.

## Benchmark

```bash
python benchmarks/benchmark_erfi.py --dtype float32
```

The benchmark covers powers of two from `2^10` through `2^24` and reports
eager PyTorch, compiled PyTorch, eager dispatch, and compiled dispatch.
Before timing, it compares the operator against `scipy.special.erfi` and
reports maximum absolute error, maximum and mean relative error, and infinity
mismatches. Use `--precision-elements` to change the comparison sample count
or `--skip-precision` to run timing only.
@@ -0,0 +1,451 @@
# Faddeeva.cc and Faddeeva.hh

## Overview

This code implements a family of related special functions for real and
complex double-precision arguments:

- the Faddeeva function `w(z)`
- the scaled complementary error function `erfcx(z)`
- the error function `erf(z)`
- the imaginary error function `erfi(z)`
- the complementary error function `erfc(z)`
- Dawson's integral

The central idea is to implement the Faddeeva function accurately over the
complex plane, then derive most of the other functions from mathematical
identities. Special real-argument implementations and local Taylor expansions
are used where they are faster or avoid numerical cancellation.

The header places the C++ API in namespace `Faddeeva` and supplies overloads
for `double` and `std::complex<double>`. Complex functions accept an optional
relative-error target:

```cpp
std::complex<double> Faddeeva::w(std::complex<double> z,
                                 double relerr = 0);
```

A non-positive `relerr` means approximately machine precision. Internally,
the value is clamped to the range from `DBL_EPSILON` to `0.1`.
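
The clamping rule can be written in a couple of lines. This is only an illustration of the stated range; the helper name `effective_relerr` is hypothetical and does not appear in the C++ source.

```python
import sys

DBL_EPSILON = sys.float_info.epsilon  # same value as C's DBL_EPSILON


def effective_relerr(relerr: float = 0.0) -> float:
    """Clamp a requested relative error to the range [DBL_EPSILON, 0.1]."""
    return min(max(relerr, DBL_EPSILON), 0.1)


print(effective_relerr())      # default request -> machine precision
print(effective_relerr(1e-6))  # in range, used as given
print(effective_relerr(0.5))   # too loose, tightened to 0.1
```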

## Mathematical Relationships

The primary function is

```text
w(z) = exp(-z^2) erfc(-i z).
```

The other important identities are

```text
erfcx(z) = exp(z^2) erfc(z) = w(i z)
erfi(z) = -i erf(i z)
D(z) = sqrt(pi)/2 exp(-z^2) erfi(z)
```

where `D(z)` denotes Dawson's integral.

For real `x`, the Faddeeva function has the especially useful form

```text
w(x) = exp(-x^2) + i * 2/sqrt(pi) * D(x).
```

Therefore,

```text
w_im(x) = Im(w(x)) = 2/sqrt(pi) * D(x)
erfi(x) = exp(x^2) * w_im(x).
```

This last identity is the basis of the optimized real `erfi` implementation.
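
The identity can be checked numerically with two independent power series. The sketch below is a verification aid only and is not how `Faddeeva.cc` evaluates anything: `erfi_series` and `dawson_series` are hypothetical helpers built from the standard Maclaurin expansions, and they are accurate only for modest `|x|`.

```python
import math


def erfi_series(x: float, terms: int = 60) -> float:
    """erfi(x) = 2/sqrt(pi) * sum_n x^(2n+1) / (n! (2n+1))."""
    total, power, fact = 0.0, x, 1.0
    for n in range(terms):
        total += power / (fact * (2 * n + 1))
        power *= x * x
        fact *= n + 1
    return 2.0 / math.sqrt(math.pi) * total


def dawson_series(x: float, terms: int = 60) -> float:
    """D(x) = sum_n (-1)^n 2^n x^(2n+1) / (2n+1)!!."""
    total, term = 0.0, x
    for n in range(terms):
        total += term
        term *= -2.0 * x * x / (2 * n + 3)
    return total


x = 1.0
w_im = 2.0 / math.sqrt(math.pi) * dawson_series(x)  # Im(w(x))
lhs = erfi_series(x)
rhs = math.exp(x * x) * w_im
print(lhs, rhs)  # the two values agree to rounding error
```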

## Structure of the Source

### Portability layer

The opening macros allow the same source to compile as either C++ using
`std::complex<double>` or C using C99 complex numbers. They also normalize
construction of complex values, infinity, NaN, `isnan`, `isinf`, and
`copysign` across older compilers.

In C++, macros such as

```cpp
#define FADDEEVA(name) Faddeeva::name
#define C(a,b) std::complex<double>(a,b)
```

turn the shared implementation into the namespace API declared by
`Faddeeva.hh`.

### The `w(z)` numerical engine

`w(z)` first handles the coordinate axes:

- For purely imaginary `z = i y`, `w(i y) = erfcx(y)`.
- For real `z = x`, it returns
  `exp(-x*x) + i*w_im(x)`.

For general complex arguments, it chooses between two main algorithms.

1. **Large arguments:** a continued-fraction expansion is used because it is
   fast and asymptotically accurate. For extremely large values this reduces
   to one or two terms, such as
   `w(z) ~= i / (sqrt(pi) z)`.
2. **Smaller arguments:** a convergent summation based on Algorithm 916 is
   used. Precomputed exponential coefficients accelerate the normal
   machine-precision path.

The branch boundary is not just a simple `|z|` test. The code avoids the
continued fraction near parts of the real axis where the real component of
`w(z)` would have poor relative accuracy.

For arguments in the lower half-plane, it uses the symmetry

```text
w(z) = 2 exp(-z^2) - w(-z).
```

Several expressions are algebraically rearranged to avoid intermediate
overflow. For example, the real part of `-z^2` is computed as

```cpp
(y - x) * (x + y)
```

instead of directly evaluating `y*y - x*x`.
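
A quick experiment shows why the factored form matters. The values below are chosen only to force the overflow case and are not taken from the source:

```python
x = y = 1e200                 # z = x + i*y with huge but finite components

naive = y * y - x * x         # each square overflows: inf - inf -> nan
factored = (y - x) * (x + y)  # 0.0 * 2e200 -> 0.0, the exact answer

print(naive, factored)        # nan 0.0
```

The factored product only overflows when the true result is itself out of range, whereas the naive form fails as soon as either square does.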

### Real helper functions

`erfcx(double)` and `w_im(double)` use similar three-region strategies:

- continued fractions for large magnitude arguments;
- piecewise Chebyshev polynomial approximations for the middle range;
- Taylor expansions near zero.

The Chebyshev lookup tables dominate the size of `Faddeeva.cc`. They are
precomputed polynomial coefficients, not separate conceptual algorithms.

## Imaginary Error Function

The imaginary error function is defined by

```text
erfi(z) = -i erf(i z)
        = 2/sqrt(pi) integral_0^z exp(t^2) dt.
```

Unlike `erf(x)`, which approaches `+1` or `-1` on the real axis, `erfi(x)`
grows approximately like

```text
erfi(x) ~ exp(x^2) / (sqrt(pi) x).
```

That rapid growth is why overflow handling matters.
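
To get a feel for how early the overflow happens, the growth estimate can be solved for the input at which `erfi(x)` reaches the largest finite value of a floating-point type. This is a rough estimate based only on the leading asymptotic term; the function name `erfi_overflow_threshold` is hypothetical.

```python
import math


def erfi_overflow_threshold(max_value: float, iterations: int = 50) -> float:
    """Solve exp(x^2) / (sqrt(pi) x) = max_value by fixed-point iteration."""
    log_max = math.log(max_value)
    x = math.sqrt(log_max)  # starting guess that ignores the 1/(sqrt(pi) x) factor
    for _ in range(iterations):
        x = math.sqrt(log_max + math.log(math.sqrt(math.pi) * x))
    return x


FLOAT64_MAX = 1.7976931348623157e308  # largest finite IEEE binary64 value
FLOAT32_MAX = 3.4028234663852886e38   # largest finite IEEE binary32 value

print(erfi_overflow_threshold(FLOAT64_MAX))  # roughly 26.7
print(erfi_overflow_threshold(FLOAT32_MAX))  # roughly 9.57
```

So the usable input range is only a few tens wide in double precision and under ten in single precision.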

### Complex implementation

The complex implementation is:

```cpp
cmplx FADDEEVA(erfi)(cmplx z, double relerr)
{
  cmplx e = FADDEEVA(erf)(C(-cimag(z),creal(z)), relerr);
  return C(cimag(e), -creal(e));
}
```

If `z = x + i y`, then

```text
i z = -y + i x.
```

That explains the input rotation:

```cpp
C(-imag(z), real(z))
```

If the result is `e = a + i b`, multiplying it by `-i` gives

```text
-i(a + i b) = b - i a.
```

That explains the output rotation:

```cpp
C(imag(e), -real(e))
```

Thus the complex `erfi` routine contains no independent approximation.
It delegates to the complex `erf` implementation and performs two exact
coordinate rotations. This also means that `relerr` is passed directly to
the underlying Faddeeva calculation.
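
The two rotations translate directly into Python's `complex` type. In this sketch `erf_series` is a hypothetical stand-in for the stabilized complex `erf`: it is a plain Maclaurin series that is adequate only for modest `|z|`, and it ignores `relerr`.

```python
import math


def erf_series(z: complex, terms: int = 80) -> complex:
    """Maclaurin series for erf(z); a stand-in valid only for modest |z|."""
    total, power, fact = 0j, z, 1.0
    for n in range(terms):
        total += (-1) ** n * power / (fact * (2 * n + 1))
        power *= z * z
        fact *= n + 1
    return 2.0 / math.sqrt(math.pi) * total


def erfi_via_erf(z: complex) -> complex:
    """erfi(z) = -i * erf(i*z), written as two component swaps."""
    rotated = complex(-z.imag, z.real)  # i*z
    e = erf_series(rotated)
    return complex(e.imag, -e.real)     # -i*e


print(erfi_via_erf(1 + 0j))  # real part close to erfi(1), about 1.65043
```

Because both rotations are pure component swaps with a sign change, they introduce no rounding error of their own.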

### Why complex `erf` is the hard part

The delegated `erf(z)` routine derives its normal path from `w(z)`, but it
does not blindly evaluate one formula everywhere. It has several stability
branches:

- real and imaginary axes are handled directly;
- positive and negative real parts use different symmetry formulas;
- a Taylor series is used near `z = 0`;
- a second expansion is used near the imaginary axis when the direct
  expression would subtract nearly equal values.

The imaginary-axis identity used there is

```text
erf(i y) = i erfi(y)
         = i exp(y^2) w_im(y).
```

This is the `taylor_erfi` branch in `erf(z)`. Despite its label, it is a local
expansion of `erf(x+i y)` around the imaginary axis, using the real `erfi(y)`
value as its base term.

### Real implementation

The optimized real overload is:

```cpp
double FADDEEVA_RE(erfi)(double x)
{
  return x*x > 720 ? (x > 0 ? Inf : -Inf)
    : exp(x*x) * FADDEEVA(w_im)(x);
}
```

To derive this formula, begin with the definition of the Faddeeva function
for real `x`:

```text
w(x) = exp(-x^2) erfc(-i x).
```

Using

```text
erfc(-i x) = 1 - erf(-i x)
           = 1 + i erfi(x),
```

we obtain

```text
w(x) = exp(-x^2) [1 + i erfi(x)].
```

Taking its imaginary part gives

```text
Im(w(x)) = exp(-x^2) erfi(x),
```

and hence

```text
erfi(x) = exp(x^2) Im(w(x))
        = exp(x^2) w_im(x).
```

It is also useful to express this through Dawson's integral:

```text
w_im(x) = 2 D(x) / sqrt(pi)
erfi(x) = 2 exp(x^2) D(x) / sqrt(pi).
```

This specialized route is faster than constructing a complex value, calling
complex `erf`, and rotating the answer.

The `x*x > 720` check is a coarse explicit overflow guard. The largest finite
`exp` result has an argument near 709.78, and `erfi` itself also exceeds the
double range once `|x|` is large enough. For `x*x` between roughly 709.78 and
720, `exp(x*x)` therefore already overflows to `Inf`, and its product with the
finite, nonzero `w_im(x)` is a signed infinity as well. Above the guard, the
code returns the correctly signed infinity directly. This is especially
important for infinite or enormous inputs, where evaluating the identity
mechanically could produce the indeterminate form `Inf * 0` because `w_im(x)`
tends to zero as `|x|` grows.
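
The indeterminate form is easy to reproduce. The sketch below is illustrative only: `w_im_leading` is a hypothetical stand-in that is meaningful just for enormous `|x|`, so the final branch of `erfi_guarded` is not a usable `erfi` for ordinary inputs.

```python
import math


def w_im_leading(x: float) -> float:
    """Leading asymptotic term of Im(w(x)); only sensible for enormous |x|."""
    return 1.0 / (math.sqrt(math.pi) * x)


def erfi_unguarded(x: float) -> float:
    return math.exp(x * x) * w_im_leading(x)


def erfi_guarded(x: float) -> float:
    if x * x > 720.0:  # same threshold as the C++ overload
        return math.inf if x > 0 else -math.inf
    # Note: unlike C's exp, math.exp raises OverflowError for finite
    # arguments above about 709.78 instead of returning inf.
    return math.exp(x * x) * w_im_leading(x)


print(erfi_unguarded(math.inf))  # inf * 0.0 -> nan
print(erfi_guarded(math.inf))    # inf
print(erfi_guarded(-math.inf))   # -inf
```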

The result is odd because `w_im` is odd and `exp(x^2)` is even:

```text
erfi(-x) = -erfi(x).
```

### How `w_im(x)` is computed

`w_im(x)` is a scaled Dawson integral:

```text
w_im(x) = 2 D(x) / sqrt(pi).
```

The actual numerical work for real `erfi` therefore occurs inside `w_im`.
It divides the real axis into three numerical regions.

1. **Small `|x|`**, approximately `|x| <= 0.0309`:

   ```text
   w_im(x) = 2/sqrt(pi)
             * (x - 2x^3/3 + 4x^5/15 - 8x^7/105 + 16x^9/945).
   ```

   The source evaluates this in Horner form:

   ```cpp
   double x2 = x*x;
   return x * (1.1283791670955125739
               - x2 * (0.75225277806367504925
                       - x2 * (0.30090111122547001970
                               - x2 * (0.085971746064420005629
                                       - x2 * 0.016931216931216931217))));
   ```

   Horner form uses fewer multiplications and generally accumulates less
   rounding error. The explicit leading factor `x` naturally preserves odd
   symmetry. Near zero,

   ```text
   w_im(x) ~= 2x/sqrt(pi)
   erfi(x) ~= 2x/sqrt(pi),
   ```

   because `exp(x^2) ~= 1`.

2. **Moderate `|x|`**, up to 45:

   The code maps positive `x` using

   ```text
   y = 1 / (1 + x)
   y100 = 100 y.
   ```

   The call is:

   ```cpp
   return w_im_y100(100/(1+x), x);
   ```

   The integer part of `y100` selects one of 100 intervals in a `switch`.
   Each case maps its local interval approximately to `[-1,1]`, for example:

   ```cpp
   double t = 2*y100 - 1;
   ```

   It then evaluates a low-degree, Chebyshev-derived polynomial in `t`, again
   in Horner form. The large coefficient table in `Faddeeva.cc` is therefore
   a lookup table of fitted polynomials. At runtime the process is simply:

   ```text
   transform x -> select interval -> evaluate polynomial.
   ```

   For a moderate negative input, the code evaluates the positive argument
   and negates it:

   ```cpp
   return -w_im_y100(100/(1-x), -x);
   ```

   This explicitly applies `w_im(-x) = -w_im(x)`.

3. **Large `|x|`**, `45 < |x| <= 5e7`:

   A five-term continued-fraction approximation is simplified into a rational
   expression:

   ```cpp
   return ispi * ((x*x) * (x*x-4.5) + 2)
        / (x * ((x*x) * (x*x-5) + 3.75));
   ```

   Here `ispi = 1/sqrt(pi)`. This is an algebraically simplified form of

   ```text
                  1/sqrt(pi)
   ------------------------------------------------
   x - (1/2)/(x - 1/(x - (3/2)/(x - 2/x))).
   ```

   For `|x| > 5e7`, only the leading asymptotic term is needed:

   ```text
   w_im(x) ~= 1 / (sqrt(pi) x).
   ```

Combining this asymptotic form with the real `erfi` formula gives the expected
large-argument behavior:

```text
erfi(x) ~= exp(x^2) / (sqrt(pi) x).
```
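
The closed-form branches described above can be transcribed and cross-checked in Python. This is a sketch, not a port of `Faddeeva.cc`: the function names are hypothetical, the Taylor coefficients are formed from the series written in the text rather than copied from the decimal constants in the source, and the moderate-range table is omitted.

```python
import math

ISPI = 1.0 / math.sqrt(math.pi)


def w_im_taylor(x: float) -> float:
    """Degree-9 Taylor polynomial in Horner form, for |x| up to about 0.0309."""
    x2 = x * x
    return 2.0 * ISPI * x * (
        1.0 - x2 * (2.0 / 3.0 - x2 * (4.0 / 15.0
                    - x2 * (8.0 / 105.0 - x2 * (16.0 / 945.0)))))


def w_im_nested(x: float) -> float:
    """Five-term continued fraction, written as nested divisions."""
    return ISPI / (x - 0.5 / (x - 1.0 / (x - 1.5 / (x - 2.0 / x))))


def w_im_rational(x: float) -> float:
    """The same continued fraction collapsed into one rational expression."""
    x2 = x * x
    return ISPI * (x2 * (x2 - 4.5) + 2.0) / (x * (x2 * (x2 - 5.0) + 3.75))


def w_im_leading(x: float) -> float:
    """Leading asymptotic term, used for |x| > 5e7."""
    return ISPI / x


print(w_im_nested(50.0), w_im_rational(50.0))  # agree to rounding error
print(w_im_rational(6e7), w_im_leading(6e7))   # indistinguishable in float64
```

The second comparison shows why the source can switch to the single-term form beyond `5e7`: the correction terms fall below double-precision resolution there.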

The complete real execution path is therefore:

```text
erfi(x)
  |
  +-- x^2 > 720? --> return signed infinity
  |
  +-- compute w_im(x)
  |     +-- tiny x: Taylor polynomial
  |     +-- moderate x: piecewise Chebyshev polynomials
  |     +-- large x: continued-fraction rational approximation
  |     +-- enormous x: 1/(sqrt(pi) x)
  |
  +-- return exp(x^2) * w_im(x)
```
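
The "transform, then select interval" step for the moderate range can be sketched without the coefficient table. The helper `select_interval` is hypothetical, and the local coordinate `t = 2*y100 - (2k + 1)` for interval `k` is an assumed generalization of the single `2*y100 - 1` example quoted earlier.

```python
def select_interval(x: float) -> tuple[int, float]:
    """Map 0 <= x <= 45 to a table index and a local coordinate t in [-1, 1)."""
    y100 = 100.0 / (1.0 + x)
    index = int(y100)                 # integer part picks the switch case
    t = 2.0 * y100 - (2 * index + 1)  # assumed general form of `2*y100 - 1`
    return index, t


for x in (0.0, 0.0309, 1.0, 45.0):
    print(x, select_interval(x))
```

Because `y100` shrinks as `x` grows, small inputs land in the highest-numbered intervals and `x = 45` lands near the bottom of the table.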

## Other Exported Functions

- `erfcx(z)` is a direct rotation into `w`: `erfcx(z) = w(i z)`.
- `erf(z)` uses `w` plus symmetry and Taylor expansions to avoid
  cancellation.
- `erfc(z)` uses direct formulas rather than always computing `1-erf(z)`,
  because that subtraction would lose precision when `erfc(z)` is tiny.
- `Dawson(z)` uses `w`, axis-specific formulas, continued fractions, and
  Taylor expansions. For real input it is simply
  `sqrt(pi)/2 * w_im(x)`.
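
The precision loss in `1-erf(z)` is visible even with the real-argument functions in Python's standard library, used here only as an analogy for the complex routines:

```python
import math

x = 10.0
via_subtraction = 1.0 - math.erf(x)  # erf(10) rounds to exactly 1.0
direct = math.erfc(x)                # tiny but nonzero

print(via_subtraction)  # 0.0
print(direct)           # about 2.09e-45
```

The subtraction loses every significant digit, while the direct evaluation keeps full relative accuracy.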

## Numerical Design Lessons

The implementation is less about finding one universal formula and more
about choosing an equivalent formula that is well-conditioned in each
region:

- use asymptotic continued fractions when arguments are large;
- use convergent sums or fitted polynomials in the middle range;
- use Taylor expansions near zeros and cancellation points;
- exploit oddness, conjugation, rotations, and reflection identities;
- special-case axes, infinities, NaNs, signed zero, overflow, and underflow;
- avoid forming huge and tiny intermediate quantities whose final product
  would be representable.

For `erfi` specifically, the important implementation chain is:

```text
complex erfi(z)
  -> rotate z to i*z
  -> stabilized complex erf(i*z)
  -> rotate result by -i

real erfi(x)
  -> specialized w_im(x)
  -> multiply by exp(x^2)
  -> return signed infinity before overflow becomes indeterminate
```

The complex path maximizes reuse of the robust `erf` implementation, while
the real path maximizes speed and preserves accuracy through the
Faddeeva/Dawson relationship.