finufft 2.3.0rc1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (61)
  1. finufft-2.3.0rc1/CHANGELOG +504 -0
  2. finufft-2.3.0rc1/CMakeLists.txt +379 -0
  3. finufft-2.3.0rc1/LICENSE +44 -0
  4. finufft-2.3.0rc1/PKG-INFO +81 -0
  5. finufft-2.3.0rc1/README.md +65 -0
  6. finufft-2.3.0rc1/cmake/CheckAVX.cpp +48 -0
  7. finufft-2.3.0rc1/cmake/setupCPM.cmake +21 -0
  8. finufft-2.3.0rc1/cmake/setupDUCC.cmake +46 -0
  9. finufft-2.3.0rc1/cmake/setupFFTW.cmake +86 -0
  10. finufft-2.3.0rc1/cmake/setupSphinx.cmake +21 -0
  11. finufft-2.3.0rc1/cmake/setupXSIMD.cmake +28 -0
  12. finufft-2.3.0rc1/cmake/utils.cmake +90 -0
  13. finufft-2.3.0rc1/contrib/legendre_rule_fast.cpp +490 -0
  14. finufft-2.3.0rc1/contrib/legendre_rule_fast.h +10 -0
  15. finufft-2.3.0rc1/contrib/legendre_rule_fast.license +8 -0
  16. finufft-2.3.0rc1/include/cufinufft/common.h +70 -0
  17. finufft-2.3.0rc1/include/cufinufft/contrib/helper_cuda.h +147 -0
  18. finufft-2.3.0rc1/include/cufinufft/contrib/ker_horner_allw_loop.inc +205 -0
  19. finufft-2.3.0rc1/include/cufinufft/contrib/ker_lowupsampfac_horner_allw_loop.inc +171 -0
  20. finufft-2.3.0rc1/include/cufinufft/contrib/legendre_rule_fast.h +6 -0
  21. finufft-2.3.0rc1/include/cufinufft/cudeconvolve.h +40 -0
  22. finufft-2.3.0rc1/include/cufinufft/defs.h +36 -0
  23. finufft-2.3.0rc1/include/cufinufft/impl.h +469 -0
  24. finufft-2.3.0rc1/include/cufinufft/memtransfer.h +19 -0
  25. finufft-2.3.0rc1/include/cufinufft/precision_independent.h +70 -0
  26. finufft-2.3.0rc1/include/cufinufft/spreadinterp.h +180 -0
  27. finufft-2.3.0rc1/include/cufinufft/types.h +91 -0
  28. finufft-2.3.0rc1/include/cufinufft/utils.h +152 -0
  29. finufft-2.3.0rc1/include/cufinufft.h +34 -0
  30. finufft-2.3.0rc1/include/cufinufft_opts.h +34 -0
  31. finufft-2.3.0rc1/include/finufft/defs.h +260 -0
  32. finufft-2.3.0rc1/include/finufft/dirft.h +22 -0
  33. finufft-2.3.0rc1/include/finufft/fft.h +18 -0
  34. finufft-2.3.0rc1/include/finufft/fftw_defs.h +48 -0
  35. finufft-2.3.0rc1/include/finufft/spreadinterp.h +63 -0
  36. finufft-2.3.0rc1/include/finufft/test_defs.h +32 -0
  37. finufft-2.3.0rc1/include/finufft/utils.h +25 -0
  38. finufft-2.3.0rc1/include/finufft/utils_precindep.h +44 -0
  39. finufft-2.3.0rc1/include/finufft.fh +11 -0
  40. finufft-2.3.0rc1/include/finufft.h +52 -0
  41. finufft-2.3.0rc1/include/finufft_eitherprec.h +186 -0
  42. finufft-2.3.0rc1/include/finufft_errors.h +26 -0
  43. finufft-2.3.0rc1/include/finufft_mod.f90 +17 -0
  44. finufft-2.3.0rc1/include/finufft_opts.h +57 -0
  45. finufft-2.3.0rc1/include/finufft_spread_opts.h +34 -0
  46. finufft-2.3.0rc1/pyproject.toml +73 -0
  47. finufft-2.3.0rc1/python/CMakeLists.txt +11 -0
  48. finufft-2.3.0rc1/python/finufft/README.md +13 -0
  49. finufft-2.3.0rc1/python/finufft/finufft/__init__.py +20 -0
  50. finufft-2.3.0rc1/python/finufft/finufft/_finufft.py +127 -0
  51. finufft-2.3.0rc1/python/finufft/finufft/_interfaces.py +856 -0
  52. finufft-2.3.0rc1/python/finufft/pyproject.toml +73 -0
  53. finufft-2.3.0rc1/python/finufft/requirements.txt +1 -0
  54. finufft-2.3.0rc1/src/fft.cpp +115 -0
  55. finufft-2.3.0rc1/src/finufft.cpp +1251 -0
  56. finufft-2.3.0rc1/src/ker_horner_allw_loop_constexpr.h +711 -0
  57. finufft-2.3.0rc1/src/ker_lowupsampfac_horner_allw_loop_constexpr.h +576 -0
  58. finufft-2.3.0rc1/src/simpleinterfaces.cpp +291 -0
  59. finufft-2.3.0rc1/src/spreadinterp.cpp +2292 -0
  60. finufft-2.3.0rc1/src/utils.cpp +86 -0
  61. finufft-2.3.0rc1/src/utils_precindep.cpp +90 -0
@@ -0,0 +1,504 @@
1
+ List of features / changes made / release notes, in reverse chronological order.
2
+ If not stated, FINUFFT is assumed (cuFINUFFT <=1.3 is listed separately).
3
+
4
+ V 2.3.0-rc1 (8/6/24)
5
+
6
+ * Switched C++ standards from C++14 to C++17, allowing various templating
7
+ improvements (Barbone).
8
+ * Python build modernized to pyproject.toml (for both CPU and GPU).
9
+ PR 507 (Anden, Lu, Barbone). Compiles from source for the local build.
10
+ * Switchable FFT: either FFTW or DUCC0 (latter needs no plan stage; also it is
11
+ used to exploit sparsity pattern to achieve FFT speedups 1-3x in 2D and 3D).
12
+ PR463, Martin Reinecke. Both CMake and makefile include this DUCC0 option
13
+ (makefile PR511 by Barnett; CMake by Barbone).
14
+ * ES kernel rescaled to max value 1, reduced poly degrees for upsampfac=1.25,
15
+ cleaner Horner coefficient generation PR499 (fixes fp32 overflow issue #454).
16
+ * Major manual acceleration of spread/interp kernels via XSIMD header-only lib,
17
+ kernel evaluation, templating by ns with AVX-width-dependent decisions.
18
+ Up to 80% faster, dep on compiler. (Marco Barbone with help from Libin Lu).
19
+ A large chunk of work: PRs 459, 471, 502.
20
+ NOTE: introduces new dependency (XSIMD), added to CMake and makefile.
21
+ * Exploiting even/odd symmetry for 10% faster xsimd-accel kernel poly eval
22
+ (Libin Lu based on idea of Martin Reinecke; PR477,492,493).
23
+ * new test/finufft3dkernel_test checks kerevalmeth=0 and 1 agree to tolerance
24
+ PR 473 (M Barbone).
25
+ * new perftest/compare_spreads.jl compares two spreadinterp libs (A Barnett).
26
+ * new benchmarker perftest/spreadtestndall sweeps all kernel widths (M Barbone).
27
+ * cufinufft now supports modeord (type 1,2 only): 0 CMCL-style increasing mode
28
+ order, 1 FFT-style mode order. PR447,446 (Libin Lu, Joakim Anden).
29
+ * New doc page: migration guide from NFFT3 (2d1 case only), Barnett.
30
+ * New foldrescale, removes [-3pi,3pi) restriction on NU points, and slight
31
+ speedup at large tols. Deprecates both opts.chkbnds and error code
32
+ FINUFFT_ERR_SPREAD_PTS_OUT_RANGE. Also inlined kernel eval code (increases
33
+ compile of spreadinterp.cpp to 10s). PR440 Marco Barbone + Martin Reinecke.
34
+ * CPU plan stage allows any # threads, warns if > omp_get_max_threads(); or
35
+ if single-threaded fixes nthr=1 and warns opts.nthreads>1 attempt.
36
+ Sort now respects spread_opts.sort_threads not nthreads. Supersedes PR 431.
37
+ * new docs troubleshooting accuracy limitations due to condition number of the
38
+ NUFFT problem (Barnett).
39
+ * new sanity check on nj and nk (<0 or too big); new err code, tester, doc.
40
+ * MAX_NF increased from 1e11 to 1e12, since machines grow.
41
+ * improved GPU python docs: migration guide; usage from cupy, numba, torch,
42
+ pycuda. Docs for all GPU options. PyPI pkg still at 2.2.0beta.
43
+ * Added a clang-format pre-commit hook to ensure consistent code style.
44
+ Created a .clang-format file to define a style similar to the existing style.
45
+ Applied clang-format to all cmake, C, C++, and CUDA code. Ignored the blame
46
+ using .git-blame-ignore-revs. contributing.md for devs. PR450,455, Barbone.
47
+ * cuFINUFFT interface update: number of nonuniform points M is now a 64-bit int
48
+ as opposed to 32-bit. While this does modify the ABI, most code will just
49
+ need to recompile against the new library as compilers will silently upcast
50
+ any 32-bit integers to 64-bit when calling cufinufft(f)_setpts. Note that
51
+ internally, 32-bit integers are still used, so calling cufinufft with more
52
+ than 2e9 points will fail. This restriction may be lifted in the future.
53
+ * CMake build system revamped completely, using more modern practices (Barbone).
54
+ It now auto-selects compiler flags based on those supported on all OSes, and
55
+ has support for Windows (llvm, msvc), Linux (llvm, gcc) and MacOS (llvm, gcc).
56
+ * CMake added nvcc and msvc optimization flags.
57
+ * sphinx local doc build also using CMake. (Barbone)
58
+ * updated install docs, including for DUCC0 FFT and new python build.
59
+ * updated install docs (Barnett)
60
+ * Major acceleration effort for the GPU library cufinufft (M Barbone, PR488):
61
+ - binsize is now a function of the shared memory available where possible.
62
+ - GM 1D sorts using thrust::sort instead of bin-sort.
63
+ - uses the new normalized Horner coefficients and added support for
64
+ upsampfac=1.25 on GPU, for first time.
65
+ - new compile flags for extra-vectorization, flushing single
66
+ precision denormals to 0 and using fma where possible.
67
+ - using intrinsics (eg FMA) in foldrescale and other places to increase
68
+ performance
69
+ - using SM90 float2 vector atomicAdd where supported
70
+ - make default binsize = 0
71
+ * override single-output relative error by l2 relative error in exit codes of
72
+ test/finufft?d_test.cpp to reduce CI fails due to random numbers on some
73
+ platforms in single-prec (with DUCC, etc). (Barnett PR516)
74
+
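The two modeord conventions named in the cufinufft entry above can be sketched in pure Python. This is an illustration only, not library code: the helper names are hypothetical, and the mode range -(N//2) .. (N-1)//2 is an assumption about what "CMCL-style increasing mode order" means.

```python
def increasing_modes(n):
    """Mode indices in increasing order (assumed range -(n//2) .. (n-1)//2)."""
    return list(range(-(n // 2), (n - 1) // 2 + 1))


def to_fft_order(coeffs):
    """Reorder from increasing mode order to FFT-style order
    (zero mode first, then positive modes, then negative modes)."""
    shift = len(coeffs) // 2  # position of the k = 0 mode in increasing order
    return coeffs[shift:] + coeffs[:shift]


def to_increasing_order(coeffs):
    """Inverse of to_fft_order."""
    shift = (len(coeffs) + 1) // 2  # number of nonnegative modes
    return coeffs[shift:] + coeffs[:shift]


modes = increasing_modes(5)       # [-2, -1, 0, 1, 2]
fft_style = to_fft_order(modes)   # [0, 1, 2, -2, -1]
```

Applying the same reordering to an array of Fourier coefficients converts between the two layouts without changing any values.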
75
+ V 2.2.0 (12/12/23)
76
+
77
+ * MERGE OF CUFINUFFT (GPU CODE) INTO FINUFFT SOURCE TREE:
78
+ - combined cmake build system via FINUFFT_USE_CUDA flag
79
+ - python wrapper for GPU code included
80
+ - GPU documentation (improving on cufinufft) added {install,c,python}_gpu.rst
81
+ - CI includes GPU full test via C++, and python four styles, via Jenkins.
82
+ - common spread_opts.h header; other code not yet made common.
83
+ - GPU interface has been changed (ie broken) to more closely match finufft
84
+ - cufinufft repo is left in legacy state at v1.3.
85
+ - Add support for cuda streams, allowing for concurrent memory transfer and
86
+ execution streams (PR #330)
87
+ [coding lead on this: Robert Blackwell, with help from Joakim Anden]
88
+ * CMake build structure (thanks: Wenda Zhou, Marco Barbone, Libin Lu)
89
+ - Note: the plan is to continue to support GNU makefile and make.inc.* but
90
+ to transition to CMake as the main build system.
91
+ - CI workflow using CMake on 3 OSes, 2 compilers each, PR #382 (Libin Lu)
92
+ * Docs: new tutorial content on iterative inverse NUFFTs; troubleshooting.
93
+ * GitHub-facing badges
94
+ * include/finufft/finufft_eitherprec.h moved up directory to be public (bea316c)
95
+ * interp (for type 2) accel by up to 60% in high-acc 2D/3D, by FMA/SIMD-friendly
96
+ rearranging of loops, by Martin Reinecke, PR #292.
97
+ * remove inv array in binsort; speeds up multithreaded case by up to 50%
98
+ but no effect on single-threaded. Martin Reinecke, PR #291.
99
+ * Fix memleak in repeated setpts (Issue #269); thanks Aaron Shih & Libin Lu.
100
+ * Fortran90 example via a new FINUFFT fortran module, thanks Reinhard Neder.
101
+ * made the C++ plan object (finufft_plan_s) private; only opaque pointer
102
+ remains public, as should be (PR #233). Allows plan to have C++ constructs.
103
+ * fixed single-thread (OMP=OFF) build which failed due to fftw_defs.h/defs.h
104
+ * finally thread-safety for all fftw cases, kill FFTW_PLAN_SAFE (PR 354)
105
+ * Python interface: - better type checking (PR #237).
106
+ - fixing edge cases (singleton dims, issue #359).
107
+ - supports batch dimension of length 1 (issue #367).
108
+ * fix issue where repeated calls of finufft_makeplan with different numbers of
109
+ requested threads would always use the first requested number of threads
110
+
111
+ CUFINUFFT v 1.3 (06/10/23) (Final legacy release outside of FINUFFT repo)
112
+
113
+ * Move second half of onedim_fseries_kernel() to GPU (with a simple heuristic
114
+ based on nf1 to switch between the CPU and the GPU version).
115
+ * Melody fixed bug in MAX_NF being 0 due to typecasting 1e11 to int (thanks
116
+ Elliot Slaughter for catching that).
117
+ * Melody fixed kernel eval so done w*d not w^d times, speeds up 2d a little, 3d
118
+ quite a lot! (PR#130)
119
+ * Melody added 1D support for both types 1 (GM-sort and SM methods) and 2 (GM-sort),
120
+ in C++/CUDA and their test executables (but not Python interface).
121
+ * Various fixes to package config.
122
+ * Miscellaneous bug fixes.
123
+
124
+ V 2.1.0 (6/10/22)
125
+
126
+ * BREAKING INTERFACE CHANGE: nufft_opts is now called finufft_opts.
127
+ This is needed for consistency and fixes a historical problem.
128
+ We have compile-time warning, and backwards-compatibility for now.
129
+ * Professionalized the public-facing interface:
130
+ - safe lib (.so, .a) symbols via hierarchical namespacing of private funcs
131
+ that do not already begin with finufft{f}, in finufft:: namespace.
132
+ This fixes, eg, clash with linking against cufinufft (their Issue #138).
133
+ - public headers (finufft.h) have all macro names safe (ie FINUFFT suffix).
134
+ Headers both public and private rationalized/simplified.
135
+ - private headers are in include/finufft/, so not exposed by -Iinclude
136
+ - spread_opts renamed finufft_spread_opts, since publicly exposed and name
137
+ must respect library naming.
138
+ * change nj and nk in plan to BIGINT (int64_t), new big2d2f perftest, fixing
139
+ Issue #215.
140
+ * PDF manual moved from local to readthedocs.io hosting, Issue #221.
141
+ * Py doc for dtype fixed, Issue #216.
142
+ * spreadinterp evaluate_kernel_vector uses single arith when FLT=single.
143
+ * spread_opts.h fix duplication for FLT=single/double by making FLT->double.
144
+ * examples/simulplans1d1 demos ability to wield independent plans.
145
+ * sped up float32 1d type 3 by 20% by using float32 cos()... thanks Wenda Zhou.
146
+
147
+ V 2.0.4 (1/13/22)
148
+
149
+ * makefile now appends (not replaces by) environment {C,F,CXX}FLAGS (PR #199).
150
+ * fixed MATLAB Contents.m and guru help strings.
151
+ * fortran examples: avoided clash with keywords "type" and "null", and correct
152
+ creation of null ptr for default opts (issues #195-196, Jiri Kulda).
153
+ * various fixes to python wheels CI.
154
+ * various docs improvements.
155
+ * fixed modeord=1 failure for type 3 even though should never be used anyway
156
+ (issue #194).
157
+ * fixed spreadcheck NaN failure to detect bug introduced in 2.0.3 (9566511).
158
+ * Dan Fortunato found and fixed MATLAB setpts temporary array loss, issue #185.
159
+
160
+ V 2.0.3 (4/22/21)
161
+
162
+ * finufft (plan) now thread-safe via OMP lock (if nthr=1 and -DFFTW_PLAN_SAFE)
163
+ + new example/threadsafe*.cpp demos. Needs FFTW>=3.3.6 (Issues #72 #180 #183)
164
+ * fixed bug in checkbounds that falsely reported NU pt as invalid if exactly 1
165
+ ULP below +pi, for certain N values only, egad! (Issue #181)
166
+ * GH workflows continuous integration (CI) in four setups (linux, osx*2, mingw)
167
+ * fixed memory leak in type 3.
168
+ * corrected C guru execute documentation.
169
+
170
+ CUFINUFFT v 1.2 (02/17/21)
171
+
172
+ * Warning: Following are Python interface changes -- not backwards compatible
173
+ with v 1.1 (See examples/example2d1,2many.py for updated usage)
174
+
175
+ - Made opts a kwarg dict instead of an object:
176
+ def __init__(self, ... , opts=None, dtype=np.float32)
177
+ => def __init__(self, ... , dtype=np.float32, **kwargs)
178
+ - Renamed arguments in plan creation `__init__`:
179
+ ntransforms => n_trans, tol => eps
180
+ - Changed order of arguments in plan creation `__init__`:
181
+ def __init__(self, ... ,isign, eps, ntransforms, opts, dtype)
182
+ => def __init__(self, ... ,ntransforms, eps, isign, opts, dtype)
183
+ - Removed M in `set_pts` arguments:
184
+ def set_pts(self, M, kx, ky=None, kz=None)
185
+ => def set_pts(self, kx, ky=None, kz=None)
186
+
187
+ * Python: added multi-gpu support (in beta)
188
+ * Python: added more unit tests (wrong input, kwarg args, multi-gpu)
189
+ * Fixed various memory leaks
190
+ * Added index bound check in 2D spread kernels (Spread_2d_Subprob(_Horner))
191
+ * Added spread/interp tests to `make check`
192
+ * Fixed user request tolerance (eps) to kernel width (w) calculation
193
+ * Default kernel evaluation method set to 0, ie exp(sqrt()), since faster
194
+ * Removed outdated benchmark codes, cleaner spread/interp tests
195
+
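The change from an opts object to keyword arguments described in the CUFINUFFT v 1.2 notes above can be sketched as follows. ToyPlan is a hypothetical stand-in, not the cufinufft class, and the option names and defaults in DEFAULT_OPTS are made up for illustration.

```python
class ToyPlan:
    """Hypothetical stand-in for a plan class; not the cufinufft API."""

    # Made-up option names and defaults, for illustration only.
    DEFAULT_OPTS = {"upsampfac": 2.0, "debug": 0}

    def __init__(self, nufft_type, modes, n_trans=1, eps=1e-6, isign=None,
                 dtype="float32", **kwargs):
        # Reject misspelled options instead of silently ignoring them.
        unknown = sorted(set(kwargs) - set(self.DEFAULT_OPTS))
        if unknown:
            raise TypeError("unknown option(s): " + ", ".join(unknown))
        self.nufft_type = nufft_type
        self.modes = tuple(modes)
        self.n_trans = n_trans
        self.eps = eps
        self.isign = isign
        self.dtype = dtype
        # Keyword arguments override the defaults.
        self.opts = {**self.DEFAULT_OPTS, **kwargs}


plan = ToyPlan(1, (64, 64), n_trans=2, eps=1e-4, upsampfac=1.25)
```

One reason to prefer this pattern is that callers set only the options they care about, and a typo in an option name fails immediately.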
196
+ V 2.0.2 (12/5/20)
197
+
198
+ * fixed spreader segfault in obscure use case: single-precision N1>1e7, where
199
+ rounding error is O(1) anyway. Now uses consistent int(ceil()) grid index.
200
+ * Improved large-thread scaling of type-1 (spreading) via transition from OMP
201
+ critical to atomic add_wrapped_subgrid() operations; thanks Rob Blackwell.
202
+ * Increased heuristic t1 spreader max_subproblem_size, faster in 2D, 3D, and
203
+ allowed this and the above atomic threshold to be controlled as nufft_opts.
204
+ * Removed MAX_USEFUL_NTHREADS from defs.h and all code, for simplicity, since
205
+ large thread number now scales better.
206
+ * multithreaded one-mode accuracy test in C++ tests, t1 & t3, for faster tests.
207
+
208
+ V 2.0.1 (10/6/20)
209
+
210
+ * python (under-the-hood) interfacing changed from pybind11 to cleaner ctypes.
211
+ * non-stochastic test/*.cpp routines, zeroing small chance of incorrect failure
212
+ * Windows compatible makefile
213
+ * mac OSX improved installation instructions and make.inc.*
214
+
215
+ CUFINUFFT v 1.1 (09/22/20)
216
+
217
+ * Python: extended the mode tuple to 3D and reorder from C/python
218
+ ndarray.shape style input (nZ, nY, nX) to the (F) order expected by the
219
+ low level library (nX, nY, nZ).
220
+ * Added bound checking on the bin size
221
+ * Dual-precision support of spread/interp tests
222
+ * Improved documentation of spread/interp tests
223
+ * Added dummy call of cuFFTPlan1d to avoid timing the constant cost of cuFFT
224
+ library.
225
+ * Added heuristic decision of maximum batch size (number of vectors with the
226
+ same nupts to transform at the same time)
227
+ * Reported execution throughput in the test codes
228
+ * Fixed timing in the tests code
229
+ * Professionalized handling of too-small-eps (requested tolerance)
230
+ * Rewrote README.md and added cuFINUFFT logo.
231
+ * Support of advanced Makefile usage, e.g. make -site=olcf_summit
232
+ * Removed FFTW dependency
233
+
234
+ V 2.0.0 (8/28/20)
235
+
236
+ * major changes to code, internally, and major improvements to operation and
237
+ language interfaces.
238
+
239
+ WARNING!: Here are all the interface compatibility changes from 1.1.2:
240
+ - opts (nufft_opts) is now always passed as a pointer in C++/C, not
241
+ pass-by-reference as in v1.1.2 or earlier.
242
+ - Fortran simple calls are now finufft?d?(..) not finufft?d?_f(..), and
243
+ they add a penultimate opts argument.
244
+ - Python module name is now finufft not finufftpy, and the interface has
245
+ been completely changed (allowing major improvements, see below).
246
+ - ier=1 is now a warning not an error; this indicates requested tol
247
+ was too small, but that a transform *was* done at the best possible
248
+ accuracy.
249
+ - opts.fftw directly controls the FFTW plan mode consistently in all
250
+ language interfaces (thus changing the meaning of fftw=0 in MATLAB).
251
+ - Octave now needs version >= 4.4, since OO features used by guru.
252
+
253
+ These changes were deemed necessary to rationalize and improve FINUFFT
254
+ for the long term.
255
+ There are also many other new interface options (many-vector, guru)
256
+ added; see docs.
257
+ * the C++ library is now dual-precision, with distinct function interfaces for
258
+ double vs single precision operation, that are C and C++ compatible. Under
259
+ the hood this is achieved via simple C macros. All language interfaces now
260
+ have dual precision options.
261
+ * completely new (although backward compatible) MATLAB/octave interface,
262
+ including object-style wrapper around the guru interface, dual precision.
263
+ * completely new Fortran interface, allowing >2^31 sized (int64) arrays,
264
+ all simple, many-vector and guru interface, with full options control,
265
+ and dual precisions.
266
+ * all simple and many-vector interfaces now call guru interface, for much
267
+ better maintainability and less code repetition.
268
+ * new guru interface, by Andrea Malleo and Alex Barnett, allowing easier
269
+ language wrapping and control of point-setting, reuse of sorting and FFTW
270
+ plans. This finally bypasses the 0.1ms/thread cost of FFTW looking up previous
271
+ wisdom, which slowed down performance for many small problems.
272
+ * removed obsolete -DNEED_EXTERN_C flag.
273
+ * major rewrite of documentation, plus tutorial application examples in MATLAB.
274
+ * numdiff dependency is removed for pass-fail library validation.
275
+ * new (professional!) logo for FINUFFT. Sphinx HTML and PDF aesthetics.
276
+
277
+ CUFINUFFT v 1.0 (07/29/20)
278
+ * Started by Melody Shih.
279
+
280
+ V 1.1.2 (1/31/20)
281
+
282
+ * Ludvig's padding of Horner loop to w=4n, speeds up kernel, esp for GCC5.4.
283
+ * Bansal's Mingw32 python patches.
284
+
285
+ V 1.1.1 (11/2/18)
286
+
287
+ * Mac OSX installation on clang and gcc-8, clearer install docs.
288
+ * LIBSOMP split off in makefile.
289
+ * printf(...%lld..) w/ long long typecast
290
+ * new basic passfail tester
291
+ * precompiled binaries
292
+
293
+ V 1.1 (9/24/18)
294
+
295
+ * NOTE TO USERS: changed interface for setting default opts in C++ and C, from
296
+ pass by reference to pass by value of a pointer (see docs/). Unifies C++/C
297
+ interfaces in a clean way.
298
+ * fftw3_omp instead of fftw3_threads (on linux), is faster.
299
+ * rationalized header files.
300
+
301
+ V 1.0.1 (9/14/18)
302
+
303
+ * Ludvig's removal of omp chunksize in dir=2, another 20%+ speedup.
304
+ * Matlab doesn't change omp internal state.
305
+
306
+ V 1.0 (8/20/18)
307
+
308
+ * repo transferred to flatironinstitute
309
+ * usage doc simpler
310
+ * 2d1many and 2d2many interfaces by Melody Shih, for multiple vectors with same
311
+ nonuniform points. All tests and docs for these interfaces.
312
+ * horner optimized kernel for sigma=5/4 (low upsampling), to go along with the
313
+ default sigma=2. Cmdline arg to change sigma in finufft?d_test.
314
+ * simplified various int types: only BIGINT remains.
315
+ * clearer docs.
316
+ * remaining C interfaces, with opts control.
317
+
318
+ V 0.99 (4/24/18)
319
+
320
+ * piecewise polynomial kernel evaluation by Horner, for faster spreading esp at
321
+ low accuracy and 1d or 2d.
322
+ * various heuristic decisions re whether to sort, and if sorting is single or
323
+ multi-threaded.
324
+ * single-precision libs get an "f" suffix so can coexist with double-prec.
325
+
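The piecewise polynomial kernel evaluation by Horner mentioned in the V 0.99 notes above can be sketched as below. This is a minimal illustration of Horner's rule applied to one polynomial per kernel grid point. The function names and the coefficients are made up and are not the coefficients FINUFFT generates.

```python
def horner(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs are highest degree first."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc


def eval_kernel_pieces(piece_coeffs, x):
    """Evaluate every polynomial piece at the same local offset x,
    giving one kernel value per grid point in the kernel's support."""
    return [horner(c, x) for c in piece_coeffs]


# Made-up quadratic pieces for a width-3 kernel (illustration only).
pieces = [
    [0.5, 1.0, 0.5],    # 0.5*x^2 + x + 0.5
    [-1.0, 0.0, 1.0],   # -x^2 + 1
    [0.5, -1.0, 0.5],   # 0.5*x^2 - x + 0.5
]
values = eval_kernel_pieces(pieces, 0.0)  # [0.5, 1.0, 0.5]
```

Horner's rule needs only one multiply and one add per coefficient, which is why it suits a tight inner loop better than calling exp() and sqrt() for every kernel value.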
326
+ V 0.98 (3/1/18)
327
+
328
+ * makefile includes make.inc for OS-specific defs.
329
+ * decided that, since Legendre nodes code of GLR alg by Hale/Burkhardt is LGPL
330
+ licensed, our use (not changing source) is not a "derived work", therefore
331
+ ok under our Apache v2 license. See:
332
+ https://tldrlegal.com/license/gnu-lesser-general-public-license-v3-(lgpl-3)
333
+ https://www.apache.org/licenses/GPL-compatibility.html
334
+ https://softwareengineering.stackexchange.com/questions/233571/
335
+ open-source-what-is-the-definition-of-derivative-work-and-how-does-it-impact
336
+ * fixed MATLAB FFTW incompat alloc crash, by hack of Joakim, calling fft()
337
+ first.
338
+ * python tests fixed, brought into makefile.
339
+ * brought in af Klinteberg spreader optimizations & SSE tricks.
340
+ * logo
341
+
342
+ V 0.97 (12/6/17)
343
+
344
+ * tidied all docs -> readthedocs.io host. README.md now a stub. TODO tidied.
345
+ * made sort=1 in tests for xeon (used to be 0)
346
+ * removed mcwrap and python dirs
347
+ * changed name of py routines to nufft* from finufft*
348
+ * python interfaces doc, up-to-date. Removed ms,.. from type-2 interfaces.
349
+ * removed RESCALEs from lower dims in bin_sort, speeds up a few % in 1D.
350
+ * allowed NU pts to be correctly folded from +-1 periods from central box, as
351
+ per David Stein request. Adds 5% to time at 1e-2 accuracy, less at higher acc.
352
+ * corrected dynamic C++ array allocs in spreader (some made static, 5% speedup)
353
+ * removed all C++11 dependencies, mainly that opts structs are all explicitly
354
+ initialized.
355
+ * fixed python interface to have chkbnds.
356
+ * tidied MEX interface
357
+ * removed memory leaks (!)
358
+ * opts.modeord implemented and exposed to matlab/python interfaces. Also removes
359
+ looping backwards in RAM in deconvolveshuffle.
360
+
361
+ V 0.96 (10/15/17)
362
+
363
+ * apache v2 license, exposed flags in python interface.
364
+
365
+ V 0.95 (10/2/17)
366
+
367
+ * brought in JFM's in-package python wrapper & doc, create lib-static dir,
368
+ removed devel dir.
369
+
370
+ V 0.9: (6/17/17)
371
+
372
+ * adapted adv4 into main code, inner loops separate by dim, kill
373
+ the current spreader. Incorporate old ideas such as: checkerboard
374
+ per-thread grid cuboids, compare speed in 2d and 3d against
375
+ current 1d slicing. See cnufftspread:set_thread_index_box()
376
+ * added FFTW_MEAS vs FFTW_EST (set as default) opts flag in nufft_opts, and
377
+ matlab/python interfaces
378
+ * removed opts.maxnalloc in favor of #defined MAX_NF
379
+ * fixed the 1-target case in type-3, all dims, to avoid nan; clarified logic
380
+ for handling X=0 and/or S=0. 6/12/17
381
+ * changed arraywidcen to snap to C=0 if relative shift is <0.1, avoids cexps in
382
+ type-3.
383
+ * t3: if C1 < X1/10 and D1 < S1/10 then don't rephase. Same for d=2,3.
384
+ * removed the 1/M type-1 prefactor, also in all test routines. 6/6/17
385
+ * removed timestamp-based make decision whether to rebuild matlab/finufft.cpp,
386
+ since git clone creates files with random timestamp order!
387
+ * theory work on exp(sqrt) being close to PSWF. Analysis.
388
+ * fix issue w/ needing mwrap when do make matlab.
389
+ * makefile has variables customizing openmp and precision, non-omp tested
390
+ * fortran single-prec demos (required all direct ft's in single prec too!)
391
+ * examples changed to err rel to max F.
392
+ * matlab interface control of opts.spread_sort.
393
+ * matlab interface using doubles as big ints w/ correct typecasting.
394
+ * twopispread removed, used flag in spread_opts for [-pi,pi] input instead.
395
+ * testfinufft* use same integer type INT as for interfaces, typecast all %ld in
396
+ printf warnings, use omp rand array filling
397
+ * INT64 for necessary size-setting arrays, removed all %lf printf warnings in
398
+ finufft*
399
+ * all internal array indexing is BIGINT, switchable from long long to int via
400
+ SMALLINT compile flag (default off, see utils.h)
401
+ * all integers in interfaces are type INT, default 64-bit, switchable to 32 bit
402
+ by INTERGER32 compile flag (see utils.h)
403
+ * test big probs (speed, crashing) and decide if BIGINT is long long or int?
404
+ slows any array access down, or spreading? allows I/O sizes
405
+ (M, N1*N2*N3) > 2^31. Note June-Yub int*8 in nufft-1.3.x slowed things by
406
+ factor 2-3.
407
+ * tidy up spreader to be BIGINT = long long compatible and test > 2^31.
408
+ * spreadtest parallel rand()
409
+ * sort flag passed to spreader via finufft, and test scripts check if Xeon
410
+ (-> sort=0)
411
+ * opts in the manual
412
+ * removed all xk2, dNU2 sorted arrays, and not-needed dims y,z; halved RAM usage
413
+
414
+ V 0.8: (3/27/17)
415
+
416
+ * bnderr checking done in dir=1,2 main loops, not before.
417
+ * all kx2, dNU2 arrays removed, just done by permutation index when needed.
418
+ * MAC OSX test, makefile, instructions.
419
+ * matlab wrappers in 3D
420
+ * matlab wrappers, mcwrap issue w/ openmp, mex, and subdirs. Ship mex
421
+ executables for linux. Link to .a
422
+ * matlab wrappers need ier output? yes, and internal omp numthreads control
423
+ (since matlab's is poor)
424
+ * wrappers for MEX octave, instructions. Ship .mex for octave.
425
+ * python wrappers - Dan Foreman-Mackey starting to add something similar to
426
+ https://github.com/dfm/python-nufft
427
+ * check is done before attempting to malloc ridiculous array sizes, eg if a
428
+ large product of x width and k width is requested in type 3 transforms.
429
+ * draft make python
430
+ * basic manual (txt)
431
+
432
+ V. 0.7:
433
+
434
+ * build static & shared lib
435
+ * fixed bug when Nth>Ntop
436
+ * fortran drivers use dynamic malloc to prevent stack segfaults that CMCL had
437
+ * bugs found in fortran drivers, removed
438
+ * split out devel text files (TODO, etc)
439
+ * made pass-fail test script counting crashes and numdiff fails.
440
+ * finufft?d_test have a no-timings option, and exit with ier.
441
+ * global error codes
442
+ * made finufft routines & testers return error codes rather than exit().
443
+ * dumbinput test executable
444
+ * found nan returned error for nj=0 in type-1, fixed so returns the zero array.
445
+ * fixed type 2 to not segfault when ms,mt, or mu=0, doing dir=2 0-padding right
446
+ * array utils use pointers to make which vars they write to explicit.
447
+ * don't do final type-3 rephase if C1 nan or 0.
448
+ * finished all dumbinputs, all dims
449
+ * fortran compilation fixed
450
+ * makefile self-documents
451
+ * nf1 (etc) size check before alloc, exit gracefully if exceeds RAM
452
+ * integrate into nufft_comparison, esp vs NFFT - jfm did
453
+ * simple examples, simpler than the test drivers
454
+ * fortran link via gfortran, better fortran docs
455
+ * boilerplate stuff as in CMCL page
456
+
457
+ pre-V. 0.7: (Jan-Feb 2017)
458
+
459
+ * efficient modulo in spreader, done by conditionals
460
+ * removed data-zeroing bug in t-II spreader, slowness of large arrays in t-I.
461
+ * clean dir tree
462
+ * spreader dir=1,2 math tests in 3d, then nd.
463
+ * Jeremy's request re only computing kernel vals needed (actually
464
+ was vital for efficiency in dir=1 openmp version), Ie fix KB kereval in
465
+ spreader so doesn't do 3d fill when 1 or 2 will do.
466
+ * spreader removed modulo altogether in favor of ifs
467
+ * OpenMP spreader, all dims
468
+ * multidim spreader test, command line args and bash driver
469
+ * cnufft->finufft names, except spreader still called cnufft
470
+ * make ier report accuracy out of range, malloc size errors, etc
471
+ * moved wrappers to own directories so the basic lib is clean
472
+ * fortran wrapper added ier argument
473
+ * types 1,2 in all dims, using 1d kernel for all dims.
474
+ * fix twopispread so doesn't create dummy ky,kz, and fix sort so doesn't ever
475
+ access unused ky,kz dims.
476
+ * cleaner spread and nufft test scripts
477
+ * build universal ndim Fourier coeff copiers in C and use for finufft
478
+ * makefile opts and compiler directives to link against FFTW.
479
+ * t-I, t-II convergence params test: R=M/N and KB params
480
+ * overall scale factor understood in KB
481
+ * check J's bessel10 approx is ok. - became irrelevant
482
+ * meas speed of I_0 for KB kernel eval - became irrelevant
483
+ * understand origin of dfftpack (netlib fftpack is real*4) - not needed
484
+ * [spreader: make compute_sort_indices sensible for 1d and 2d. not needed]
485
+ * next235even for nf's
486
+ * switched pre/post-amp correction from DFT of kernel to F series (FT) of
487
+ kernel, more accurate
488
+ * Gauss-Legendre quadrature for direct eval of kernel FT, openmp since cexp slow
489
+ * optimize q (# G-L nodes) for kernel FT eval on reg and irreg grids
490
+ (common.cpp). Needs q a bit bigger than like (2-3x the PTR, when 1.57x is
491
+ expected). Why?
492
+ * type 3 segfault in dumb case of nj=1 (SX product = 0). By keeping gam>1/S
493
+ * optimize that phi(z) kernel support is only +-(nspread-1)/2, so w/ prob 1 you
494
+ only use nspread-1 pts in the support. Could gain several % speed for same acc
495
+ * new simpler kernel entirely
496
+ * cleaned up set_nf calls and removed params from within core libs
497
+ * test isign=-1 works
498
+ * type 3 in 2d, 3d
499
+ * style: headers should only include other headers needed to compile the .h;
500
+ all other headers go in .cpp, even if that involves repetition I guess.
501
+ * changed library interface and twopispread to dcomplex
502
+ * fortran wrappers (rmdir greengard_work, merge needed into fortran)
503
+
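The "next235even for nf's" entry above refers to choosing fine-grid sizes that are FFT-friendly. A minimal sketch of that idea follows, assuming the routine returns the smallest even integer not less than n whose only prime factors are 2, 3 and 5. This Python version is an illustration, not the library's implementation.

```python
def next235even(n):
    """Smallest even integer >= n with no prime factors other than 2, 3, 5."""
    if n <= 2:
        return 2
    if n % 2 == 1:
        n += 1  # start from an even candidate
    while True:
        m = n
        for p in (2, 3, 5):
            while m % p == 0:
                m //= p
        if m == 1:  # all prime factors were 2, 3 or 5
            return n
        n += 2      # step through even candidates only


size = next235even(13)  # 14 = 2*7 is rejected, so this gives 16
```

FFT libraries are typically fastest on lengths with only small prime factors, which is the motivation for rounding a grid size up in this way.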
504
+ FINUFFT Started: mid-January 2017, building on nufft_comparison of J. Magland.