halide-19.0.0-cp38-cp38-win_amd64.whl
- halide/__init__.py +39 -0
- halide/_generator_helpers.py +835 -0
- halide/bin/Halide.dll +0 -0
- halide/bin/adams2019_retrain_cost_model.exe +0 -0
- halide/bin/adams2019_weightsdir_to_weightsfile.exe +0 -0
- halide/bin/anderson2021_retrain_cost_model.exe +0 -0
- halide/bin/anderson2021_weightsdir_to_weightsfile.exe +0 -0
- halide/bin/featurization_to_sample.exe +0 -0
- halide/bin/gengen.exe +0 -0
- halide/bin/get_host_target.exe +0 -0
- halide/halide_.cp38-win_amd64.pyd +0 -0
- halide/imageio.py +60 -0
- halide/include/Halide.h +35293 -0
- halide/include/HalideBuffer.h +2618 -0
- halide/include/HalidePyTorchCudaHelpers.h +64 -0
- halide/include/HalidePyTorchHelpers.h +120 -0
- halide/include/HalideRuntime.h +2221 -0
- halide/include/HalideRuntimeCuda.h +89 -0
- halide/include/HalideRuntimeD3D12Compute.h +91 -0
- halide/include/HalideRuntimeHexagonDma.h +104 -0
- halide/include/HalideRuntimeHexagonHost.h +157 -0
- halide/include/HalideRuntimeMetal.h +112 -0
- halide/include/HalideRuntimeOpenCL.h +119 -0
- halide/include/HalideRuntimeQurt.h +32 -0
- halide/include/HalideRuntimeVulkan.h +137 -0
- halide/include/HalideRuntimeWebGPU.h +44 -0
- halide/lib/Halide.lib +0 -0
- halide/lib/HalidePyStubs.lib +0 -0
- halide/lib/Halide_GenGen.lib +0 -0
- halide/lib/autoschedule_adams2019.dll +0 -0
- halide/lib/autoschedule_anderson2021.dll +0 -0
- halide/lib/autoschedule_li2018.dll +0 -0
- halide/lib/autoschedule_mullapudi2016.dll +0 -0
- halide/lib/cmake/Halide/FindHalide_LLVM.cmake +152 -0
- halide/lib/cmake/Halide/FindV8.cmake +33 -0
- halide/lib/cmake/Halide/Halide-shared-deps.cmake +0 -0
- halide/lib/cmake/Halide/Halide-shared-targets-release.cmake +29 -0
- halide/lib/cmake/Halide/Halide-shared-targets.cmake +154 -0
- halide/lib/cmake/Halide/HalideConfig.cmake +162 -0
- halide/lib/cmake/Halide/HalideConfigVersion.cmake +65 -0
- halide/lib/cmake/HalideHelpers/FindHalide_WebGPU.cmake +27 -0
- halide/lib/cmake/HalideHelpers/Halide-Interfaces-release.cmake +112 -0
- halide/lib/cmake/HalideHelpers/Halide-Interfaces.cmake +236 -0
- halide/lib/cmake/HalideHelpers/HalideGeneratorHelpers.cmake +1056 -0
- halide/lib/cmake/HalideHelpers/HalideHelpersConfig.cmake +28 -0
- halide/lib/cmake/HalideHelpers/HalideHelpersConfigVersion.cmake +54 -0
- halide/lib/cmake/HalideHelpers/HalideTargetHelpers.cmake +99 -0
- halide/lib/cmake/HalideHelpers/MutexCopy.ps1 +31 -0
- halide/lib/cmake/HalideHelpers/TargetExportScript.cmake +55 -0
- halide/lib/cmake/Halide_Python/Halide_Python-targets-release.cmake +29 -0
- halide/lib/cmake/Halide_Python/Halide_Python-targets.cmake +125 -0
- halide/lib/cmake/Halide_Python/Halide_PythonConfig.cmake +26 -0
- halide/lib/cmake/Halide_Python/Halide_PythonConfigVersion.cmake +65 -0
- halide/share/doc/Halide/LICENSE.txt +233 -0
- halide/share/doc/Halide/README.md +439 -0
- halide/share/doc/Halide/doc/BuildingHalideWithCMake.md +626 -0
- halide/share/doc/Halide/doc/CodeStyleCMake.md +393 -0
- halide/share/doc/Halide/doc/FuzzTesting.md +104 -0
- halide/share/doc/Halide/doc/HalideCMakePackage.md +812 -0
- halide/share/doc/Halide/doc/Hexagon.md +73 -0
- halide/share/doc/Halide/doc/Python.md +844 -0
- halide/share/doc/Halide/doc/RunGen.md +283 -0
- halide/share/doc/Halide/doc/Testing.md +125 -0
- halide/share/doc/Halide/doc/Vulkan.md +287 -0
- halide/share/doc/Halide/doc/WebAssembly.md +228 -0
- halide/share/doc/Halide/doc/WebGPU.md +128 -0
- halide/share/tools/RunGen.h +1470 -0
- halide/share/tools/RunGenMain.cpp +642 -0
- halide/share/tools/adams2019_autotune_loop.sh +227 -0
- halide/share/tools/anderson2021_autotune_loop.sh +591 -0
- halide/share/tools/halide_benchmark.h +240 -0
- halide/share/tools/halide_image.h +31 -0
- halide/share/tools/halide_image_info.h +318 -0
- halide/share/tools/halide_image_io.h +2794 -0
- halide/share/tools/halide_malloc_trace.h +102 -0
- halide/share/tools/halide_thread_pool.h +161 -0
- halide/share/tools/halide_trace_config.h +559 -0
- halide-19.0.0.data/data/share/cmake/Halide/HalideConfig.cmake +6 -0
- halide-19.0.0.data/data/share/cmake/Halide/HalideConfigVersion.cmake +65 -0
- halide-19.0.0.data/data/share/cmake/HalideHelpers/HalideHelpersConfig.cmake +6 -0
- halide-19.0.0.data/data/share/cmake/HalideHelpers/HalideHelpersConfigVersion.cmake +54 -0
- halide-19.0.0.dist-info/METADATA +301 -0
- halide-19.0.0.dist-info/RECORD +85 -0
- halide-19.0.0.dist-info/WHEEL +5 -0
- halide-19.0.0.dist-info/licenses/LICENSE.txt +233 -0

halide/share/doc/Halide/doc/Vulkan.md
@@ -0,0 +1,287 @@

# Vulkan Support for Halide

Halide supports the Khronos Vulkan framework as a compute API backend for GPU-like
devices, and compiles directly to a binary SPIR-V representation as part of its
code generation before submitting it to the Vulkan API. Both JIT and AOT usage
are supported via the `vulkan` target flag (e.g. `HL_JIT_TARGET=host-vulkan`).

Vulkan support is actively under development, and considered *BETA* quality
at this stage. Tests are passing, but performance tuning and user testing are needed
to identify potential issues before rolling this into production.

See [below](#current-status) for details.

# Compiling Halide w/Vulkan Support

You'll need to configure Halide and enable the CMake option TARGET_VULKAN (which is now ON by default).

For example, on Linux & OSX:

```
% cmake -G Ninja -DTARGET_VULKAN=ON -DCMAKE_BUILD_TYPE=Release -DLLVM_DIR=$LLVM_ROOT/lib/cmake/llvm -S . -B build
% cmake --build build --config Release
```

On Windows, you may need to specify the location of the Vulkan SDK if the paths aren't resolved by CMake automatically. For example (assuming the Vulkan SDK is installed in the default path):

```
C:\> cmake -G Ninja -DTARGET_VULKAN=ON -DCMAKE_BUILD_TYPE=Release -DLLVM_DIR=$LLVM_ROOT/lib/cmake/llvm -DVulkan_LIBRARY=C:\VulkanSDK\1.3.231.1\Lib\vulkan-1.lib -DVulkan_INCLUDE_DIR=C:\VulkanSDK\1.3.231.1\Include\vulkan -S . -B build
C:\> cmake --build build --config Release
```

# Vulkan Runtime Environment:

Halide has no direct dependency on Vulkan for code generation, but the runtime
requires a working Vulkan environment to run Halide-generated code. Any valid
Vulkan v1.0+ device driver should work.

Specifically, you'll need:

- A vendor-specific Vulkan device driver
- The generic Vulkan loader library

For AMD, NVIDIA, and Intel devices, download and install the latest graphics driver
for your platform. Vulkan support should be included.

## Windows

To build Halide AOT generators, you'll need the Vulkan SDK (specifically the Vulkan loader library and headers):
https://sdk.lunarg.com/sdk/download/latest/windows/vulkan-sdk.exe

For Vulkan device drivers, consult the appropriate hardware vendor for your device. A few common ones are listed below.

- [AMD Vulkan Driver](https://www.amd.com/en/technologies/vulkan)
- [NVIDIA Vulkan Driver](https://developer.nvidia.com/vulkan-driver)
- [INTEL Vulkan Driver](https://www.intel.com/content/www/us/en/download-center/home.html)

## Linux

The Vulkan SDK packages are now being maintained by LunarG. These include the Vulkan loader library, as well as the Vulkan tools packages. Instructions for installing these can be found in their [Getting Started Guide](https://vulkan.lunarg.com/doc/view/latest/linux/getting_started_ubuntu.html).

Once the SDK has been installed, you need to install the appropriate driver for your device. Proprietary drivers can be installed via `apt` using PPAs for each vendor. Examples for AMD and NVIDIA are provided below.

For AMD on Ubuntu v22.04:
```
$ sudo add-apt-repository ppa:oibaf/graphics-drivers
$ sudo apt update
$ sudo apt upgrade
$ sudo apt install libvulkan1 mesa-vulkan-drivers vulkan-tools
```

For NVIDIA on Ubuntu v22.04:
```
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
$ sudo apt upgrade
# - replace ### with latest driver release (e.g. 515)
$ sudo apt install nvidia-driver-### nvidia-settings libvulkan1 vulkan-tools
```

Note that only valid drivers for your system should be installed, since there are
reports of the Vulkan loader segfaulting just by having a non-supported driver present.
Specifically, the seemingly generic `mesa-vulkan-drivers` package actually includes the AMD
graphics driver, which can cause problems if installed on an NVIDIA-only system.

## Mac

You're better off using Halide's Metal backend instead, but it is possible to run
Vulkan apps on a Mac via the MoltenVK library:

- [MoltenVK Project](https://github.com/KhronosGroup/MoltenVK)

The easiest way to get the necessary dependencies is to use the official MoltenVK SDK
installer provided by LunarG:

- [MoltenVK SDK (Latest Release)](https://sdk.lunarg.com/sdk/download/latest/mac/vulkan-sdk.dmg)

Alternatively, if you have the [Homebrew](https://brew.sh/) package manager installed
for macOS, you can use it to install the Vulkan loader and MoltenVK compatibility
layer:

```
$ brew install vulkan-loader molten-vk
```

# Testing Your Vulkan Environment

You can validate that everything is configured correctly by running the `vulkaninfo`
app (bundled in the vulkan-tools package) to make sure your device is detected, e.g.:

```
$ vulkaninfo
==========
VULKANINFO
==========

Vulkan Instance Version: 1.3.224


Instance Extensions: count = 19
===============================
...

Layers: count = 10
==================
VK_LAYER_KHRONOS_profiles (Khronos Profiles layer) Vulkan version 1.3.224, layer version 1:
Layer Extensions: count = 0
Devices: count = 1
GPU id = 0 (NVIDIA GeForce RTX 3070 Ti)
Layer-Device Extensions: count = 1

...

```

Make sure everything looks correct before continuing!

# Targeting Vulkan

To generate Halide code for Vulkan, simply add the `vulkan` flag to your target, as well as any other optional device-specific features you wish to enable for Halide:

| Target Feature | Description |
| -- | -- |
| `vulkan` | Enables the Vulkan backend |
| `vk_int8` | Allows 8-bit integer storage types to be used |
| `vk_int16` | Allows 16-bit integer storage types to be used |
| `vk_int64` | Allows 64-bit integer storage types to be used |
| `vk_float16` | Allows 16-bit floating-point values to be used for computation |
| `vk_float64` | Allows 64-bit floating-point values to be used for computation |
| `vk_v10` | Generates code compatible with the Vulkan v1.0+ API |
| `vk_v12` | Generates code compatible with the Vulkan v1.2+ API |
| `vk_v13` | Generates code compatible with the Vulkan v1.3+ API |

Note that 32-bit integer and floating-point types are always available. All other optional device features are off by default (since they are not required by the Vulkan API, and thus must be explicitly enabled to ensure that the code being generated will be compatible with the device and API version being used for execution).

For AOT generators, add `vulkan` (and any other flags you wish to use) to the target command-line option:

```
$ ./lesson_15_generate -g my_first_generator -o . target=host-vulkan-vk_int8-vk_int16
```

For JIT apps, use the `HL_JIT_TARGET` environment variable:

```
$ HL_JIT_TARGET=host-vulkan-vk_int8-vk_int16 ./tutorial/lesson_01_basics
```

# Useful Runtime Environment Variables

To modify the default behavior of the runtime, the following environment
variables can be used to adjust the configuration of the Vulkan backend
at execution time:

`HL_VK_LAYERS=...` will tell Halide to choose a suitable Vulkan instance
that supports the given list of layers. If not set, `VK_INSTANCE_LAYERS=...`
will be used instead. If neither is present, Halide will use the first
Vulkan compute device it can find. Multiple layers can be specified using
the appropriate environment variable list delimiter (`:` on Linux/OSX/POSIX,
or `;` on Windows).

`HL_VK_DEVICE_TYPE=...` will tell Halide to choose which type of device
to select for creating the Vulkan instance. Valid options are 'gpu',
'discrete-gpu', 'integrated-gpu', 'virtual-gpu', or 'cpu'. If not set,
Halide will search for the first 'gpu'-like device it can find, or fall back
to the first compute device it can find.
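
For example, to steer a JIT app onto a discrete GPU with the validation layer enabled (a sketch; the binary name is a placeholder, and the layer must be installed on your system):

```
$ HL_VK_DEVICE_TYPE=discrete-gpu HL_VK_LAYERS=VK_LAYER_KHRONOS_validation \
  HL_JIT_TARGET=host-vulkan ./my_halide_app
```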

`HL_VK_ALLOC_CONFIG=...` will tell Halide to configure the Vulkan memory
allocator to use the given constraints, specified as five integer values
separated by the appropriate environment variable list delimiter
(e.g. `N:N:N:N:N` on Linux/OSX/POSIX, or `N;N;N;N;N` on Windows). These values
correspond to `maximum_pool_size`, `minimum_block_size`, `maximum_block_size`,
`maximum_block_count` and `nearest_multiple`.

The `maximum_pool_size` constraint will tell Halide to configure the
Vulkan memory allocator to never request more than N megabytes for the
entire pool of allocations for the context. This includes all resource
blocks used for suballocations. Setting this to a non-zero value will
limit the amount of device memory used by Halide, which may be useful when
other applications and frameworks are competing for resources.
Default is 0, meaning no limit.

The `minimum_block_size` constraint will tell Halide to configure the
Vulkan memory allocator to always request a minimum of N megabytes for
a resource block, which will be used as a pool for suballocations.
Increasing this value may improve performance while sacrificing the amount
of available device memory. Default is 32 MB.

The `maximum_block_size` constraint will tell Halide to configure the
Vulkan memory allocator to never exceed a maximum of N megabytes for a
resource block. Decreasing this value may free up more memory, but may
impact performance and/or restrict allocations to be unusably small.
Default is 0, meaning no limit.

The `maximum_block_count` constraint will tell Halide to configure the
Vulkan memory allocator to never exceed a total of N block allocations.
Decreasing this value may free up more memory, but may impact performance
and/or restrict allocations. Default is 0, meaning no limit.

The `nearest_multiple` constraint will tell Halide to configure the
Vulkan memory allocator to always round up the requested allocation sizes
to the given integer value. This is useful for architectures that
require specific alignments for subregions allocated within a block.
Default is 32; setting this to zero means no constraint.
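
As a concrete sketch (the binary name is a placeholder), the following caps the pool at 512 MB, uses 16 MB minimum and 256 MB maximum resource blocks, leaves the block count unlimited, and rounds allocation sizes up to a multiple of 32:

```
$ HL_VK_ALLOC_CONFIG=512:16:256:0:32 ./my_halide_app
```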

# Debug Environment Variables

The following environment variables may be useful for tracking down potential
issues related to Vulkan:

`HL_DEBUG_CODEGEN=3` will print out debug info, including output from the SPIR-V
code generator used for Vulkan, while it is compiling.

`HL_SPIRV_DUMP_FILE=...` specifies a file to dump the binary SPIR-V generated
during compilation. Useful for debugging codegen issues. The output can be inspected,
validated, and disassembled via the SPIR-V tools:

https://github.com/KhronosGroup/SPIRV-Tools
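
For example (file and binary names are placeholders), you can dump the SPIR-V produced for a JIT run and then validate and disassemble it with `spirv-val` and `spirv-dis` from the SPIR-V tools:

```
$ HL_SPIRV_DUMP_FILE=my_kernel.spv HL_JIT_TARGET=host-vulkan ./my_halide_app
$ spirv-val my_kernel.spv
$ spirv-dis my_kernel.spv
```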

In addition to the SPIR-V tools, you may also wish to install the Khronos validation
layers, which provide an exhaustive suite of runtime checks that can be injected
by adding `VK_LAYER_KHRONOS_validation` to the `VK_INSTANCE_LAYERS=` environment
variable.

To install the validation layers and the SPIR-V tools on Ubuntu v22.04:

```
$ sudo apt install vulkan-validationlayers vulkan-validationlayers-dev spirv-tools
```

To test the validation layer, you can prepend your shell command for any Vulkan-enabled
binary with the appropriate environment settings. For example,
you can run one of the JIT-enabled correctness tests with debug output and validation
layers enabled like so:

```
$ VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation HL_JIT_TARGET=host-vulkan-vk_int8-vk_int16-vk_int64-vk_float16-vk_float64-vk_v13-debug ./build/test/correctness/correctness_hello_gpu
```

# Current Status

All correctness tests are now passing on tested configs for Linux & Windows using the target `host-vulkan-vk_int8-vk_int16-vk_int64-vk_float16-vk_float64-vk_v13` on LLVM v14.x.

MacOS passes most tests, but encounters internal MoltenVK code translation issues for wide vectors and ambiguous function calls.

Python apps, tutorials and correctness tests are now passing, but the AOT cases are skipped since the runtime environment needs to be customized to locate the platform-specific Vulkan loader library.

Android platform support is currently being worked on.

# Caveats:

- Other than 32-bit floats and integers, every other data type is optional per the Vulkan spec
- Float 64-bit types can be enabled, but there aren't any native math functions available in SPIR-V
- Only one dynamically sized shared memory allocation can be used, but any number of
  fixed-size allocations are supported (up to the maximum amount allowed by the device)

# Known TODO:

- Performance tuning of CodeGen and Runtime
- More platform support (Android is work-in-progress, RISC-V, etc.)
- Adapt unsupported types to supported types (if missing vk_int8, promote to uint32_t)?
- Better debugging utilities using the Vulkan debug hooks.
- Allow debug symbols to be stripped from SPIR-V during codegen to reduce
  memory overhead for large kernels.
- Investigate floating-point rounding and precision (v1.3 adds more controls)
- Investigate memory model usage (can Halide gain anything from these?)

halide/share/doc/Halide/doc/WebAssembly.md
@@ -0,0 +1,228 @@

# WebAssembly Support for Halide

Halide supports WebAssembly (Wasm) code generation using the LLVM backend.

As WebAssembly itself is still under active development, Halide's support has
some limitations. Some of the most important:

- Sign-extension operations are enabled by default (but can be avoided via
  Target::WasmMvpOnly).
- Non-trapping float-to-int conversions are enabled by default (but can be
  avoided via Target::WasmMvpOnly).
- Fixed-width SIMD (128 bit) can be enabled via Target::WasmSimd128.
- Threads have very limited support via Target::WasmThreads; see
  [below](#using-threads) for more details.
- Halide's JIT for Wasm is extremely limited and really useful only for
  internal testing purposes.

# Additional Tooling Requirements:

- In addition to the usual install of LLVM and clang, you'll need lld.
- A locally-installed version of Emscripten (1.39.19+)

Note that for all of the above, earlier versions might work, but have not been
tested.

# AOT Limitations

Halide outputs a Wasm object (.o) or static library (.a) file, much like any
other architecture; to use it, of course, you must link it to suitable calling
code. Additionally, you must link to something that provides an implementation
of `libc`; as a practical matter, this means using the Emscripten tool to do
your linking, as it provides the most complete such implementation we're aware
of at this time.
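
As a rough sketch of that flow (the generator, file names, and include path are hypothetical), you would emit a wasm object from a generator and then let `emcc` perform the final link:

```
$ ./my_generator -g my_filter -o . target=wasm-32-wasmrt-wasm_simd128
$ emcc main.cpp my_filter.o -I ${HALIDE_DIR}/include -O2 -o my_filter.js
```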
36
|
+
- Halide ahead-of-time tests assume/require that you have Emscripten installed
|
37
|
+
and available on your system, with the `EMSDK` environment variable set
|
38
|
+
properly.
|
39
|
+
|
40
|
+
# JIT Limitations
|
41
|
+
|
42
|
+
It's important to reiterate that the WebAssembly JIT mode is not (and will never
|
43
|
+
be) appropriate for anything other than limited self tests, for a number of
|
44
|
+
reasons:
|
45
|
+
|
46
|
+
- It actually uses an interpreter (from the WABT toolkit
|
47
|
+
[https://github.com/WebAssembly/wabt]) to execute wasm bytecode; not
|
48
|
+
surprisingly, this can be *very* slow.
|
49
|
+
- Wasm effectively runs in a private, 32-bit memory address space; while the
|
50
|
+
host has access to that entire space, the reverse is not true, and thus any
|
51
|
+
`define_extern` calls require copying all `halide_buffer_t` data across the
|
52
|
+
Wasm<->host boundary in both directions. This has severe implications for
|
53
|
+
existing benchmarks, which don't currently attempt to account for this extra
|
54
|
+
overhead. (This could possibly be improved by modeling the Wasm JIT's buffer
|
55
|
+
support as a `device` model that would allow lazy copy-on-demand.)
|
56
|
+
- Host functions used via `define_extern` or `HalideExtern` cannot accept or
|
57
|
+
return values that are pointer types or 64-bit integer types; this includes
|
58
|
+
things like `const char *` and `user_context`. Fixing this is tractable, but
|
59
|
+
is currently omitted as the fix is nontrivial and the tests that are
|
60
|
+
affected are mostly non-critical. (Note that `halide_buffer_t*` is
|
61
|
+
explicitly supported as a special case, however.)
|
62
|
+
- Threading isn't supported at all (yet); all `parallel()` schedules will be
|
63
|
+
run serially.
|
64
|
+
- The `.async()` directive isn't supported at all, not even in
|
65
|
+
serial-emulation mode.
|
66
|
+
- You can't use `Param<void *>` (or any other arbitrary pointer type) with the
|
67
|
+
Wasm jit.
|
68
|
+
- You can't use `Func.debug_to_file()`, `Func.set_custom_do_par_for()`,
|
69
|
+
`Func.set_custom_do_task()`, or `Func.set_custom_allocator()`.
|
70
|
+
- The implementation of `malloc()` used by the JIT is incredibly simpleminded
|
71
|
+
and unsuitable for anything other than the most basic of tests.
|
72
|
+
- GPU usage (or any buffer usage that isn't 100% host-memory) isn't supported
|
73
|
+
at all yet. (This should be doable, just omitted for now.)
|
74
|
+
|
75
|
+
Note that while some of these limitations may be improved in the future, some
|
76
|
+
are effectively intrinsic to the nature of this problem. Realistically, this JIT
|
77
|
+
implementation is intended solely for running Halide self-tests (and even then,
|
78
|
+
a number of them are fundamentally impractical to support in a hosted-Wasm
|
79
|
+
environment and are disabled).
|
80
|
+
|
81
|
+
In sum: don't plan on using Halide JIT mode with Wasm unless you are working on
|
82
|
+
the Halide library itself.
|
83
|
+
|
84
|
+
## Using V8 as the interpreter
|
85
|
+
|
86
|
+
There is experimental support for using V8 as the interpreter in JIT mode, rather than WABT.
|
87
|
+
This is enabled by the CMake command line options `-DWITH_V8=ON -DWITH_WABT=OFF` (only one of them can be used at a time).
|
88
|
+
You must build V8 locally V8, then specify the path to the library and headers as CMake options.
|
89
|
+
This is currently only tested on x86-64-Linux and requires v8 version 9.8.177 as a minimum.
|
90
|
+
|
91
|
+
The canonical instructions to build V8 are at
|
92
|
+
[v8.dev](https://v8.dev/docs/build), and [there are examples for embedding
|
93
|
+
v8](https://v8.dev/docs/embed). The process for Halide is summarized below.
|
94
|
+
|
95
|
+
- Install
|
96
|
+
[`depot_tools`](https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/depot_tools_tutorial.html#_setting_up)
|
97
|
+
- Fetch v8 source code (and install required dependencies):
|
98
|
+
```
|
99
|
+
$ gclient
|
100
|
+
$ mkdir ~/v8 && cd ~/v8
|
101
|
+
$ fetch v8
|
102
|
+
$ cd ~/v8/v8
|
103
|
+
$ git checkout origin/9.8.177
|
104
|
+
```
|
105
|
+
- Create a build configuration: `tools/dev/v8gen.py x64.release.sample`
|
106
|
+
- Turn off pointer compression: `echo 'v8_enable_pointer_compression = false' >> out.gn/x64.release.sample/args.gn`
|
107
|
+
- Disable the GDB-JIT interface (conflicts with LLVM): `echo 'v8_enable_gdbjit = false' >> out.gn/x64.release.sample/args.gn`
|
108
|
+
- Build the static library: `autoninja -C out.gn/x64.release.sample v8_monolith`
|
109
|
+
|
110
|
+
With V8 built, we can pass the CMake options:
|
111
|
+
|
112
|
+
- `V8_INCLUDE_DIR`, path to V8 includes, e.g. `$HOME/v8/v8/include`
|
113
|
+
- `V8_LIBRARY`, path to V8 static library, e.g. `$HOME/v8/v8/out.gn/x64.release.sample/obj/libv8_monolith.a`
|
114
|
+
|
115
|
+
An example to configure Halide with V8 support, build and run an example test:
|
116
|
+
|
117
|
+
```
|
118
|
+
$ cd /path/to/halide
|
119
|
+
$ export HL_TARGET=wasm-32-wasmrt-wasm_simd128
|
120
|
+
$ export HL_JIT_TARGET=${HL_TARGET}
|
121
|
+
$ cmake -G Ninja \
|
122
|
+
-DWITH_WABT=OFF \
|
123
|
+
-DWITH_V8=ON \
|
124
|
+
-DV8_INCLUDE_DIR=$HOME/v8/v8/include \
|
125
|
+
-DV8_LIBRARY=$HOME/v8/v8/out.gn/x64.release.sample/obj/libv8_monolith.a \
|
126
|
+
-DHalide_TARGET=${HL_TARGET} \
|
127
|
+
/* other cmake settings here as appropriate */
|
128
|
+
|
129
|
+
$ cmake --build .
|
130
|
+
$ ctest -L "correctness|generator" -j
|
131
|
+
```
|
132
|
+
|
133
|
+
|
134
|
+
# To Use Halide For WebAssembly:
|
135
|
+
|
136
|
+
- Ensure WebAssembly is in LLVM_TARGETS_TO_BUILD; if you use the default
|
137
|
+
(`"all"`) then it's already present, but otherwise, add it explicitly:
|
138
|
+
|
139
|
+
```
|
140
|
+
-DLLVM_TARGETS_TO_BUILD="X86;ARM;NVPTX;AArch64;PowerPC;Hexagon;WebAssembly
|
141
|
+
```
|
142
|
+
|
143
|
+
## Enabling wasm JIT
|
144
|
+
|
145
|
+
If you want to run `test_correctness` and other interesting parts of the Halide
|
146
|
+
test suite (and you almost certainly will), you'll need to ensure that LLVM is
|
147
|
+
built with wasm-ld:
|
148
|
+
|
149
|
+
- Ensure that you have lld in LVM_ENABLE_PROJECTS:
|
150
|
+
|
151
|
+
```
|
152
|
+
cmake -DLLVM_ENABLE_PROJECTS="clang;lld" ...
|
153
|
+
```
|
154
|
+
|
155
|
+
- To run the JIT tests, set `HL_JIT_TARGET=wasm-32-wasmrt` (possibly adding
|
156
|
+
`wasm_simd128`) and run CMake/CTest normally. Note that wasm testing is
|
157
|
+
only supported under CMake (not via Make).
|
158
|
+
|
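
For example, from a CMake build directory (a sketch; test labels may vary):

```
$ HL_JIT_TARGET=wasm-32-wasmrt-wasm_simd128 ctest -L correctness -j
```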

## Enabling wasm AOT

If you want to test ahead-of-time code generation (and you almost certainly
will), you need to install Emscripten locally.

- The simplest way to install is probably via the Emscripten emsdk
  (https://emscripten.org/docs/getting_started/downloads.html).

- To run the AOT tests, set `HL_TARGET=wasm-32-wasmrt` (possibly adding
  `wasm_simd128`) and run CMake/CTest normally. Note that wasm testing is
  only supported under CMake (not via Make).

# Running benchmarks

The `test_performance` benchmarks are misleading (and thus useless) for Wasm, as
they include JIT overhead as described elsewhere. Suitable benchmarks for Wasm
will be provided at a later date. (See
https://github.com/halide/Halide/issues/5119 and
https://github.com/halide/Halide/issues/5047 to track progress.)

# Using Threads

You can use the `wasm_threads` feature to enable use of a normal pthread-based
thread pool in Halide code, but with some important caveats (a build sketch
follows this list):

- This requires that you use a wasm runtime environment that provides
  pthread-compatible wrappers. At the time of this writing, the only
  environment known to support this well is Emscripten (when using the
  `-pthread` flag, and compiling for a Web environment). In this
  configuration, Emscripten goes to great lengths to make WebWorkers available
  via the pthreads API. (You can see an example of this usage in
  apps/HelloWasm.) Note that not all wasm runtimes support WebWorkers;
  generally, you need a full browser environment to make this work (though
  some versions of some shell tools may also support this, e.g. nodejs).
- There is currently no support for using threads in a WASI environment, due
  to current limitations in the WASI specification. (We hope that this will
  improve in the future.)
- There is no support for using threads in the Halide JIT environment, and no
  plans to add them anytime in the near-term future.
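
A minimal sketch of that Emscripten configuration (file names are hypothetical): add `wasm_threads` to the Halide target when generating, then pass `-pthread` when linking with `emcc`:

```
$ ./my_generator -g my_filter -o . target=wasm-32-wasmrt-wasm_threads
$ emcc main.cpp my_filter.o -I ${HALIDE_DIR}/include -pthread -O2 -o my_filter.html
```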

# Known Limitations And Caveats

- Current trunk LLVM (as of July 2020) doesn't reliably generate all of the
  Wasm SIMD ops that are available; see
  https://github.com/halide/Halide/issues/5130 for tracking information as
  these are fixed.
- Using the JIT requires that we link the `wasm-ld` tool into libHalide; with
  some work this need could possibly be eliminated.
- OSX and Linux-x64 have been tested. Windows hasn't; it should be supportable
  with some work. (Patches welcome.)
- None of the `apps/` folder has been investigated yet. Many of them should be
  supportable with some work. (Patches welcome.)
- We currently use v8/d8 as a test environment for AOT code; we may want to
  consider using Node or (better yet) headless Chrome instead (which is
  probably required to allow for using threads in AOT code).

# Known TODO:

- There's some invasive hackiness in CodeGen_LLVM to support the JIT
  trampolines; this really should be refactored to be less hacky.
- Can we rework the JIT to avoid the need to link in wasm-ld? This might be
  doable, as the wasm object files produced by the LLVM backend are close
  enough to an executable form that we could likely make it work with some
  massaging on our side, but it's not clear whether this would be a bad idea
  or not (i.e., would it be unreasonably fragile).
- Buffer-copying overhead in the JIT could possibly be dramatically improved
  by modeling the copy as a "device" (i.e. `copy_to_device()` would copy from
  host -> wasm); this would make the performance benchmarks much more useful.
- Can we support threads in the JIT without an unreasonable amount of work?
  Unknown at this point.

halide/share/doc/Halide/doc/WebGPU.md
@@ -0,0 +1,128 @@

# WebGPU support for Halide

Halide has work-in-progress support for generating and running WebGPU shaders.
This can be used in conjunction with the WebAssembly backend to bring
GPU-accelerated Halide pipelines to the web.

As the first version of the WebGPU standard is itself still being developed,
Halide's support has some limitations and may only work with certain browsers
and versions of Emscripten.

## Known limitations

The following is a non-comprehensive list of known limitations:

- Only 32-bit integers and floats have efficient support.
  * 8-bit and 16-bit integers are implemented using emulation. Future
    extensions to WGSL will allow them to be implemented more efficiently.
  * 64-bit integers and floats will likely remain unsupported until WGSL gains
    extensions to support them.
- Wrapping native device buffer handles is not yet implemented.
- You must use CMake/CTest to build/test Halide for WebGPU; using the Makefile
  is not supported for WebGPU testing (and probably never will be).

In addition to these functional limitations, the performance of the WebGPU
backend has not yet been evaluated, and so optimizations in the runtime or
device codegen may be required before it becomes profitable to use.

## Running with WebAssembly via Emscripten: `HL_TARGET=wasm-32-wasmrt-webgpu`

> _Tested with top-of-tree Emscripten as of 2023-02-23, against Chrome v113._

Halide can generate WebGPU code that can be integrated with WASM code using
Emscripten.

When invoking `emcc` to link Halide-generated objects, include these flags:
`-s USE_WEBGPU=1 -s ASYNCIFY`.
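
For example (object and output names are hypothetical):

```
$ emcc main.cpp my_pipeline.o -I ${HALIDE_DIR}/include -s USE_WEBGPU=1 -s ASYNCIFY -O2 -o my_pipeline.html
```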

Tests that use AOT compilation can be run using a native WebGPU implementation
that has Node.js bindings, such as [Dawn](https://dawn.googlesource.com/dawn/).
To run these tests, you must set the `HL_WEBGPU_NODE_BINDINGS` environment
variable to the absolute path of the bindings, e.g.
`HL_WEBGPU_NODE_BINDINGS=/path/to/dawn.node`.
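
For example (the path to `dawn.node` is a placeholder, and exact test labels may vary):

```
$ HL_TARGET=wasm-32-wasmrt-webgpu HL_WEBGPU_NODE_BINDINGS=/path/to/dawn.node ctest -L generator -j
```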

See [below](#setting-up-dawn) for instructions on building the Dawn Node.js
bindings.

JIT compilation is not supported when using WebGPU with WASM.

## Running natively: `HL_TARGET=host-webgpu`

> _Tested with top-of-tree Dawn as of 2023-11-27 [commit b5d38fc7dc2a20081312c95e379c4a918df8b7d4]._

For testing purposes, Halide can also target native WebGPU libraries, such as
[Dawn](https://dawn.googlesource.com/dawn/) or
[wgpu](https://github.com/gfx-rs/wgpu).
This is currently the only path that can run the JIT correctness tests.
See [below](#setting-up-dawn) for instructions on building Dawn.

> Note that as of 2023-11-27, wgpu is not supported due to
> [lacking `override` support for WGSL](https://github.com/gfx-rs/wgpu/issues/1762),
> which we require in order to set GPU block sizes.

When targeting WebGPU with a native target, Halide defaults to looking for a
build of Dawn (with several common names and suffixes); you can override this
by setting the `HL_WEBGPU_NATIVE_LIB` environment variable to the absolute path
of the library you want.
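
For example, to run one of the JIT correctness tests against a local Dawn build (the library path is a placeholder):

```
$ HL_JIT_TARGET=host-webgpu HL_WEBGPU_NATIVE_LIB=/path/to/libwebgpu_dawn.so ./build/test/correctness/correctness_hello_gpu
```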

Note that it is explicitly legal to define both `HL_WEBGPU_NATIVE_LIB` and
`HL_WEBGPU_NODE_BINDINGS` at the same time; the correct executable environment
will be selected based on the Halide target specified.

## Setting up Dawn

Building Dawn's Node.js bindings currently requires using CMake.

First, [install `depot_tools`](https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/depot_tools_tutorial.html#_setting_up) and add it to the
`PATH` environment variable.

Next, get Dawn and its dependencies:

```
# Clone the repo
git clone https://dawn.googlesource.com/dawn
cd dawn

# Bootstrap the gclient configuration with Node.js bindings enabled
cp scripts/standalone-with-node.gclient .gclient

# Fetch external dependencies and toolchains with gclient
gclient sync

# Other dependencies that must be installed manually:
# - golang
```

Finally, build Dawn, enabling both the Node.js bindings and shared libraries:

```
mkdir -p <build_dir>
cd <build_dir>

cmake <dawn_root_dir> -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DDAWN_BUILD_NODE_BINDINGS=1 \
    -DDAWN_ENABLE_PIC=1 \
    -DBUILD_SHARED_LIBS=ON

ninja dawn.node webgpu_dawn
```

This will produce the following artifacts:
- Node.js bindings: `<build_dir>/dawn.node`
- Native library: `<build_dir>/src/dawn/native/libwebgpu_dawn.{so,dylib,dll}`

These paths can then be used for the `HL_WEBGPU_NODE_BINDINGS` and
`HL_WEBGPU_NATIVE_LIB` environment variables when using Halide.

## Updating mini_webgpu.h

The recommended method for updating `mini_webgpu.h` is to copy the
`gen/include/dawn/webgpu.h` file from the Dawn build directory (a sketch of the
copy step is shown after this list), then:
- Restore the `// clang-format {off,on}` lines.
- Comment out the `#include <std*>` lines.
- Remove the `void` parameter from the `WGPUProc` declaration.
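
A sketch of the copy step (the build directory is a placeholder, and the destination path assumes the layout of Halide's source tree):

```
$ cp <build_dir>/gen/include/dawn/webgpu.h src/runtime/mini_webgpu.h
```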

This guarantees a version of the WebGPU header that is compatible with Dawn.
When the native API eventually stabilizes, it should be possible to obtain a
header from the `webgpu-native` GitHub organization that will be compatible
with Dawn, wgpu, and Emscripten.