torchbvh-0.1.0.tar.gz (PyPI)

Files changed (68)

torchbvh-0.1.0/LICENSE +21 -0
torchbvh-0.1.0/MANIFEST.in +14 -0
torchbvh-0.1.0/PKG-INFO +104 -0
torchbvh-0.1.0/README.md +79 -0
torchbvh-0.1.0/docs/algorithms.md +121 -0
torchbvh-0.1.0/docs/api_reference.md +458 -0
torchbvh-0.1.0/docs/examples.md +17 -0
torchbvh-0.1.0/docs/index.md +71 -0
torchbvh-0.1.0/docs/lifecycle_and_gradients.md +133 -0
torchbvh-0.1.0/docs/performance.md +65 -0
torchbvh-0.1.0/docs/testing.md +511 -0
torchbvh-0.1.0/docs/user_guide.md +155 -0
torchbvh-0.1.0/examples/basic_bvh_knn.ipynb +63 -0
torchbvh-0.1.0/examples/batched_displaced_query.ipynb +103 -0
torchbvh-0.1.0/examples/fps_downsampling_geometry.ipynb +132 -0
torchbvh-0.1.0/examples/mls_interpolation.ipynb +136 -0
torchbvh-0.1.0/pyproject.toml +7 -0
torchbvh-0.1.0/setup.cfg +4 -0
torchbvh-0.1.0/setup.py +90 -0
torchbvh-0.1.0/tests/test_batched_interpolate.py +163 -0
torchbvh-0.1.0/tests/test_batched_knn.py +325 -0
torchbvh-0.1.0/tests/test_batched_python_api.py +124 -0
torchbvh-0.1.0/tests/test_benchmark_flowers_grid_sample.py +87 -0
torchbvh-0.1.0/tests/test_benchmark_knn.py +536 -0
torchbvh-0.1.0/tests/test_bvh_build.py +320 -0
torchbvh-0.1.0/tests/test_bvh_class.py +298 -0
torchbvh-0.1.0/tests/test_displaced_query.py +395 -0
torchbvh-0.1.0/tests/test_fps.py +741 -0
torchbvh-0.1.0/tests/test_implicit_tree.py +123 -0
torchbvh-0.1.0/tests/test_knn.py +410 -0
torchbvh-0.1.0/tests/test_mls_interpolate.py +147 -0
torchbvh-0.1.0/tests/test_morton.py +151 -0
torchbvh-0.1.0/tests/test_noncontiguous_public_api.py +242 -0
torchbvh-0.1.0/tests/test_python_api.py +216 -0
torchbvh-0.1.0/tests/test_ragged_knn.py +408 -0
torchbvh-0.1.0/tests/test_ragged_python_api.py +205 -0
torchbvh-0.1.0/tests/test_smoke.py +12 -0
torchbvh-0.1.0/tests/test_stage7_gradient_boundaries.py +139 -0
torchbvh-0.1.0/tests/test_training_step_proxy.py +54 -0
torchbvh-0.1.0/torchbvh/__init__.py +76 -0
torchbvh-0.1.0/torchbvh/_bvh_class.py +125 -0
torchbvh-0.1.0/torchbvh/_constants.py +5 -0
torchbvh-0.1.0/torchbvh/_fps.py +487 -0
torchbvh-0.1.0/torchbvh/_handles.py +144 -0
torchbvh-0.1.0/torchbvh/_mls.py +644 -0
torchbvh-0.1.0/torchbvh/_multihead.py +190 -0
torchbvh-0.1.0/torchbvh/_prototypes/__init__.py +5 -0
torchbvh-0.1.0/torchbvh/_query.py +396 -0
torchbvh-0.1.0/torchbvh/_reorder.py +97 -0
torchbvh-0.1.0/torchbvh/_tree.py +66 -0
torchbvh-0.1.0/torchbvh/_validation.py +96 -0
torchbvh-0.1.0/torchbvh/csrc/bindings.cpp +644 -0
torchbvh-0.1.0/torchbvh/csrc/bvh_build.cu +554 -0
torchbvh-0.1.0/torchbvh/csrc/fps_sample.cu +2233 -0
torchbvh-0.1.0/torchbvh/csrc/geometry.cuh +37 -0
torchbvh-0.1.0/torchbvh/csrc/implicit_tree.cuh +131 -0
torchbvh-0.1.0/torchbvh/csrc/knn_query.cu +935 -0
torchbvh-0.1.0/torchbvh/csrc/mls_fused.cu +739 -0
torchbvh-0.1.0/torchbvh/csrc/morton.cuh +71 -0
torchbvh-0.1.0/torchbvh/csrc/morton_sort.cu +195 -0
torchbvh-0.1.0/torchbvh/csrc/smoke.cu +51 -0
torchbvh-0.1.0/torchbvh/ops.py +98 -0
torchbvh-0.1.0/torchbvh.egg-info/PKG-INFO +104 -0
torchbvh-0.1.0/torchbvh.egg-info/SOURCES.txt +73 -0
torchbvh-0.1.0/torchbvh.egg-info/dependency_links.txt +1 -0
torchbvh-0.1.0/torchbvh.egg-info/not-zip-safe +1 -0
torchbvh-0.1.0/torchbvh.egg-info/requires.txt +1 -0
torchbvh-0.1.0/torchbvh.egg-info/top_level.txt +1 -0

torchbvh-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Roberto Hart-Villamil
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

torchbvh-0.1.0/MANIFEST.in ADDED

@@ -0,0 +1,14 @@
+include README.md
+include LICENSE
+include pyproject.toml
+recursive-include torchbvh/csrc *.cpp *.cu *.cuh
+recursive-include docs *.md
+recursive-include examples *.ipynb
+global-exclude __pycache__
+global-exclude *.py[cod]
+global-exclude *.pyd
+global-exclude *.so
+global-exclude *.dll
+global-exclude *.obj
+global-exclude *.exp
+global-exclude *.lib

torchbvh-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,104 @@
+Metadata-Version: 2.1
+Name: torchbvh
+Version: 0.1.0
+Summary: GPU-native BVH, k-NN, MLS interpolation, and FPS primitives for PyTorch.
+Home-page: https://github.com/Robh96/torchbvh
+License: MIT
+Project-URL: Documentation, https://torchbvh.readthedocs.io/
+Project-URL: Source, https://github.com/Robh96/torchbvh
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Topic :: Scientific/Engineering :: Mathematics
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: torch>=2.0
+# torchbvh
+GPU-native geometry primitives for PyTorch point-cloud workflows: BVH construction,
+exact k-NN search, MLS interpolation, displaced-query helpers, and FPS downsampling.
+## Performance
+k-NN at N=10k, 3D, k=8 (RTX 3500 Ada, uniform distribution):
+| Implementation | Build + query |
+|---|---|
+| `scipy.spatial.cKDTree` CPU | ~23 ms |
+| `torch_cluster` GPU | ~6.8 ms |
+| `cupy_knn` GPU | ~4.1 ms |
+| `torchbvh` GPU | ~1.4 ms |
+FPS at B=16, N=10k, 25% selection (RTX 3500 Ada):
+| Implementation | Time |
+|---|---|
+| `fpsample` CPU | ~833 ms |
+| `torch_fpsample` h=7 (CPU, fastest setting) | ~37 ms |
+| `torchbvh` GPU | ~21 ms |
+See `benchmarks/third_party_algorithm_comparison.ipynb` for more detailed comparisons.
+## Install
+```bash
+pip install torchbvh
+```
+`torchbvh` builds a PyTorch CUDA extension. Source installs require PyTorch, a
+compatible CUDA toolkit/NVCC, and a supported host compiler in the build environment.
+## Docs
+Documentation can be found at [torchbvh.readthedocs.io](https://torchbvh.readthedocs.io/).
+## Quickstart
+```python
+import torch
+import torchbvh as tb
+points = torch.randn(1024, 3, device="cuda")
+bvh = tb.BVH(points)
+# k-NN
+idx, dists = bvh.knn(points, k=8)         # (N, 8) int64 indices, (N, 8) float32 squared distances
+# MLS interpolation — gradients flow through features
+feat = torch.randn(1024, 16, device="cuda", requires_grad=True)
+out = bvh.interpolate(points, feat, k=8)  # (N, 16)
+# FPS downsampling geometry
+fps = tb.fps(points, target_tokens=256)
+# fps.indices, fps.points, fps.nearest_anchor, fps.anchor_radius, ...
+# Batched: pass (B, N, D) → returns (B, N, k)
+```
+Supports `D in {2, 3}`, `k in {4, 8, 16}`, CUDA float32 contiguous inputs.
+## References
+`torchbvh` builds an implicit bounding volume hierarchy over 2-D or 3-D points.
+The BVH layout follows the ostensibly-implicit tree formulation of Chitalu, Dubach, and
+Komura, and the Python/CUDA code was ported from the Julia package
+`ImplicitBVH.jl`.
+- Floyd M. Chitalu, Christophe Dubach, and Taku Komura. "Binary
+  Ostensibly-Implicit Trees for Fast Collision Detection." Computer Graphics Forum,
+  39(2), 509-521, 2020. DOI:
+  [10.1111/cgf.13948](https://doi.org/10.1111/cgf.13948).
+- `ImplicitBVH.jl`, StellaOrg. Julia implementation of the implicitly indexed BVH
+  formulation from which the `torchbvh` BVH code was ported:
+  [github.com/StellaOrg/ImplicitBVH.jl](https://github.com/StellaOrg/ImplicitBVH.jl).

torchbvh-0.1.0/README.md ADDED

torchbvh-0.1.0/docs/algorithms.md ADDED

@@ -0,0 +1,121 @@
+# Algorithms
+This page describes the public algorithms behind `torchbvh` operations. It focuses on the algorithm steps and observable behavior, not CUDA implementation details.
+## BVH Construction
+`torchbvh` builds an implicit bounding volume hierarchy over 2-D or 3-D points.
+The BVH layout follows the ostensibly-implicit tree formulation of Chitalu, Dubach, and
+Komura, and the Python/CUDA code was ported from the Julia package
+`ImplicitBVH.jl`.
+1. Compute the scene axis-aligned bounding box for the input points.
+2. Normalize each point into that scene box and assign it a Morton code.
+3. Sort source point indices by Morton code. The source point tensor itself stays in original order.
+4. Treat the sorted points as leaves of an implicit binary tree. If the leaf count is not a power of two, virtual leaves fill the rightmost missing positions.
+5. Store each real leaf's AABB as the point coordinate repeated as lower and upper bounds.
+6. Build internal node AABBs bottom-up by merging real child AABBs. If an internal node has only one real child, its AABB equals that child's AABB.
+7. Keep `sorted_indices` so leaf positions can map back to original source indices.
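Steps 1 to 3 can be illustrated with a minimal pure-Python sketch. It assumes 10 bits per axis and interleaves the bits with x in the lowest position. The bit width and axis order that `torchbvh` actually uses are not stated on this page. `morton_code` and `morton_order` are hypothetical helper names.

```python
def morton_code(cell, bits=10):
    """Interleave the bits of integer grid coordinates (x, y[, z]); x gets the lowest bit."""
    dim = len(cell)
    code = 0
    for bit in range(bits):
        for axis, c in enumerate(cell):
            code |= ((c >> bit) & 1) << (bit * dim + axis)
    return code


def morton_order(points, bits=10):
    """Return (codes, sorted_indices); `points` itself is never reordered."""
    dim = len(points[0])
    # Step 1: scene AABB.
    lo = [min(p[a] for p in points) for a in range(dim)]
    hi = [max(p[a] for p in points) for a in range(dim)]
    scale = (1 << bits) - 1
    codes = []
    for p in points:
        # Step 2: normalize into the scene box and quantize onto a 2**bits grid.
        cell = []
        for a in range(dim):
            extent = hi[a] - lo[a]
            t = (p[a] - lo[a]) / extent if extent > 0 else 0.0
            cell.append(min(scale, int(t * scale)))
        codes.append(morton_code(cell, bits))
    # Step 3: sort source indices by code (stable, so equal codes keep input order).
    sorted_indices = sorted(range(len(points)), key=lambda i: codes[i])
    return codes, sorted_indices


pts = [(0.9, 0.9, 0.9), (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 1.0, 1.0)]
codes, sorted_indices = morton_order(pts)
print(sorted_indices)  # [1, 2, 0, 3]
```

Only the index permutation is sorted. This mirrors the statement that the source point tensor stays in its original order.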
+Fixed-size batched BVHs repeat the same process independently for each sample in the batch. Ragged BVHs apply the same single-sample process to each packed segment described by `batch_offsets`.
+Why it is fast: the construction avoids serial tree insertion and pointer-heavy node allocation. Morton sorting converts spatial hierarchy construction into a parallel sort-plus-reduction problem, and the compact implicit tree stores only real node AABBs
+and source-index mappings. The main bottlenecks in pointer-based or CPU tree builders are irregular allocation and recursive dependencies; this design replaces them with contiguous tensor work that can be built and traversed with predictable memory access.
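Steps 4 to 7 can be sketched with a plain heap layout: the root is node 1, and the children of node `i` are `2i` and `2i + 1`. The leaves are assumed to be already in Morton order. This is a simplification. The ostensibly-implicit layout of Chitalu et al. stores only real nodes compactly, whereas this hypothetical `build_implicit_bvh` just leaves virtual nodes out of a dictionary.

```python
def build_implicit_bvh(points, sorted_indices):
    """Heap-layout BVH: root is node 1, children of node i are 2i and 2i + 1.

    Returns (boxes, first_leaf). boxes maps each *real* node to (lower, upper);
    virtual nodes are simply absent. Leaf node first_leaf + pos holds source
    point sorted_indices[pos] (step 7: leaf position -> original index).
    """
    n = len(sorted_indices)
    depth = (n - 1).bit_length() if n > 1 else 0  # leaves sit on this level
    first_leaf = 1 << depth
    boxes = {}
    # Step 5: a real leaf's AABB is its point repeated as lower and upper bound.
    for pos, src in enumerate(sorted_indices):
        p = tuple(points[src])
        boxes[first_leaf + pos] = (p, p)
    # Step 6: merge real children bottom-up; leaf positions >= n stay virtual.
    for node in range(first_leaf - 1, 0, -1):
        kids = [boxes[c] for c in (2 * node, 2 * node + 1) if c in boxes]
        if not kids:
            continue  # the whole subtree is virtual
        dim = len(kids[0][0])
        lower = tuple(min(b[0][a] for b in kids) for a in range(dim))
        upper = tuple(max(b[1][a] for b in kids) for a in range(dim))
        boxes[node] = (lower, upper)  # equals the only child's box if just one is real
    return boxes, first_leaf


pts = [(0, 0), (1, 2), (3, 1), (2, 5), (4, 4)]
boxes, first_leaf = build_implicit_bvh(pts, [0, 1, 2, 3, 4])
print(first_leaf, boxes[1])  # 8 ((0, 0), (4, 5))
```

With five leaves, positions 13 to 15 are virtual. Node 7 therefore has no box, and node 3 inherits the box of its single real descendant, leaf 12.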
+## k-NN Query
+k-NN is an exact branch-and-bound search over the BVH. Returned distances are squared Euclidean distances sorted from nearest to farthest.
+For each query point:
+1. Start with an empty sorted neighbor list of length `k`, initialized to infinite distances.
+2. Start traversal at the BVH root.
+3. For a candidate node, compute the minimum possible squared distance from the query point to that node's AABB.
+4. If that minimum distance is greater than the current k-th-best distance, skip the whole node.
+5. If the node is a leaf, compute the exact squared distance to its source point and insert the point into the sorted neighbor list if it improves the current list.
+6. If the node is internal, compute the AABB lower bound for each of its real children and visit the child with the smaller bound first.
+7. Continue until no reachable node can improve the neighbor list.
+8. Return original source indices and their matching squared distances.
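The traversal above can be illustrated with a self-contained pure-Python sketch. It first builds a small heap-layout tree. As a stand-in for Morton ordering, it sorts leaves lexicographically; leaf order affects how much pruning happens but not whether the result is exact. `build`, `knn`, and `min_dist_sq` are hypothetical names, not the `torchbvh` API.

```python
import heapq
import random


def min_dist_sq(q, lo, hi):
    """Smallest possible squared distance from q to any point inside box [lo, hi]."""
    return sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi))


def build(points):
    """Heap-layout BVH (root = 1); lexicographic leaf order stands in for Morton order."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    first_leaf = 1 << ((len(order) - 1).bit_length() if len(order) > 1 else 0)
    boxes = {first_leaf + pos: (points[i], points[i]) for pos, i in enumerate(order)}
    for node in range(first_leaf - 1, 0, -1):
        kids = [boxes[c] for c in (2 * node, 2 * node + 1) if c in boxes]
        if kids:
            dim = len(kids[0][0])
            boxes[node] = (tuple(min(b[0][a] for b in kids) for a in range(dim)),
                           tuple(max(b[1][a] for b in kids) for a in range(dim)))
    return boxes, first_leaf, order


def knn(points, tree, q, k):
    """Exact k-NN by branch and bound; returns [(dist_sq, source_index), ...] nearest first."""
    boxes, first_leaf, order = tree
    best = []  # max-heap of the current k best, stored as (-dist_sq, index)

    def kth():
        return -best[0][0] if len(best) == k else float("inf")

    stack = [1]
    while stack:
        node = stack.pop()
        if min_dist_sq(q, *boxes[node]) > kth():
            continue  # no point in this subtree can beat the current k-th best
        if node >= first_leaf:  # leaf: exact distance to its single source point
            src = order[node - first_leaf]
            d = sum((qa - pa) ** 2 for qa, pa in zip(q, points[src]))
            if len(best) < k:
                heapq.heappush(best, (-d, src))
            elif d < kth():
                heapq.heapreplace(best, (-d, src))
            continue
        kids = [c for c in (2 * node, 2 * node + 1) if c in boxes]
        kids.sort(key=lambda c: min_dist_sq(q, *boxes[c]), reverse=True)
        stack.extend(kids)  # the closer child is pushed last, so it is visited first
    return sorted((-neg_d, i) for neg_d, i in best)


random.seed(0)
pts = [tuple(random.random() for _ in range(3)) for _ in range(200)]
tree = build(pts)
print(knn(pts, tree, pts[7], k=8)[0])  # (0.0, 7): the query point is its own nearest neighbor
```

The max-heap keeps the current k-th-best squared distance at its root, so each pruning test is a single comparison.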
+Self-neighbors are included when source points are queried against themselves. When two or more points are exactly tied, the returned tied indices are valid exact neighbors, but their order is not part of the public contract.
+For fixed-size batches, each sample is searched independently and indices are local to that sample. For ragged batches, each packed segment is searched independently and indices are local to the segment, not global packed-row indices.
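If downstream code needs packed-row indices instead, segment-local results can be shifted by each segment's start row. This sketch rests on two assumptions that this page does not confirm. First, `batch_offsets` holds `B + 1` exclusive prefix sums of the segment sizes (`[0, n_0, n_0 + n_1, ..., total]`). Second, the queries are the packed source rows themselves. `local_to_global` is a hypothetical helper.

```python
def local_to_global(local_idx, batch_offsets):
    """Shift segment-local neighbor indices to packed-row indices.

    Hypothetical convention: batch_offsets = [0, n_0, n_0 + n_1, ..., total] (B + 1 entries),
    and row r of local_idx is the k-NN result for packed source row r.
    """
    out = []
    for s in range(len(batch_offsets) - 1):
        start, stop = batch_offsets[s], batch_offsets[s + 1]
        for r in range(start, stop):
            out.append([start + j for j in local_idx[r]])
    return out


# Two segments: rows 0-2 form segment 0, rows 3-4 form segment 1.
print(local_to_global([[0, 1], [1, 0], [2, 1], [0, 1], [1, 0]], [0, 3, 5]))
# [[0, 1], [1, 0], [2, 1], [3, 4], [4, 3]]
```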
+Why it is fast: brute-force k-NN evaluates each of the `M` queries against each of the `N` source points, so its distance work scales as `M * N`. BVH traversal replaces most of those exact point distance evaluations with cheap AABB lower-bound tests. Once a query has `k` candidate neighbors, any subtree whose nearest possible point is farther than the current k-th neighbor is skipped entirely. The remaining per-query work is independent, so many queries can run in parallel without CPU-side tree traversal or Python loops.
+## MLS Interpolation
+Moving least squares interpolation fits a local linear field around each query position using k-NN neighborhoods.
+For each displaced query point:
+1. Run exact k-NN against the source geometry to get neighbor indices, squared distances, and neighbor positions. This selection is discrete and non-differentiable.
+2. Gather neighbor features with the returned indices.
+3. If one or more neighbors have squared distance at or below the exact-hit epsilon, use the unweighted mean of those exact-hit neighbor features as the interpolated value and return a zero spatial field gradient.
+4. Otherwise, compute offsets from the query to each neighbor position.
+5. Choose a local bandwidth from the sorted neighbor distances and clamp it away from zero.
+6. Assign each neighbor a smooth weight from its squared offset distance and the local bandwidth.
+7. Fit a regularized weighted linear model in local coordinates:
+   value = constant term + spatial slope dot offset.
+8. Return the fitted constant term as the interpolated feature.
+9. When `return_grad=True`, return the fitted spatial slope as the field gradient.
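Steps 3 to 9 can be sketched in pure Python for one query and a scalar feature. The page does not specify the weighting or the bandwidth, so both are assumptions here:

- The weight is Gaussian, `exp(-d2 / h2)`, with `h2` set to the k-th squared distance.
- A ridge term `lam` is added only to the slope block.

The fit minimizes `sum_i w_i * (a + b . o_i - f_i)^2 + lam * |b|^2` through its normal equations. `a` is the interpolated value and `b` is the field gradient. `mls_point`, `solve`, `eps_hit`, and `lam` are hypothetical names.

```python
import math


def solve(A, rhs):
    """Solve a small dense system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mls_point(query, nbr_pos, nbr_dist_sq, nbr_feat, eps_hit=1e-12, lam=1e-6):
    """Interpolate one scalar feature at `query` from its k nearest neighbors.

    nbr_dist_sq must be sorted ascending, as k-NN returns it. Returns (value, gradient).
    """
    dim = len(query)
    # Step 3: exact hits short-circuit to their mean with a zero field gradient.
    hits = [f for f, d2 in zip(nbr_feat, nbr_dist_sq) if d2 <= eps_hit]
    if hits:
        return sum(hits) / len(hits), [0.0] * dim
    # Step 5 (assumed rule): bandwidth from the k-th squared distance, clamped away from zero.
    h2 = max(nbr_dist_sq[-1], 1e-12)
    # Steps 4, 6, 7: accumulate weighted normal equations for value = a + b . offset.
    A = [[0.0] * (dim + 1) for _ in range(dim + 1)]
    rhs = [0.0] * (dim + 1)
    for p, d2, f in zip(nbr_pos, nbr_dist_sq, nbr_feat):
        w = math.exp(-d2 / h2)  # assumed Gaussian weight
        basis = [1.0] + [p[a] - query[a] for a in range(dim)]
        for i in range(dim + 1):
            rhs[i] += w * f * basis[i]
            for j in range(dim + 1):
                A[i][j] += w * basis[i] * basis[j]
    for i in range(1, dim + 1):
        A[i][i] += lam  # ridge regularization on the slope only
    coef = solve(A, rhs)
    # Steps 8 and 9: the constant term is the value, the slope is the field gradient.
    return coef[0], coef[1:]


# A linear field f(x, y) = 2 + 3x - y is reproduced almost exactly.
q = (0.1, 0.2)
nbrs = sorted(((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2, p)
              for p in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (-0.2, 0.3)])
value, grad = mls_point(q, [p for _, p in nbrs], [d for d, _ in nbrs],
                        [2 + 3 * p[0] - p[1] for _, p in nbrs])
# value is close to f(q) = 2.1 and grad is close to (3, -1)
```

Because the system has only `D + 1` unknowns, the per-query solve cost does not depend on `N`.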
+Gradients flow through the MLS solve to `features` and to `displaced_points` when they are live (tracked by autograd).
+Gradients do not flow through BVH construction, Morton sorting, k-NN selection, integer indices, squared distances, or detached neighbor positions.
+Why it is fast: a dense differentiable interpolation would either compare each query to all source points or build large intermediate tensors for weights and gradients. MLS uses the exact k-NN result to restrict the solve to `k` local samples, then solves only a
+small regularized linear system per query. The discrete geometry search is detached, so autograd tracks the continuous MLS solve for `features` and `displaced_points` without recording the BVH traversal, sorting, or integer neighbor selection.
+## FPS
+Farthest point sampling selects anchor points that cover the source geometry and returns assignment metadata for the selected anchors.
+All FPS modes maintain this state:
+1. `indices`: selected source point indices in selection order.
+2. `nearest_anchor`: for each source point, the selection-order position of the anchor currently closest to it.
+3. `nearest_anchor_dist_sq`: the squared distance from each source point to that nearest anchor.
+### Exact FPS
+The exact modes follow the standard farthest point sampling rule.
+1. Choose the first anchor from `seed`. If `seed=-1`, choose the source point nearest the input AABB center.
+2. Initialize every source point's nearest-anchor distance to its squared distance from the first anchor.
+3. Select the next anchor as a point with maximum current nearest-anchor distance.
+4. Update every source point: if the new anchor is closer than its current nearest anchor, replace the stored distance and assignment.
+5. Repeat selection and update until `target_tokens` anchors have been selected.
+6. Gather selected anchor coordinates.
+7. Compute per-anchor assignment counts and per-anchor radius from the final nearest-anchor assignments.
+8. Compute `coarse_order`, which orders selected anchors by BVH leaf position for locality-aware downstream gathering.
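The exact rule above can be written as a direct full scan in pure Python. This sketch returns the selected indices, the anchor coordinates, `nearest_anchor`, `nearest_anchor_dist_sq`, per-anchor counts, and a per-anchor radius. It treats the radius as the largest distance from an anchor to its assigned points, which is an assumption. It omits `coarse_order` because that needs the BVH leaf order. `fps_exact` is a hypothetical name.

```python
def fps_exact(points, target_tokens, seed=-1):
    """Exact farthest point sampling by full scan (pure Python, O(N * target_tokens))."""
    n, dim = len(points), len(points[0])

    def d2(i, j):
        return sum((points[i][a] - points[j][a]) ** 2 for a in range(dim))

    if seed == -1:  # step 1: source point nearest the input AABB center
        lo = [min(p[a] for p in points) for a in range(dim)]
        hi = [max(p[a] for p in points) for a in range(dim)]
        center = [(l + h) / 2 for l, h in zip(lo, hi)]
        seed = min(range(n), key=lambda i: sum((points[i][a] - center[a]) ** 2 for a in range(dim)))
    indices = [seed]
    nearest_anchor = [0] * n                                  # selection-order position
    nearest_anchor_dist_sq = [d2(i, seed) for i in range(n)]  # step 2
    while len(indices) < target_tokens:
        # Step 3: farthest point from its nearest anchor (first index wins ties).
        nxt = max(range(n), key=lambda i: nearest_anchor_dist_sq[i])
        anchor_id = len(indices)
        indices.append(nxt)
        for i in range(n):  # step 4: keep the closer of the old and new anchor
            d = d2(i, nxt)
            if d < nearest_anchor_dist_sq[i]:
                nearest_anchor_dist_sq[i], nearest_anchor[i] = d, anchor_id
    anchor_points = [points[i] for i in indices]  # step 6
    # Step 7: per-anchor counts and (assumed) radius = largest assigned distance.
    counts = [0] * len(indices)
    radius = [0.0] * len(indices)
    for i in range(n):
        a = nearest_anchor[i]
        counts[a] += 1
        radius[a] = max(radius[a], nearest_anchor_dist_sq[i] ** 0.5)
    return indices, anchor_points, nearest_anchor, nearest_anchor_dist_sq, counts, radius


pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(fps_exact(pts, 3, seed=0)[0])  # [0, 3, 2]
```

Each round costs one pass over all `N` points. This is the repeated global update that the bucketed modes try to shrink.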
+`mode="exact_full_scan"` expresses this rule directly. `mode="exact_bucketed"` preserves the same exact result while using BVH/Morton bucket structure to organize the work.
+Why it is fast: standard exact FPS is dominated by the repeated global update and maximum search over all `N` points for each of `M` anchors. `exact_full_scan` keeps that state on the GPU and avoids CPU round trips. `exact_bucketed` keeps the exact selection rule but groups points by BVH/Morton buckets, so bucket maxima identify farthest candidates and AABB-based pruning can skip bucket refreshes that cannot improve any point's current nearest-anchor distance. This reduces memory traffic while preserving the exact FPS sequence.
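The pruning argument can be checked with a small sketch. Every point in a bucket already has a nearest-anchor distance no larger than the bucket maximum. So if the new anchor's squared distance to the bucket's AABB is at least that maximum, the bucket can be skipped without changing any distance. Here the buckets are contiguous runs of a lexicographic sort, a stand-in for BVH/Morton buckets. `fps_bucketed`, `fps_full_scan`, `min_dist_sq`, and `sq` are hypothetical names.

```python
import random


def min_dist_sq(q, lo, hi):
    """Smallest possible squared distance from q to the box [lo, hi]."""
    return sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi))


def sq(p, s):
    return sum((a - b) ** 2 for a, b in zip(p, s))


def fps_full_scan(points, target, seed=0):
    dist = [sq(p, points[seed]) for p in points]
    indices = [seed]
    while len(indices) < target:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        indices.append(nxt)
        dist = [min(d, sq(p, points[nxt])) for d, p in zip(dist, points)]
    return indices


def fps_bucketed(points, target, bucket_size=16, seed=0):
    n, dim = len(points), len(points[0])
    order = sorted(range(n), key=lambda i: points[i])  # stand-in for Morton order
    buckets = [order[s:s + bucket_size] for s in range(0, n, bucket_size)]
    boxes = [(tuple(min(points[i][a] for i in b) for a in range(dim)),
              tuple(max(points[i][a] for i in b) for a in range(dim))) for b in buckets]
    dist = [sq(p, points[seed]) for p in points]
    bmax = [max(b, key=lambda i: dist[i]) for b in buckets]  # farthest point per bucket
    indices, skipped = [seed], 0
    while len(indices) < target:
        nxt = max(bmax, key=lambda i: dist[i])  # best bucket maximum = global maximum
        indices.append(nxt)
        for k, b in enumerate(buckets):
            # Points in bucket k all have dist <= dist[bmax[k]]; if the new anchor is at
            # least that far from the bucket AABB, none of them can get closer.
            if min_dist_sq(points[nxt], *boxes[k]) >= dist[bmax[k]]:
                skipped += 1
                continue
            for i in b:
                dist[i] = min(dist[i], sq(points[i], points[nxt]))
            bmax[k] = max(b, key=lambda i: dist[i])
    return indices, skipped


random.seed(1)
pts = [tuple(random.random() for _ in range(3)) for _ in range(500)]
sel, skipped = fps_bucketed(pts, 64)
print(sel == fps_full_scan(pts, 64), skipped > 0)  # True True
```

Skipped buckets provably cannot change, so the selected sequence matches the full scan exactly. Morton buckets are compact in every axis, so they are expected to allow more skips than the slab-shaped buckets used here.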
+### Approximate Bucketed FPS
+`mode="approx_bucketed"` keeps the same output metadata but relaxes the anchor selection rule for speed.
+1. Build the same BVH/Morton spatial structure used by the exact bucketed path.
+2. Choose the first anchor with the same seed policy.
+3. Maintain exact nearest-anchor assignments and distances for the anchors that have already been selected.
+4. Track candidate farthest points from spatial buckets instead of scanning the full point set for one global maximum every round.
+5. Generate a small candidate set, reject duplicates, and commit one or more anchors per round according to the `r`, `c`, and `alpha` settings.
+6. After committed anchors are chosen, update every affected source point's exact nearest-anchor distance and assignment against those committed anchors.
+7. Refresh bucket state and repeat until `target_tokens` anchors have been selected.
+8. Compute the same public metadata as exact FPS.
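The approximate variant can be sketched as follows, with caveats. The page does not define the `r`, `c`, and `alpha` settings, so this sketch substitutes its own hypothetical controls:

- Each bucket proposes its current farthest point.
- At most `max_commit` candidates are committed per round, in descending distance order.
- A candidate is rejected if it lies closer than `min_sep_frac` times the round's largest nearest-anchor distance to an anchor already committed in the same round.

Assignments are then updated against all committed anchors in a single pass, so they stay exact for the anchors actually chosen. Buckets again come from a lexicographic sort standing in for Morton order.

```python
import random


def approx_fps(points, target, bucket_size=16, max_commit=4, min_sep_frac=0.5, seed=0):
    """Bucketed FPS that may commit several anchors per round (hypothetical rule)."""
    n, dim = len(points), len(points[0])

    def d2(i, j):
        return sum((points[i][a] - points[j][a]) ** 2 for a in range(dim))

    order = sorted(range(n), key=lambda i: points[i])  # stand-in for Morton order
    buckets = [order[s:s + bucket_size] for s in range(0, n, bucket_size)]
    indices = [seed]
    nearest = [0] * n
    dist = [d2(i, seed) for i in range(n)]
    while len(indices) < target:
        # Each bucket proposes its farthest point; strongest candidates first.
        cands = sorted((max(b, key=lambda i: dist[i]) for b in buckets),
                       key=lambda i: dist[i], reverse=True)
        sep_sq = min_sep_frac ** 2 * dist[cands[0]]
        committed = []
        for cand in cands:
            if len(committed) == max_commit or len(indices) + len(committed) == target:
                break
            if dist[cand] <= 0.0:
                break  # remaining candidates already coincide with an anchor
            if all(d2(cand, a) >= sep_sq for a in committed):
                committed.append(cand)  # far enough from this round's other anchors
        if not committed:
            break  # fewer distinct points than target
        # One exact update pass against every anchor committed this round.
        base = len(indices)
        indices.extend(committed)
        for i in range(n):
            for off, a in enumerate(committed):
                d = d2(i, a)
                if d < dist[i]:
                    dist[i], nearest[i] = d, base + off
    return indices, nearest, dist


random.seed(2)
pts = [tuple(random.random() for _ in range(3)) for _ in range(300)]
idx, nearest, dist = approx_fps(pts, 40)
print(len(idx), len(set(idx)))  # 40 40
```

The anchor order can differ from exact FPS. However, `dist` and `nearest` always equal the true nearest-anchor distances and assignments for the selected set.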
+The approximate mode is useful when throughput is more important than matching the exact serial FPS sequence. Its returned assignments are exact with respect to the anchors it selected.
+Why it is fast: the fundamental bottleneck in exact FPS is the serial dependency that selects one anchor, updates all distances, then selects the next anchor. The approximate bucketed mode breaks part of that dependency by proposing farthest candidates from spatial buckets and committing multiple anchors per round. Distance assignments are then updated against the committed anchor set in one pass. The algorithm trades exact global anchor order for fewer full distance-update and bucket-refresh cycles, while keeping the final nearest-anchor metadata exact for the anchors it actually selected.
+## References
+- Floyd M. Chitalu, Christophe Dubach, and Taku Komura. "Binary
+  Ostensibly-Implicit Trees for Fast Collision Detection." Computer Graphics Forum,
+  39(2), 509-521, 2020. DOI:
+  [10.1111/cgf.13948](https://doi.org/10.1111/cgf.13948).
+- `ImplicitBVH.jl`, StellaOrg. Julia implementation of the implicitly indexed BVH
+  formulation from which the `torchbvh` BVH code was ported:
+  [github.com/StellaOrg/ImplicitBVH.jl](https://github.com/StellaOrg/ImplicitBVH.jl).