Support TORCH_CUDA_ARCH_LIST and avoid linking against libcuda.so at compile time #245
Support TORCH_CUDA_ARCH_LIST so the wheel can be compiled in a non-GPU environment, which makes it more convenient to work with CI/CD and prebuilt binary distribution.
Now we can build SageAttention in a CPU-only environment, for example:
CUDA_HOME=/path/to/cuda-x.y TORCH_CUDA_ARCH_LIST='8.0;9.0+PTX' python3 setup.py bdist_wheel
libcuda.so (the CUDA driver library) is not available in a non-GPU environment, so we should avoid linking directly against it at compile time.
NVIDIA offers a standard way to access the driver API via cudaGetDriverEntryPointByVersion (or, previously, cudaGetDriverEntryPoint), which dynamically loads and calls driver API functions at runtime.
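For illustration only (this is not the PR's actual code), a minimal sketch of that pattern is below: the driver symbol is resolved at runtime through the CUDA runtime library, so the build only needs to link libcudart. It assumes CUDA 12.5+ headers, where cudaGetDriverEntryPointByVersion is available, and uses cuDriverGetVersion purely as an example symbol.

```cpp
// Minimal sketch: resolve a driver API entry point at runtime via the CUDA
// runtime, so the build links only libcudart (not libcuda.so).
// Assumes CUDA 12.5+ for cudaGetDriverEntryPointByVersion; older toolkits
// expose the same pattern through cudaGetDriverEntryPoint.
#include <cstdio>
#include <cuda.h>              // driver API types (CUresult, ...)
#include <cuda_runtime_api.h>  // cudaGetDriverEntryPointByVersion

// Function-pointer type matching the driver symbol we want to call.
using cuDriverGetVersion_t = CUresult (*)(int*);

int main() {
    void* fn = nullptr;
    cudaDriverEntryPointQueryResult status{};

    // Look up the symbol by name; 12000 requests the CUDA 12.0 ABI version.
    cudaError_t err = cudaGetDriverEntryPointByVersion(
        "cuDriverGetVersion", &fn, 12000, cudaEnableDefault, &status);
    if (err != cudaSuccess || status != cudaDriverEntryPointSuccess || fn == nullptr) {
        std::fprintf(stderr, "driver entry point not available\n");
        return 1;
    }

    int version = 0;
    reinterpret_cast<cuDriverGetVersion_t>(fn)(&version);
    std::printf("driver reports CUDA version %d\n", version);
    return 0;
}
```

Because the lookup happens at runtime, a build machine without a GPU driver can still produce the wheel; the symbol only needs to resolve on the machine that actually runs the code.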