@joaander
Member

@joaander joaander commented Nov 20, 2025

Description

  • Remove the vendored hip and hipcub headers. Instead, use find_package(hip) to find external headers.
  • Change compiler definitions to support modern versions of hip.
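As an illustration of the second item, here is a minimal sketch of how source code can branch on the HIP platform definitions. It assumes the change involves the modern macro names `__HIP_PLATFORM_AMD__` and `__HIP_PLATFORM_NVIDIA__`, which replaced the older `__HIP_PLATFORM_HCC__` and `__HIP_PLATFORM_NVCC__` spellings in upstream hip. The helper `hip_platform_name` is hypothetical and is not taken from HOOMD-blue.

```cpp
#include <string>

// Hypothetical helper: report which HIP platform the compile definitions select.
// Modern hip headers expect __HIP_PLATFORM_AMD__ or __HIP_PLATFORM_NVIDIA__;
// the HCC/NVCC spellings are the older names that upstream hip has retired.
inline std::string hip_platform_name()
{
#if defined(__HIP_PLATFORM_NVIDIA__)
    return "nvidia";
#elif defined(__HIP_PLATFORM_AMD__)
    return "amd";
#elif defined(__HIP_PLATFORM_NVCC__) || defined(__HIP_PLATFORM_HCC__)
    return "legacy";
#else
    return "none";
#endif
}
```

Compiling with `-D__HIP_PLATFORM_NVIDIA__` would make the helper return "nvidia"; a build that still passes only the older definitions would fall into the "legacy" branch.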

Motivation and context

CUDA 13 contains many breaking changes and the vendored headers do not support it.
By using external hip libraries, HOOMD-blue will gain support for new versions of CUDA as soon as upstream adds support (at this time the latest release of hipcub does not support CUDA 13).

conda-forge does not support HIP_PLATFORM=nvidia (conda-forge/hip-feedstock#9) in hip-devel and lacks a hipcub package entirely. Therefore, users who build HOOMD from source for NVIDIA GPUs will need to install hip and hipcub headers.

How has this been tested?

HOOMD-blue compiles and passes tests with CUDA 12.9, rocm-systems:hip-version_7.2.53220, and rocm-libraries:rocm-7.1.0 locally. CI checks have been updated accordingly. Patches to hip and hipcub fix build errors with CUDA 12.5–12.8.

Checklist:

  • I have reviewed the Contributor Guidelines.
  • I agree with the terms of the HOOMD-blue Contributor Agreement.
  • My name is on the list of contributors (sphinx-doc/credits.rst) in the pull request source branch.
  • I have summarized these changes in CHANGELOG.rst following the established format.

@joaander joaander marked this pull request as draft November 20, 2025 15:13
@joaander joaander added the release Build and unit test all support compiler/python configurations label Nov 20, 2025
@joaander
Member Author

@mphoward, what do you think of using an external HIP library even when HOOMD_GPU_PLATFORM=CUDA?

Ideally we could avoid this by using hipper everywhere, but that would be a lot more work.

In any case, I won't merge this until hipcub supports CUDA 13. Until then, these changes introduce additional complexity for developers with no benefit.

@joaander joaander deleted the branch trunk November 21, 2025 22:03
@joaander joaander closed this Nov 21, 2025
@mphoward
Collaborator

Hmm, it isn't ideal to require CUDA users to download and compile HIP, but I agree this fix is substantially less work than getting hipper updated to modern rocm and implemented everywhere in HOOMD. (This work has been on my to-do list for a long time, but I've been unable to get to it.)

Do you have a rough sense of when hipcub will support CUDA 13 and this will be ready to merge? We could always temporarily require HIP, then relax this requirement later if we did complete the extra work.

@joaander joaander reopened this Dec 1, 2025
@joaander joaander changed the base branch from trunk-major to trunk December 1, 2025 13:14
@joaander
Member Author

joaander commented Dec 1, 2025

> Do you have a rough sense of when hipcub will support CUDA 13 and this will be ready to merge? We could always temporarily require HIP, then relax this requirement later if we did complete the extra work.

I do not. When working on this branch, I was surprised to find that hipcub does not already support CUDA 13, given that CUDA 13 has been out for several months. I also found that the development version of hip supports only CUDA 12.9 and 13 (I had to patch hip to support CUDA 12.4–12.8). There are no issues, discussions, or open pull requests regarding CUDA 13: https://github.com/ROCm/rocm-libraries

I am concerned that HIP/CUDA interoperation is no longer actively supported by AMD. They seem to measure success by "can it run PyTorch?" and have even made the effort to create a custom build system that builds ROCm, HIP, and PyTorch: https://github.com/ROCm/TheRock

@mphoward
Collaborator

mphoward commented Dec 1, 2025

> I was surprised to find that hipcub does not already support CUDA 13, given that CUDA 13 has been out for several months. I also found that the development version of hip supports only CUDA 12.9 and 13 (I had to patch hip to support CUDA 12.4–12.8).

> I am concerned that HIP/CUDA interoperation is no longer actively supported by AMD.

I agree, this is all quite concerning. It might also explain why they have more closely matched the HIP kernel launch / device code interface with CUDA in recent major releases (it is a replacement, not a compatibility layer).

Lack of proper CUDA support would be a good reason to put in the effort to update and use hipper as our own compatibility layer. I can prioritize working on that, but the soonest I can have a look would be in ~2 weeks (after the semester ends). It will require breaking changes to hipper first, and I am fine with jumping to whatever minimum versions of CUDA and ROCm we need for HOOMD, to keep the work as limited as possible.

@joaander
Member Author

joaander commented Dec 1, 2025

We'll give AMD some time and see if they add CUDA 13 support.

If you do plan an eventual hipper refactor, the minimum CUDA I need is 12.8. NCSA Delta has just updated to that version. When I submitted this PR, the system was still on CUDA 12.4.
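A version floor like this can be expressed as a compile-time check. The sketch below assumes the integer encoding used by the CUDA toolkit's `CUDA_VERSION` macro (major * 1000 + minor * 10, so 12.8 becomes 12080). The names `cuda_version_code`, `minimum_cuda`, and `cuda_is_supported` are hypothetical and do not come from hipper or HOOMD-blue.

```cpp
// CUDA encodes its toolkit version as major * 1000 + minor * 10,
// for example 12.8 -> 12080 and 13.0 -> 13000.
constexpr int cuda_version_code(int major, int minor)
{
    return major * 1000 + minor * 10;
}

// Hypothetical floor, set to the CUDA 12.8 minimum mentioned above.
constexpr int minimum_cuda = cuda_version_code(12, 8);

// True when the given encoded version meets the floor.
constexpr bool cuda_is_supported(int version_code)
{
    return version_code >= minimum_cuda;
}

// CUDA 12.9 passes the check and CUDA 12.4 does not.
static_assert(cuda_is_supported(cuda_version_code(12, 9)), "12.9 should pass");
static_assert(!cuda_is_supported(cuda_version_code(12, 4)), "12.4 should fail");
```

In a real build the argument would be the toolkit's own `CUDA_VERSION` value rather than a hand-written number.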
