CU_DEVICE_ATTRIBUTE_HANDLE_TYPE_FABRIC_SUPPORTED
Hi, the following code has been broken since commit a3d8d68:
import cupy as cp
from mscclpp import TcpBootstrap, Communicator, Transport
from mscclpp._mscclpp import RawGpuBuffer

bootstrap = TcpBootstrap.create(0, 1)
comm = Communicator(bootstrap)
cp.cuda.Device(0).use()
memory = RawGpuBuffer(1024 * 1024)
data_ptr = memory.data()
comm.register_memory(data_ptr, 1024 * 1024, Transport.IB0)
Traceback (most recent call last):
  File "/root/bug.py", line 10, in <module>
    comm.register_memory(data_ptr, 1024 * 1024, Transport.IB0)
mscclpp._mscclpp.IbError: (14, 'ibv_reg_mr failed (errno 14) (Ib failure: Bad address)')
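For anyone reading the traceback: the `14` in the `IbError` tuple is a plain OS errno, which can be decoded with the Python standard library. The snippet below is only a small sketch for that purpose; `describe_errno` is a hypothetical helper name and is not part of mscclpp.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno value.

    Hypothetical helper for reading the traceback; not an mscclpp API.
    """
    # errno.errorcode maps numeric values to symbolic names such as 'EFAULT'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's human-readable message for the code.
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 14 is the errno reported by ibv_reg_mr in the traceback.
    print(describe_errno(14))
```

The symbolic name for 14 is `EFAULT`, which matches the "Bad address" text in the error: the kernel rejected the address range passed to `ibv_reg_mr`.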
Environment:
(base) root@xxx:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
(base) root@xxx:~# nvidia-smi
Sun Apr 6 15:37:56 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.03             Driver Version: 535.216.03   CUDA Version: 12.4     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 80GB HBM3          On  | 00000000:19:00.0 Off |                    0 |
| N/A   29C    P0              71W / 700W |      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
...8xH100
Thanks for reporting. This looks related to NVIDIA/gdrcopy#266. Will try the ibv_reg_dmabuf_mr API.