-
Notifications
You must be signed in to change notification settings - Fork 444
fabtests/efa: cuda dmabuf validation logic #11443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Piggybacking on @jiaxiyan's comments here (#11437 (comment)): we don't necessarily need to split commits based on which parts of the codebase are touched, but we do want to keep commits as atomic as possible. |
c128ea1
to
14a4dea
Compare
14a4dea
to
394485f
Compare
394485f
to
0ac312e
Compare
Can you rephrase your commit title to be consistent with the PR title |
fabtests/efa: cuda dmabuf validation logic to the top of the commit message and then keep everything else below it? |
yes |
test: Adding cuda dmabuf validation logic Problem: - Users was submitting fabtests without the --do-dmabuf-reg-for-hmem flag Solution: - Added hmem cuda logic changes based on nvidia manual [https://docs.nvidia.com/cuda/gpudirect-rdma/ here] - Added check_dmabuf to init cuda internals and get dmabuf support information - Added conftest to validate these checks if a user specifies cuda command in fabtests Testing: - Validated tests with --do-dmabuf-reg-for-hmem flag and without it on p6-gb200 cluster - All tests skipped with cuda command - make -j install && python3 install/bin/runfabtests.py --expression "cuda" -vvv -p /home/nmazzill/libfabric/fabtests/install/bin/ --junit-xml ft_`git branch --show-current`_`git rev-parse HEAD`_all_junit.xml --nworkers 16 -b efa 10.0.123.149 10.0.121.190 | tee ft_`git branch --show-current`_`git rev-parse HEAD`_all_stdout - All tests passed with this cuda command - make -j install && python3 install/bin/runfabtests.py --expression "cuda" --do-dmabuf-reg-for-hmem -vvv -p /home/nmazzill/libfabric/fabtests/install/bin/ --junit-xml ft_`git branch --show-current`_`git rev-parse HEAD`_all_junit.xml --nworkers 16 -b efa 10.0.123.149 10.0.121.190 | tee ft_`git branch --show-current`_`git rev-parse HEAD`_all_stdout Sim Issue: - N/A Signed-off-by: Nick Mazzilli <[email protected]>
0ac312e
to
2672cef
Compare
@j-xiong can u review the fabtests common code change, thanks |
has_cuda_mark = any(mark.name == 'cuda_memory' for mark in request.node.iter_markers()) | ||
|
||
if has_cuda_mark: | ||
print("Running CUDA validation") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove the print
in this function? It will show up in all the test outputs, which is too verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the comments below, I would suggest breaking this into two commits -- one for the common code changes and one for the efa test changes.
/* SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only */ | ||
/* SPDX-FileCopyrightText: Copyright Amazon.com, Inc. or its affiliates. All rights reserved. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow the convention of multiline comments here.
@@ -0,0 +1,35 @@ | |||
/* SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only */ | |||
/* SPDX-FileCopyrightText: Copyright Amazon.com, Inc. or its affiliates. All rights reserved. */ | |||
#include <stdio.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an empty line before this.
CUDA_MEMORY_SUPPORT__NOT_INITIALIZED = -1, | ||
CUDA_MEMORY_SUPPORT__NOT_SUPPORTED = 0, | ||
CUDA_MEMORY_SUPPORT__DMA_BUF_ONLY = 1, | ||
CUDA_MEMORY_SUPPORT__GDR_ONLY= 2, | ||
CUDA_MEMORY_SUPPORT__DMABUF_GDR_BOTH = 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alignment is off, most like due to space being used instead of tabs.
CUDA_MEMORY_SUPPORT__DMA_BUF_ONLY = 1, | ||
CUDA_MEMORY_SUPPORT__GDR_ONLY= 2, | ||
CUDA_MEMORY_SUPPORT__DMABUF_GDR_BOTH = 3, | ||
} cuda_memory_support_e; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use typedef here. Just define the enum, and use enum enum_name var_name
to define variables.
I would use FT_CUDA_xxx
instead of CUDA_xxx
to avoid name space confusion. So for the lowercase names.
return CUDA_MEMORY_SUPPORT__NOT_SUPPORTED; | ||
} | ||
|
||
cuda_memory_support_e cuda_memory_support = dmabuf_viable_and_supported(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not call ft_cuda_memory_support()
directly?
if (!gdr_supported && !dmabuf_supported) { | ||
cuda_memory_support = CUDA_MEMORY_SUPPORT__NOT_SUPPORTED; | ||
} else if (gdr_supported && dmabuf_supported) { | ||
cuda_memory_support = CUDA_MEMORY_SUPPORT__DMABUF_GDR_BOTH; | ||
} else if (dmabuf_supported) { | ||
cuda_memory_support = CUDA_MEMORY_SUPPORT__DMA_BUF_ONLY; | ||
} else { | ||
cuda_memory_support = CUDA_MEMORY_SUPPORT__GDR_ONLY; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove all the braces here.
Problem:
Solution:
Testing:
git branch --show-current
git rev-parse HEAD
all_junit.xml --nworkers 16 -b efa 10.0.123.149 10.0.121.190 | tee ftgit branch --show-current
git rev-parse HEAD
_all_stdoutgit branch --show-current
git rev-parse HEAD
all_junit.xml --nworkers 16 -b efa 10.0.123.149 10.0.121.190 | tee ftgit branch --show-current
git rev-parse HEAD
_all_stdoutSim Issue: