Switch to use CUDA driver APIs in Device constructor #460

Merged: 15 commits merged on Jun 7, 2025

@leofang (Member) commented Feb 21, 2025

Before this PR:

In [3]: %timeit Device()
660 ns ± 2.01 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit Device(0)
644 ns ± 2.05 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

With this PR:

In [3]: %timeit Device()
396 ns ± 1.78 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit Device(0)
165 ns ± 0.983 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

(Bindings are built from the main branch.)
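To put these numbers in perspective, here is a quick back-of-the-envelope calculation of the speedup from the mean timings above. It only restates the reported means (it ignores the std. dev. and does not re-run any benchmark); the names `timings_ns` and `speedup` are illustrative, not part of cuda.core.

```python
# Mean per-call timings (ns) reported by %timeit above: (before this PR, with this PR).
# Illustrative only: these are the quoted means, not a new measurement.
timings_ns = {
    "Device()": (660.0, 396.0),
    "Device(0)": (644.0, 165.0),
}


def speedup(before_ns: float, after_ns: float) -> float:
    """Return how many times faster the new call is (before / after)."""
    return before_ns / after_ns


for call, (before, after) in timings_ns.items():
    saved = before - after
    print(f"{call}: {speedup(before, after):.2f}x faster, {saved:.0f} ns saved per call")
```

By this estimate, Device() gets roughly 1.7x faster and Device(0) roughly 3.9x faster.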

copy-pr-bot (bot, Contributor) commented Feb 21, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@leofang self-assigned this Feb 22, 2025
@leofang added the blocked label (This task is currently blocked by other tasks) Feb 22, 2025
@leofang added the enhancement, P1, and cuda.core labels and removed the blocked label Apr 5, 2025
@leofang added this to the cuda.core beta 4 milestone Apr 5, 2025
@leofang changed the title from "WIP: Switch to use CUDA driver APIs in Device constructor" to "Switch to use CUDA driver APIs in Device constructor" Apr 6, 2025
@leofang (Member Author) commented Apr 6, 2025

/ok to test


@leofang requested review from rwgk and ksimpson-work April 7, 2025 17:39
@leofang marked this pull request as ready for review April 7, 2025 17:39
@leofang marked this pull request as draft April 7, 2025 22:19
@leofang marked this pull request as ready for review May 24, 2025 02:16
@leofang (Member Author) commented May 24, 2025

/ok to test c9fac0b

@leofang (Member Author) commented May 28, 2025

This is ready.

@rwgk previously approved these changes May 28, 2025
@leofang (Member Author) commented Jun 6, 2025

/ok to test d70ec24

@leofang (Member Author) commented Jun 6, 2025

/ok to test d279e50

@leofang commented (this comment was marked as resolved)

@leofang added and removed the blocked label Jun 7, 2025
@leofang (Member Author) commented Jun 7, 2025

/ok to test 708fd70

This reverts commit d279e50.
@leofang (Member Author) commented Jun 7, 2025

/ok to test 4015f9c

github-project-automation bot moved this from Todo to In Review in CCCL Jun 7, 2025
@leofang enabled auto-merge (squash) June 7, 2025 02:41
@leofang merged commit 0fe2309 into NVIDIA:main Jun 7, 2025
53 checks passed
github-project-automation bot moved this from In Review to Done in CCCL Jun 7, 2025
@leofang deleted the reduce_cudart branch June 7, 2025 02:48
@leofang (Member Author) commented Jun 7, 2025

Thanks, Ralf/Keith!

github-actions bot commented Jun 7, 2025

Doc Preview CI
Preview removed because the pull request was closed or merged.

Labels: cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements), P1 (Medium priority - Should do)
Projects: CCCL, Status: Done
4 participants