
[Question]: optimization for block_radix_rank #4438

Closed · Answered by elstehle
hlyix asked this question in CUB

Thank you for your suggestion, @hlyix!

An early return looks like a cheap optimization, but warp divergence limits what it can save. If threads within the same warp take different execution paths (some returning early while others proceed), the warp executes those paths one after another with the inactive threads masked off, so it still takes as long as its slowest thread. The extra branch can therefore cancel out any gain and may even cause a regression.
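To make the divergence argument concrete, here is a toy host-side C++ cost model. It is only a sketch under stated assumptions: `warp_cost`, `branch_cost`, and `work_cost` are hypothetical names and abstract units, not CUB code or measured numbers. The model assumes a warp issues the instructions of every path that at least one of its lanes takes.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;

// Toy SIMT cost model (an assumption for illustration, not a measurement):
// a warp pays for every path that at least one of its lanes takes.
//   branch_cost: cost of evaluating the early-return condition
//   work_cost:   cost of the code the early return would skip
int warp_cost(const std::array<bool, kWarpSize>& lane_returns_early,
              int branch_cost, int work_cost)
{
    bool any_lane_continues = false;
    for (bool returns_early : lane_returns_early) {
        if (!returns_early) {
            any_lane_continues = true;
        }
    }
    // The skipped work is only saved if *every* lane returns early.
    return branch_cost + (any_lane_continues ? work_cost : 0);
}
```

In this model, a warp in which even one lane continues pays `branch_cost + work_cost`, which is more than the `work_cost` it paid without the early return.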

More importantly, the proposed change would compromise correctness: `ExclusiveDownsweep` is not just a conditional scan; it also folds the `exclusive_partial` value into each of the thread's items, so a thread that skips it leaves its items without that prefix.
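The correctness point can be sketched on the host in plain C++. This is a simplified model, not the CUB implementation: `Segment`, `kSegmentLength`, and both functions are hypothetical, and the early-return condition (the thread's own total being zero) is an assumption, since the proposed change is not reproduced in this thread.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSegmentLength = 4;  // hypothetical per-thread segment length

using Segment = std::array<std::uint32_t, kSegmentLength>;

// Seeded exclusive scan over one thread's segment: each item is replaced by
// exclusive_partial plus the sum of the items that precede it.
void exclusive_downsweep(Segment& segment, std::uint32_t exclusive_partial)
{
    std::uint32_t running = exclusive_partial;
    for (std::uint32_t& item : segment) {
        const std::uint32_t count = item;
        item = running;
        running += count;
    }
}

// Hypothetical early-return variant (an assumption about the proposal):
// skip the scan when the thread's own items sum to zero.
void exclusive_downsweep_early_return(Segment& segment,
                                      std::uint32_t thread_total,
                                      std::uint32_t exclusive_partial)
{
    if (thread_total == 0) {
        return;  // items never receive exclusive_partial
    }
    exclusive_downsweep(segment, exclusive_partial);
}
```

With an all-zero segment and `exclusive_partial == 7`, the full version writes 7 into every item, while the early-return variant leaves them at 0. Any later step that reads those items as offsets would then see the wrong values.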

Answer selected by oleksandr-pavlyk
Category: CUB
Labels: feature request
2 participants

This discussion was converted from issue #4407 on April 14, 2025 14:37.