Conversation

@hzhou hzhou commented Oct 22, 2025

Pull Request Description

Some systems may encounter progress starvation when multiple processes on a node enter a busy-polling progress loop and the NIC is unable to update the event due to PCIe atomic contention. Previously, we added a progress throttle using usleep. It is effective, but it may be too blunt and can negatively impact normal performance.

This PR adds a few more knobs to fine-tune the progress throttle.

  • MPIR_CVAR_CH4_PROGRESS_THROTTLE_MIN_PROCS to throttle only when a sufficient number of processes have entered the throttle state
  • MPIR_CVAR_CH4_PROGRESS_THROTTLE_NUM_PAUSES to control the amount of throttling if we choose to use thread_yield() instead of usleep
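
The gating described by the first knob can be sketched as a small decision function. This is a minimal sketch, not the code in this PR: the struct, its fields, and both function names are hypothetical, and only the CVAR names mentioned in the comments come from this conversation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of two tuning knobs; field names are illustrative only. */
typedef struct {
    int no_progress_count;  /* cf. MPIR_CVAR_CH4_PROGRESS_THROTTLE_NO_PROGRESS_COUNT */
    int min_procs;          /* cf. MPIR_CVAR_CH4_PROGRESS_THROTTLE_MIN_PROCS */
} throttle_config;

/* A process counts as "in the throttle state" once it has polled
 * no_progress_count times in a row without making progress. */
static bool in_throttle_state(const throttle_config *cfg, int idle_polls)
{
    assert(cfg->no_progress_count > 0);
    return idle_polls >= cfg->no_progress_count;
}

/* Throttle only if this process is in the throttle state AND at least
 * min_procs processes on the node are in that state as well. */
static bool should_throttle(const throttle_config *cfg, int idle_polls,
                            int procs_in_throttle_state)
{
    return in_throttle_state(cfg, idle_polls) &&
           procs_in_throttle_state >= cfg->min_procs;
}
```

With `no_progress_count = 100` and `min_procs = 4`, a process that has polled idly 100 times is throttled only once four processes on the node report the throttle state. How the per-node count is actually shared between processes is not shown here.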

[skip warnings]

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

hzhou commented Oct 22, 2025

[image] The effect of tuning `MPIR_CVAR_CH4_PROGRESS_THROTTLE_MIN_PROCS` with `usleep` as the throttling method. Above 89 processes, the alltoallv time increases quickly due to progress starvation.

hzhou added 2 commits October 22, 2025 13:24
Sometimes, especially during troubleshooting, we may get confused about
which MPICH build is in use. Print the MPICH version string with
MPIR_CVAR_DEBUG_SUMMARY as a simple way to verify.

When the no_progress_counter reaches
MPIR_CVAR_CH4_PROGRESS_THROTTLE_NO_PROGRESS_COUNT, the process enters the
throttle state and should remain in this state until progress is
made. Resetting the counter removes the process from the throttle state
while the need to throttle still exists.
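
The counter behavior described in the second commit message can be illustrated as follows. This is a hedged sketch under assumed names (`progress_state` and `record_poll` are hypothetical, not MPICH symbols): the counter saturates at the threshold instead of being reset, so the process stays in the throttle state until a poll actually makes progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-process bookkeeping; not the actual MPICH data structure. */
typedef struct {
    int no_progress_counter;
    int threshold;  /* cf. MPIR_CVAR_CH4_PROGRESS_THROTTLE_NO_PROGRESS_COUNT */
} progress_state;

/* Record the outcome of one progress poll.
 * Returns true while the process is in the throttle state. */
static bool record_poll(progress_state *st, bool made_progress)
{
    assert(st->threshold > 0);
    if (made_progress) {
        /* Only real progress takes the process out of the throttle state. */
        st->no_progress_counter = 0;
    } else if (st->no_progress_counter < st->threshold) {
        /* Count idle polls, saturating at the threshold (no reset). */
        st->no_progress_counter++;
    }
    return st->no_progress_counter >= st->threshold;
}
```

Resetting the counter to zero after each throttle would instead force the process to spin through another full threshold of idle polls before throttling again, which is the behavior the commit removes.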
@hzhou hzhou force-pushed the 2510_progress_throttle branch from 66d1393 to f78a270 Compare October 22, 2025 18:24
@hzhou hzhou added the aurora label Oct 22, 2025
hzhou added 2 commits October 22, 2025 13:41
Only perform progress throttling when at least
MPIR_CVAR_CH4_PROGRESS_THROTTLE_MIN_PROCS processes have entered the
throttle state.

Add alternative methods to throttle, e.g. sched_yield or PAUSE, and use
MPIR_CVAR_CH4_PROGRESS_THROTTLE_NUM_PAUSES to control the amount of
throttling.
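
A rough sketch of what the lighter-weight throttle methods could look like is given below. The enum, `cpu_pause`, and `throttle_once` are hypothetical names, and applying the pause count to both methods is an assumption rather than something this PR confirms. `sched_yield` is the POSIX call; the inline assembly relies on GCC/Clang syntax.

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical selector for the throttle method. */
typedef enum { THROTTLE_YIELD, THROTTLE_PAUSE } throttle_method;

/* Issue one CPU spin-wait hint where the architecture provides one. */
static inline void cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause");
#elif defined(__aarch64__)
    __asm__ __volatile__("yield");
#else
    __asm__ __volatile__("" ::: "memory");  /* compiler barrier only */
#endif
}

/* Back off once, repeating the chosen primitive num_pauses times
 * (cf. MPIR_CVAR_CH4_PROGRESS_THROTTLE_NUM_PAUSES).
 * Returns the number of iterations performed. */
static int throttle_once(throttle_method method, int num_pauses)
{
    assert(num_pauses >= 0);
    int n = 0;
    switch (method) {
    case THROTTLE_YIELD:
        for (; n < num_pauses; n++)
            sched_yield();   /* give up the CPU to other runnable threads */
        break;
    case THROTTLE_PAUSE:
        for (; n < num_pauses; n++)
            cpu_pause();     /* stay on the CPU but ease off the polling */
        break;
    }
    return n;
}
```

Compared with usleep, both primitives are much shorter per call, so the pause count is what sets the overall strength of the throttle.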
@hzhou hzhou force-pushed the 2510_progress_throttle branch from f78a270 to c41032d Compare October 22, 2025 18:41
@hzhou hzhou marked this pull request as draft October 23, 2025 14:41
@sonjahapp
Contributor

Small comment on usleep use: In other parts of MPICH the use of usleep is protected with the HAVE_USLEEP macro because this function is not available in all cases. While you are still working on the fine-tuning of the new throttling mechanism, this is perhaps worth a quick look to prevent future compilation issues - maybe I'm overlooking something. 🙈

hzhou commented Oct 23, 2025

> Small comment on usleep use: In other parts of MPICH the use of usleep is protected with the HAVE_USLEEP macro because this function is not available in all cases. While you are still working on the fine-tuning of the new throttling mechanism, this is perhaps worth a quick look to prevent future compilation issues - maybe I'm overlooking something. 🙈

Good point
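
For illustration, a guard of the kind suggested above might look like the sketch below. `throttle_sleep` is a hypothetical helper, `HAVE_USLEEP` is assumed to be defined by the build configuration when usleep is available, and falling back to `sched_yield` is an assumption, not necessarily what MPICH does elsewhere.

```c
#include <assert.h>
#include <sched.h>
#ifdef HAVE_USLEEP
#include <unistd.h>
#endif

/* Sleep-based throttle that still compiles where usleep is unavailable.
 * Returns 1 if usleep was used, 0 if the fallback was taken. */
static int throttle_sleep(unsigned int usec)
{
    /* POSIX allows usleep to fail with EINVAL for intervals of one second
     * or more, so keep the request below 1,000,000 microseconds. */
    assert(usec < 1000000u);
#ifdef HAVE_USLEEP
    usleep(usec);
    return 1;
#else
    (void) usec;     /* the interval is ignored in the fallback path */
    sched_yield();   /* assumed fallback: yield instead of sleeping */
    return 0;
#endif
}
```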
