@yfguo yfguo commented Oct 23, 2025

CMA should be enabled by default because test results show that CMA outperforms POSIX for medium to large message sizes.

The CMA threshold is set to 8KB to match the eager threshold. On Cascade Lake, an 8KB message sent through the POSIX pipeline is about 10% faster than one sent through CMA in the intra-NUMA case (fig 1), but not in the other cases. Setting the CMA threshold equal to the eager threshold, and so bypassing the POSIX pipeline completely, therefore seems like a good idea.
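A minimal sketch of the selection logic described above, for readers who want to see why equal thresholds remove the POSIX pipeline from the picture. All names here (`shm_path`, `select_shm_path`, `EAGER_THRESHOLD_BYTES`) are hypothetical and are not MPICH identifiers. Treating a message of exactly the threshold size as non-eager is also an assumption.

```c
#include <stddef.h>

/* Hypothetical illustration only; these are not MPICH identifiers. */
enum shm_path {
    SHM_PATH_EAGER,          /* small messages copied through the iqueue */
    SHM_PATH_POSIX_PIPELINE, /* chunked copy through shared memory */
    SHM_PATH_CMA             /* single copy via cross-memory attach */
};

/* Assumed eager threshold of 8KB, as stated in this PR. */
#define EAGER_THRESHOLD_BYTES ((size_t) 8 * 1024)

/* Pick a transfer path for one intra-node message.
 * Assumption: sizes below the eager threshold go eager; sizes at or
 * above the CMA threshold go through CMA when it is enabled; anything
 * in between falls back to the POSIX pipeline. */
enum shm_path select_shm_path(size_t msg_bytes, size_t cma_threshold,
                              int cma_enabled)
{
    if (msg_bytes < EAGER_THRESHOLD_BYTES) {
        return SHM_PATH_EAGER;
    }
    if (cma_enabled && msg_bytes >= cma_threshold) {
        return SHM_PATH_CMA;
    }
    return SHM_PATH_POSIX_PIPELINE;
}
```

With `cma_threshold` equal to `EAGER_THRESHOLD_BYTES` and CMA enabled, no message size reaches the final `return`, which is the "completely bypass the POSIX pipeline" behavior. Raising `cma_threshold` above 8KB would reopen a window in which the pipeline is used.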

Also, CMA does not seem to be affected by cross-NUMA access and far outperforms the POSIX pipeline.

@intel may need to change this if using an eager module other than iqueue.

On Intel Xeon Gold 6226R (Cascade Lake)

image image

On AMD, this seems to be an OK setting, although AMD shows an odd drop at the 16KB message size in almost all cases.

image image

TODO: add numbers for Aurora.

Pull Request Description

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

CMA should be enabled by default because test results show that CMA
outperforms POSIX for medium to large message sizes. The default CMA
threshold is set to 8KB, which matches the eager threshold determined
by the iqueue size.
@hzhou hzhou left a comment

LGTM

hzhou commented Oct 23, 2025

test:mpich/ch4/most

yfguo commented Oct 23, 2025

Will merge this after getting the Aurora numbers and verifying the latency numbers as well.
