snadampal
Contributor

Fixed the SMR buffer pool alignment to match the cache line size on x86 and aarch64 CPUs. This has improved both p99 latency and throughput in NCCL collective tests.

Signed-off-by: Sunita Nadampalli <[email protected]>
@aingerson
Contributor

@snadampal Can you share the benchmarks you ran and the numbers you got?

/*
 * Set alignment to aarch64 and x86 cache line size.
 */
#define SHM_SMR_BUFPOOL_ALIGNMENT (64)
Contributor

remove shm prefix

Contributor Author

@aingerson Thanks for the prompt review. I'm running some internal benchmarks and will try to collect data that I can share here.

remove shm prefix

:) I started with the name SMR_BUFPOOL_ALIGNMENT and added the SHM prefix at the last minute to make it provider-specific. Sure, I will remove the prefix to match the other SMR-related definitions.

Contributor

@snadampal SMR is the prefix for the shm provider. It is used instead of SHM to avoid any potential conflict with UNIX/Linux IPC "shm".
