
Member

@nsarka nsarka commented Sep 2, 2025

Sample output, luna80g partition on dlcluster:

nsarkauskas@luna-prod-1315-au:/opt/pytorch/Fuser$ mpirun -np 2 -H luna-prod-1315-au:2 ./python/build/nvfuser_p2p_communication_bench
Starting P2P communication benchmark...
Repetitions per size: 100
Number of devices: 2
Testing tensor sizes from 2^10 to 2^26 elements

Message Size   Elements    Latency (μs)  Bandwidth (GB/s)
------------------------------------------------------------
4 KB           1024        123.50         0.03
8 KB           2048        125.11         0.06
16 KB          4096        124.52         0.12
32 KB          8192        125.16         0.24
64 KB          16384       163.03         0.37
128 KB         32768       121.25         1.01
256 KB         65536       121.82         2.00
512 KB         131072      137.84         3.54
1 MB           262144      121.56         8.03
2 MB           524288      123.86         15.77
4 MB           1048576     130.98         29.82
8 MB           2097152     167.01         46.78
16 MB          4194304     278.03         56.20
32 MB          8388608     504.67         61.92
64 MB          16777216    872.91         71.60
128 MB         33554432    1523.83        82.03
256 MB         67108864    2981.99        83.84

The benchmark creates a P2PCommunication with the CUDA IPC backend, places it inside a HostIrEvaluator, and runs it. Latency is measured with std::chrono::high_resolution_clock.
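
The timing approach described here can be sketched roughly as follows. This is not the benchmark's actual code: `meanLatencyUs` and `op` are hypothetical names, `op` stands in for one blocking run of the communication, and timing the whole loop rather than each repetition individually is an assumption.

```cpp
#include <chrono>
#include <functional>

// Runs `op` `repetitions` times and returns the mean wall-clock time per
// call in microseconds. `op` is a hypothetical stand-in for one run of
// the communication; it must not return before the transfer completes,
// otherwise only the launch overhead is measured.
double meanLatencyUs(const std::function<void()>& op, int repetitions) {
  using Clock = std::chrono::high_resolution_clock;
  const auto start = Clock::now();
  for (int i = 0; i < repetitions; ++i) {
    op();
  }
  const auto stop = Clock::now();
  // Express the elapsed time as a floating-point count of microseconds.
  const std::chrono::duration<double, std::micro> elapsed = stop - start;
  return elapsed.count() / repetitions;
}
```

Because a host-side clock only sees host time, asynchronous GPU work has to be synchronized before the stop timestamp is taken for the result to reflect the transfer itself.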

@nsarka nsarka requested a review from wujingyue September 2, 2025 13:27
@nsarka nsarka force-pushed the nsarka/cuda-ipc-benchmark branch from 51a90f2 to 3fd9af2 Compare September 2, 2025 13:31
Collaborator

@wujingyue wujingyue left a comment

LGTM otherwise

@wujingyue wujingyue requested a review from Priya2698 September 2, 2025 13:37
Collaborator

@Priya2698 Priya2698 left a comment

Overall, LGTM.

@nsarka nsarka force-pushed the nsarka/cuda-ipc-benchmark branch from 2805c45 to 88b9974 Compare September 2, 2025 20:35
Collaborator

@wujingyue wujingyue left a comment

LGTM. Defer approval to @Priya2698

@nsarka nsarka force-pushed the nsarka/cuda-ipc-benchmark branch from 88b9974 to fa71128 Compare September 3, 2025 14:51
@nsarka nsarka force-pushed the nsarka/cuda-ipc-benchmark branch from fbab35a to 874eef9 Compare September 5, 2025 22:23
@nsarka nsarka requested a review from Priya2698 September 5, 2025 22:24
Member Author

nsarka commented Sep 5, 2025

!test

Collaborator

@Priya2698 Priya2698 left a comment

LGTM.
Can you add the expected SOL (speed-of-light) bandwidth to the PR description as a reference for these results?
