Implements latency test #114
Conversation
Thanks for the PR. I will check it out later today (I understand it's a draft, so I may leave high-level comments). You will have to exit gracefully when the number of ranks is not 2, because CI will try all numbers of ranks. You can ignore the failing label action.

You're most welcome. I just did it as two ranks for testing (as well as writing to the file). Would it be better to have it as a matrix, similar to the way we do the bandwidth test?

A matrix would be nicer.
tests/examples/test_load_latency.py (outdated)
```python
else:
    while tl.load(flag, cache_modifier=".cv", volatile=True) != token_first_done:
        pass
    iris.put(data + offsets, data + offsets, curr_rank, peer_rank, heap_bases, mask=data_mask)
```
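The snippet above is the responder side of a ping-pong handshake: spin on the peer's flag, then put the payload back. As a host-side analogy only (pure Python with threads, not the Iris/Triton API), the round-trip measurement looks like:

```python
import threading
import time

# Host-side analogy of the ping-pong latency test (not the Iris/Triton API):
# the initiator sets a flag, the responder spins on it and answers,
# and the initiator times the round trip.
ping = threading.Event()
pong = threading.Event()

def responder():
    ping.wait()  # stand-in for spinning on the flag
    pong.set()   # stand-in for putting the payload back

t = threading.Thread(target=responder)
t.start()

start = time.perf_counter_ns()
ping.set()
pong.wait()
rtt_ns = time.perf_counter_ns() - start
t.join()
print(f"round-trip: {rtt_ns} ns")
```

On a GPU the same structure is timed with a device clock read before and after the handshake, which is what `read_realtime` below provides.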
I am curious about the results you are getting. Also, do you think the load and store API would be better here to avoid the local load and store?
I am unable to allocate 8 MI300Xs at the moment on AMD cloud. I agree with you on the load/store part 👍
This may be relevant: using nvbandwidth on H100 I get the following:
```
Running device_to_device_latency_sm.
Device to Device Latency SM GPU(row) <-> GPU(column) (ns)
         0        1        2        3
0      N/A   549.34   545.77   545.35
1   550.28      N/A   660.95   659.08
2   548.35   547.22      N/A   544.26
3   545.49   831.96   543.86      N/A
```
The output from the Triton code is:

|    | R0         | R1         | R2         | R3         |
|----|------------|------------|------------|------------|
| R0 | 0.000000   | 722.239990 | 701.119995 | 683.200012 |
| R1 | 722.559998 | 0.000000   | 736.000000 | 727.679993 |
| R2 | 701.440002 | 735.679993 | 0.000000   | 712.640015 |
| R3 | 683.200012 | 727.679993 | 712.960022 | 0.000000   |
%error ((triton - nvband)*100/nvband):

|    | R0     | R1      | R2     | R3     |
|----|--------|---------|--------|--------|
| R0 | N/A    | 31.47%  | 28.46% | 25.28% |
| R1 | 31.31% | N/A     | 11.35% | 10.41% |
| R2 | 27.92% | 34.44%  | N/A    | 30.94% |
| R3 | 25.25% | -12.53% | 31.09% | N/A    |
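For reference, each %error entry follows directly from the two tables using the formula above; e.g. the R0→R1 cell:

```python
def percent_error(triton_val: float, nvband_val: float) -> float:
    # %error = (triton - nvband) * 100 / nvband, as defined above
    return (triton_val - nvband_val) * 100.0 / nvband_val

# R0 -> R1 cell: Triton reading 722.239990 vs nvbandwidth 549.34
err = percent_error(722.239990, 549.34)
print(f"{err:.2f}%")  # matches the 31.47% entry in the table
```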
Very interesting. How does it compare after using the load/store?
Actually, how did that even work? `read_realtime` is using AMD GCN assembly. Did you change that to the equivalent PTX?
Also, I just tested this on MI300X and it seems to deadlock.
Yeah, this was what I initially had in mind for the microbenchmark. I am surprised you didn’t need to accumulate the result here. I remember everything was getting optimized away when we wrote similar code for the all load benchmark.
We wanted to add the cache modifiers and volatile arguments for a while but we haven’t yet. Let me think about this a bit more.
> I am surprised you didn't need to accumulate the result here. I remember everything was getting optimized away when we wrote similar code for the all load benchmark.

Yeah, without the cache modifier and volatile it gets optimized away.

> We wanted to add the cache modifiers and volatile arguments for a while but we haven't yet. Let me think about this a bit more.

No worries.
These numbers are not in nanoseconds; they are in clock cycles, yes?
See the ISA.
We have been using this function to find the clock.
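If the device timer ticks at a fixed reference clock rather than reporting nanoseconds, the conversion is a simple scale. The 100 MHz default below is an assumption for illustration only; the real rate should be queried from the runtime (e.g. HIP's `hipDeviceAttributeWallClockRate`) before trusting the results:

```python
def ticks_to_ns(ticks: int, ref_clock_hz: int = 100_000_000) -> float:
    """Convert raw timer ticks to nanoseconds.

    ref_clock_hz is assumed here (100 MHz); query the actual wall-clock
    rate of the device timer rather than relying on this default.
    """
    return ticks * 1e9 / ref_clock_hz

# e.g. 55 ticks at an assumed 100 MHz reference clock -> 550.0 ns
print(ticks_to_ns(55))
```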
Could you also push your PTX to the CUDA port branch? You can just comment out the CDNA assembly over there for now.
Ok, will do. This is the PTX I have been using:

```python
@triton.jit
def read_realtime():
    # %globaltimer is the PTX global timer register, read via inline asm
    tmp = tl.inline_asm_elementwise(
        asm="mov.u64 $0, %globaltimer;",
        constraints=("=l"),
        args=[],
        dtype=tl.int64,
        is_pure=False,
        pack=1,
    )
    return tmp
```
Looks mostly good. Just a couple of comments.
## Motivation

Closes #12

## Technical Details

## Test Plan

## Test Result

## Submission Checklist