
Conversation

@djramic
Contributor

@djramic djramic commented Oct 10, 2025

Motivation

Resolves: https://ontrack-internal.amd.com/browse/SWDEV-559813

Technical Details

This change modifies the lit configuration to select only the first detected GPU agent when initializing config.arch.
Previously, all agents were concatenated into a single string (e.g. "gfx1201,gfx1100"), which caused issues on systems with multiple different GPUs since our tools expect a single target.
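
For illustration, a minimal sketch of the intended behavior (the helper and variable names below are assumptions for the example, not the exact rocMLIR lit config code):

```python
# Illustrative sketch only: mirrors the idea of picking a single gfx target
# instead of joining all detected agents into one string.
def pick_arch(agents):
    if not agents:
        return None
    # Old behavior (problematic): ",".join(agents) -> e.g. "gfx1201,gfx1100",
    # which downstream tools reject because they expect a single target.
    # New behavior: select just one of the detected agents.
    return sorted(agents)[0]

print(pick_arch(["gfx1201", "gfx1100"]))  # -> gfx1100
```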

Contributor

@justinrosner justinrosner left a comment

Have you run this on a machine that has two different GPU archs?

```python
config.features, config.arch_support_mfma, config.arch_support_wmma = get_arch_features(x)
config.substitutions.append(('%features', config.features))
if not config.arch:
    if agents:
```
Contributor

@justinrosner justinrosner Oct 10, 2025

I know this is probably not the place to do it, but is there somewhere we can emit a warning to the user that we are only using one of the available archs?

Contributor Author

I'm not sure what the best place for that is. This seems like it would make the most sense to me:
https://github.com/ROCm/rocMLIR/blob/develop/mlir/test/common_utils/common.py#L58
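
For example, the selection point could emit a warning whenever more than one architecture is detected. A rough sketch using Python's standard warnings module (the helper name and surrounding structure are hypothetical, not the actual common.py code):

```python
import warnings

def select_single_arch(agents):
    """Pick one gfx target and warn if other architectures were detected."""
    if not agents:
        return None
    arch = sorted(agents)[0]
    if len(set(agents)) > 1:
        warnings.warn(
            f"Multiple GPU architectures detected ({', '.join(sorted(set(agents)))}); "
            f"lit tests will only target {arch}.")
    return arch

select_single_arch(["gfx1201", "gfx1100"])  # warns, returns "gfx1100"
```

Note that with several identical GPUs (e.g. ["gfx1100", "gfx1100"]) this sketch stays silent, which matches the MITuna scenario discussed below.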

@dhernandez0
Contributor

I think we do use multiple GPUs when using MITuna, so I don't think we can do this. We need to figure out which GPU MITuna is using.

@djramic
Contributor Author

djramic commented Oct 13, 2025

> We need to figure out what GPU MITuna is using

This is related to systems that have multiple different GPUs. MITuna uses multiple identical GPUs, so I'm not sure how Tuna would behave with different GPUs. Since these changes only affect the lit config, they shouldn't impact Tuna in any way.

```python
config.substitutions.append(('%features', config.features))
if not config.arch:
    if agents:
        config.arch = sorted(agents)[0]
```
Contributor

Why are we sorting? Let's imagine the user sets HIP_VISIBLE_DEVICES="1,0"; I guess HIP will return GPU 1 first? We want to run on that device, right?
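
To make the concern concrete, a sketch under the assumption that the agents list preserves HIP's enumeration order (which HIP_VISIBLE_DEVICES controls); the values are made up for illustration:

```python
# Suppose HIP_VISIBLE_DEVICES="1,0" makes HIP enumerate device 1 (gfx1201)
# before device 0 (gfx1100). Sorting alphabetically discards that preference.
agents = ["gfx1201", "gfx1100"]    # assumed enumeration order in this scenario

picked_sorted = sorted(agents)[0]  # "gfx1100" -- alphabetical, ignores device order
picked_first = agents[0]           # "gfx1201" -- follows the enumeration order

print(picked_sorted, picked_first)
```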

@dhernandez0
Copy link
Contributor

> We need to figure out what GPU MITuna is using
>
> This is related to systems that have multiple different GPUs. MITuna uses multiple identical GPUs, so I'm not sure how Tuna would behave with different GPUs. Since these changes only affect the lit config, they shouldn't impact Tuna in any way.

Is there any guarantee we won't find different GPUs on the CI machines? I think at least we want to have a warning if that's the case.
