Skip to content

Conversation

@mirza-halilcevic
Copy link
Contributor

@mirza-halilcevic mirza-halilcevic commented Sep 19, 2025

Copy link
Member

@umangyadav umangyadav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but see if you can address comment about auto generating some of the stuffs.

@mirza-halilcevic mirza-halilcevic marked this pull request as ready for review October 21, 2025 00:32
return ArrayRef<ParamsType>(it->second.first, it->second.second);
}

auto archFamily = getArchName(arch).drop_back(2).str();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to always drop two? Take into account gfx9 typically has 3 numbers only (gfx950, gfx942, ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I changed the fallback lookup significantly. Let me know what you think.

for (const auto &entry : table) {
if (entry.first.find(archFamily) != std::string::npos &&
entry.first.find(makeSuffix(op, dataType)) != std::string::npos) {
fallbackKey = std::max(fallbackKey, entry.first);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does std::max() of two strings do? take the longest string?

Copy link
Contributor Author

@mirza-halilcevic mirza-halilcevic Oct 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compare lexicographically. The idea is that it would give me the latest arch string, e.g. gfx1201 is preferred to gfx1200.

llvm_unreachable("Invalid architecture string");
}
auto remaining = arch.substr(gfxPos);
auto endPos = remaining.find_first_not_of("0123456789a", 3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's gfx1030w apparently as well. Please can you check all possible codes somewhere? (maybe llvm)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It now looks for the first non-alphanumeric. That should be flexible enough yet not too verbose.

llvm::raw_string_ostream os(dataTypeStr);
os << dataType;

if (dataTypeStr == "bf16") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment about why we do this? we are assuming f16/bf16 have the same optimal perfconfigs, right? same for f8

Copy link
Contributor

@dhernandez0 dhernandez0 Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider doing dataType.getIntOrFloatBitWidth() == 8 && isa < FloatType > (dataType), instead of matching strings. I think that's more robust for future datatypes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion.

We don't tune for bf16, so the f16 lists are used instead. It is an assumption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking more into it, there seem to be more edge cases who went under the radar with the previous approach. Previously, because of the if-else chains, the selection of the quick tune lists was much less strict, and the else statements were acting like catch-alls. For example, the gfx1200 quick tune list would be selected for a gfx942 non-accel problem, because we don't have any non-accel list for gfx942. A lot of tests fail now because of this. Also, for any non-accel case we would return f32 lists, because we don't have non-accel lists for other types. The non-accel cases seem to happen because of the following line:

convGenerator.flipAccel();

Any thoughts? On one hand it is incorrect to return a list for the wrong arch and/or data type, but on the other hand we have tests where this happens. Maybe it doesn't matter that much in these non-accel cases, and we can just fall back to any list?

@umangyadav Can you also chime in?

@mirza-halilcevic mirza-halilcevic force-pushed the quick_tune_code_review branch 3 times, most recently from 3d29f0f to 3bbfa38 Compare October 25, 2025 16:04
@mirza-halilcevic mirza-halilcevic force-pushed the quick_tune_code_review branch 3 times, most recently from 6c389c2 to c6f386d Compare October 27, 2025 16:13
@umangyadav umangyadav requested a review from Copilot October 29, 2025 14:18
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors tuning parameter lookup in the Rock dialect by introducing a template-based ParamLookupTable class to replace hardcoded architecture-specific logic. The changes also improve the quickTuningGen.py script to support dynamic lookup table generation and fix various parameter naming inconsistencies.

  • Replaces large switch statements in C++ with a centralized lookup table mechanism
  • Adds dynamic lookup table generation and insertion logic to the Python tuning script
  • Standardizes parameter naming conventions (e.g., no_splitkno_splitk throughout)

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
mlir/lib/Dialect/Rock/Tuning/GridwiseGemmParams.cpp Introduces ParamLookupTable template class to replace hardcoded parameter lookups with table-driven approach
mlir/utils/performance/analysis/quickTuningGen.py Adds lookup table generation, refactors parameter name logic, fixes naming inconsistencies (splitK → splitk)
mlir/include/mlir/Dialect/Rock/Tuning/QuickTuningPerfconfigs.inc Adds lookup table entries for NonAccel and Accel parameter mappings
mlir/include/mlir/Dialect/Rock/Tuning/GridwiseGemmParams.h Adds friend declarations for ParamLookupTable to access private static members
mlir/utils/performance/tuningRunner.py Adds index=False parameter to CSV export for cleaner debug output
mlir/test/Dialect/Rock/affix_tuning_params.mlir Updates test expectations with new tuning parameters

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 58 to 59
replacment = f'{begin_marker}\n{new_content}\n{end_marker}'
content = pattern.sub(replacment, content)
Copy link

Copilot AI Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'replacment' to 'replacement'.

Suggested change
replacment = f'{begin_marker}\n{new_content}\n{end_marker}'
content = pattern.sub(replacment, content)
replacement = f'{begin_marker}\n{new_content}\n{end_marker}'
content = pattern.sub(replacement, content)

Copilot uses AI. Check for mistakes.
with open(file_path, 'r') as file:
content = file.read()

pattern = re.compile(f'{mapping}', re.DOTALL)
Copy link

Copilot AI Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern is treating mapping as a literal regex pattern instead of escaping it. This will fail if mapping contains special regex characters like brackets or braces. Use re.escape(mapping) to treat it as a literal string.

Suggested change
pattern = re.compile(f'{mapping}', re.DOTALL)
pattern = re.compile(re.escape(mapping), re.DOTALL)

Copilot uses AI. Check for mistakes.
usage: quickTunerGen.py [-h] --input-dir INPUT_DIR --op {gemm,conv} [--th TH] --arch ARCH [--update] [--no-splitK]
usage exsample: python3 quickTuningGen.py --input-dir tunedData --op conv --arch gfx90a --update --no-splitK
usage: quickTunerGen.py [-h] --input-dir INPUT_DIR --op {gemm,conv} [--th TH] --arch ARCH [--update] [--no-splitk]
usage exsample: python3 quickTuningGen.py --input-dir tunedData --op conv --arch gfx90a --update --no-splitk
Copy link

Copilot AI Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'exsample' to 'example'.

Suggested change
usage exsample: python3 quickTuningGen.py --input-dir tunedData --op conv --arch gfx90a --update --no-splitk
usage example: python3 quickTuningGen.py --input-dir tunedData --op conv --arch gfx90a --update --no-splitk

Copilot uses AI. Check for mistakes.
default=False,
action='store_true',
help='Removing the spliK factor from the generated list')
help='Removing the Split-K factor from the generated list')
Copy link

Copilot AI Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Help text uses present participle 'Removing' instead of imperative form. Change to 'Remove the Split-K factor from the generated list' for consistency with typical argparse help conventions.

Suggested change
help='Removing the Split-K factor from the generated list')
help='Remove the Split-K factor from the generated list')

Copilot uses AI. Check for mistakes.
Comment on lines +132 to +133
// Find the lexicographically closest relative
static std::string findFallback(const std::string &target) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some docstring on what this is trying to find ?

@umangyadav
Copy link
Member

Would it be possible to add some tests ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants