-
Couldn't load subscription status.
- Fork 50
Address outstanding code review comments #1994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but see if you can address comment about auto generating some of the stuffs.
ff2eb71 to
fecd61a
Compare
77f8a26 to
71b0b00
Compare
| return ArrayRef<ParamsType>(it->second.first, it->second.second); | ||
| } | ||
|
|
||
| auto archFamily = getArchName(arch).drop_back(2).str(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to always drop two? Take into account gfx9 typically has 3 numbers only (gfx950, gfx942, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I changed the fallback lookup significantly. Let me know what you think.
| for (const auto &entry : table) { | ||
| if (entry.first.find(archFamily) != std::string::npos && | ||
| entry.first.find(makeSuffix(op, dataType)) != std::string::npos) { | ||
| fallbackKey = std::max(fallbackKey, entry.first); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does std::max() of two strings do? take the longest string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compare lexicographically. The idea is that it would give me the latest arch string, e.g. gfx1201 is preferred to gfx1200.
| llvm_unreachable("Invalid architecture string"); | ||
| } | ||
| auto remaining = arch.substr(gfxPos); | ||
| auto endPos = remaining.find_first_not_of("0123456789a", 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's gfx1030w apparently as well. Please can you check all possible codes somewhere? (maybe llvm)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It now looks for the first non-alphanumeric. That should be flexible enough yet not too verbose.
| llvm::raw_string_ostream os(dataTypeStr); | ||
| os << dataType; | ||
|
|
||
| if (dataTypeStr == "bf16") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment about why we do this? we are assuming f16/bf16 have the same optimal perfconfigs, right? same for f8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider doing dataType.getIntOrFloatBitWidth() == 8 && isa < FloatType > (dataType), instead of matching strings. I think that's more robust for future datatypes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion.
We don't tune for bf16, so the f16 lists are used instead. It is an assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking more into it, there seem to be more edge cases who went under the radar with the previous approach. Previously, because of the if-else chains, the selection of the quick tune lists was much less strict, and the else statements were acting like catch-alls. For example, the gfx1200 quick tune list would be selected for a gfx942 non-accel problem, because we don't have any non-accel list for gfx942. A lot of tests fail now because of this. Also, for any non-accel case we would return f32 lists, because we don't have non-accel lists for other types. The non-accel cases seem to happen because of the following line:
rocMLIR/mlir/tools/rocmlir-gen/rocmlir-gen.cpp
Line 4350 in 68727b9
| convGenerator.flipAccel(); |
Any thoughts? On one hand it is incorrect to return a list for the wrong arch and/or data type, but on the other hand we have tests where this happens. Maybe it doesn't matter that much in these non-accel cases, and we can just fall back to any list?
@umangyadav Can you also chime in?
3d29f0f to
3bbfa38
Compare
6c389c2 to
c6f386d
Compare
digits of identifier for gfx1* architectures.
getArchName so it handles gfx90a.
2e174d0 to
a20b5da
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors tuning parameter lookup in the Rock dialect by introducing a template-based ParamLookupTable class to replace hardcoded architecture-specific logic. The changes also improve the quickTuningGen.py script to support dynamic lookup table generation and fix various parameter naming inconsistencies.
- Replaces large switch statements in C++ with a centralized lookup table mechanism
- Adds dynamic lookup table generation and insertion logic to the Python tuning script
- Standardizes parameter naming conventions (e.g.,
no_splitk→no_splitkthroughout)
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| mlir/lib/Dialect/Rock/Tuning/GridwiseGemmParams.cpp | Introduces ParamLookupTable template class to replace hardcoded parameter lookups with table-driven approach |
| mlir/utils/performance/analysis/quickTuningGen.py | Adds lookup table generation, refactors parameter name logic, fixes naming inconsistencies (splitK → splitk) |
| mlir/include/mlir/Dialect/Rock/Tuning/QuickTuningPerfconfigs.inc | Adds lookup table entries for NonAccel and Accel parameter mappings |
| mlir/include/mlir/Dialect/Rock/Tuning/GridwiseGemmParams.h | Adds friend declarations for ParamLookupTable to access private static members |
| mlir/utils/performance/tuningRunner.py | Adds index=False parameter to CSV export for cleaner debug output |
| mlir/test/Dialect/Rock/affix_tuning_params.mlir | Updates test expectations with new tuning parameters |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| replacment = f'{begin_marker}\n{new_content}\n{end_marker}' | ||
| content = pattern.sub(replacment, content) |
Copilot
AI
Oct 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'replacment' to 'replacement'.
| replacment = f'{begin_marker}\n{new_content}\n{end_marker}' | |
| content = pattern.sub(replacment, content) | |
| replacement = f'{begin_marker}\n{new_content}\n{end_marker}' | |
| content = pattern.sub(replacement, content) |
| with open(file_path, 'r') as file: | ||
| content = file.read() | ||
|
|
||
| pattern = re.compile(f'{mapping}', re.DOTALL) |
Copilot
AI
Oct 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex pattern is treating mapping as a literal regex pattern instead of escaping it. This will fail if mapping contains special regex characters like brackets or braces. Use re.escape(mapping) to treat it as a literal string.
| pattern = re.compile(f'{mapping}', re.DOTALL) | |
| pattern = re.compile(re.escape(mapping), re.DOTALL) |
| usage: quickTunerGen.py [-h] --input-dir INPUT_DIR --op {gemm,conv} [--th TH] --arch ARCH [--update] [--no-splitK] | ||
| usage exsample: python3 quickTuningGen.py --input-dir tunedData --op conv --arch gfx90a --update --no-splitK | ||
| usage: quickTunerGen.py [-h] --input-dir INPUT_DIR --op {gemm,conv} [--th TH] --arch ARCH [--update] [--no-splitk] | ||
| usage exsample: python3 quickTuningGen.py --input-dir tunedData --op conv --arch gfx90a --update --no-splitk |
Copilot
AI
Oct 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'exsample' to 'example'.
| usage exsample: python3 quickTuningGen.py --input-dir tunedData --op conv --arch gfx90a --update --no-splitk | |
| usage example: python3 quickTuningGen.py --input-dir tunedData --op conv --arch gfx90a --update --no-splitk |
| default=False, | ||
| action='store_true', | ||
| help='Removing the spliK factor from the generated list') | ||
| help='Removing the Split-K factor from the generated list') |
Copilot
AI
Oct 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Help text uses present participle 'Removing' instead of imperative form. Change to 'Remove the Split-K factor from the generated list' for consistency with typical argparse help conventions.
| help='Removing the Split-K factor from the generated list') | |
| help='Remove the Split-K factor from the generated list') |
| // Find the lexicographically closest relative | ||
| static std::string findFallback(const std::string &target) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some docstring on what this is trying to find ?
|
Would it be possible to add some tests ? |
Uh oh!
There was an error while loading. Please reload this page.