[CK_BUILDER] First fwd convolution builder implementation #3070
base: develop
Conversation
- Add experimental/builder directory with README documentation.
- Create initial test infrastructure with CMakeLists.txt and a placeholder test.
- Update root CMakeLists.txt to support the CK_EXPERIMENTAL_BUILDER option.
- Update .gitignore to not treat `experimental/builder` as a CMake build directory.

This establishes the directory structure for a high-level builder pattern that will provide a semantically clear interface for constructing CK operations, with an initial focus on convolution kernels for MIOpen integration.
- Add experimental/builder CMakeLists.txt with proper subdirectory structure.
- Add placeholder include/ck_tile/builder CMakeLists.txt for header installation.
- Fix gtest.cmake to use include_guard to prevent multiple inclusions.
- Update root CMakeLists.txt to include the full builder directory instead of just the tests.
Co-authored-by: Copilot <[email protected]>
…r-first-fwd-conv-builder
```cpp
case DataType::FP32: return "FP32";
case DataType::BF16: return "BF16";
case DataType::FP8: return "FP8";
case DataType::I8: return "I8";
```
Is "S8" more common (signed eight bit integer)?
Changed to S8, although we are not yet using the type.
```cpp
{
    switch(layout)
    {
    case GroupConvLayout::CHANNELS_FIRST: return "Channels-first (NCHW)";
```
I've been trying to figure this out. I really like channels-first and channels-last for convolutions, but it looks like grouped convolutions turn this into alphabet soup. I think the filter in grouped convolutions can still be channels-first or channels-last, but the image and feature-map (input and output) tensors appear to have a lot of different conventions for where the group index fits in the layout. This is OK for now, but we probably want to think about how best to describe these layouts.
I revised the implementation and introduced separate conv layout enums GroupConvLayout1D, GroupConvLayout2D, and GroupConvLayout3D. They explicitly indicate the layout of the input, filter, and output tensors; see the sketch below. I'm wondering if they should be merged into a single enum, since they carry somewhat redundant dimension information. What do you think?
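For concreteness, a rough sketch of the separate enums, using layout names that appear later in this thread (the real enumerator lists are longer, and the comments are my reading of the convention):

```cpp
// Sketch: each enumerator spells out the input, filter, and output tensor
// layouts. The spatial rank is encoded twice: by the enum type (2D/3D) and
// by the filter letters (YX vs. ZYX) inside each enumerator, which is the
// redundancy in question.
enum class GroupConvLayout2D
{
    GNHWC_GKYXC_GNHWK,  // channels-last, group index leading
    NGCHW_GKCYX_NGKHW,  // channels-first, group index after batch
    // ...
};

enum class GroupConvLayout3D
{
    GNDHWC_GKZYXC_GNDHWK, // channels-last, group index leading
    NDHWGC_GKZYXC_NDHWGK, // channels-last, group index after spatial dims
    // ...
};
```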
```cpp
case ConvFwdSpecialization::FILTER_3x3:
    return ck::tensor_operation::device::ConvolutionForwardSpecialization::Filter3x3;
case ConvFwdSpecialization::DEFAULT:
default: return ck::tensor_operation::device::ConvolutionForwardSpecialization::Default;
```
Robin suggested removing the "default:", making the function consteval, and throwing an error string for unexpected values. That way we get a compile-time error if we have an unsupported input value. In principle the compiler will also error if we omit "default:" and miss a value, but the consteval + throw pattern is robust if we only need compile-time usage. The downside is that the function can then only be used at compile time, but that may be correct for this code.
Should we make all factory-internal functions consteval rather than constexpr? At least the ones that use switch?
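For reference, a minimal sketch of the consteval + throw pattern under discussion, based on the snippet above (the function name is hypothetical, and only two enumerators are shown):

```cpp
consteval auto to_ck_specialization(ConvFwdSpecialization spec)
{
    using CkSpec = ck::tensor_operation::device::ConvolutionForwardSpecialization;
    switch(spec)
    {
    case ConvFwdSpecialization::FILTER_3x3: return CkSpec::Filter3x3;
    case ConvFwdSpecialization::DEFAULT: return CkSpec::Default;
        // ... remaining enumerators; with no "default:" label, -Wswitch can
        // still flag a missed enumerator.
    }
    // A throw-expression is never a constant expression, so any value that
    // falls through the switch becomes a compile-time error rather than a
    // runtime one.
    throw "unsupported ConvFwdSpecialization";
}
```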
- Add missing 2D layouts: GNHWC_GKYXC_GNHWK, NGCHW_GKCYX_NGKHW.
- Add missing 3D layout: GNDHWC_GKZYXC_GNDHWK.
- Add 1D layouts (NWGC, NGCW, GNWC, NGCW_GKCX) for future support.
- Add 3 tests for new 2D/3D layouts.
- All tests pass (5/5).
- Add test for 2D NGCHW_GKYXC_NGKHW (channels-first) with Filter1x1Stride1Pad0.
- Add test for 3D NDHWGC_GKZYXC_NDHWGK (channels-last).
- All 7 tests pass (complete coverage for all 2D/3D forward layouts).
…r-first-fwd-conv-builder
```cpp
{ t.src_vector_dim } -> std::convertible_to<size_t>;
{ t.src_scalar_per_vector } -> std::convertible_to<size_t>;
{ t.dest_scalar_per_vector_k1 } -> std::convertible_to<size_t>;
{ t.add_extra } -> std::convertible_to<bool>;
```
This should be named lds_padding.
Looks very nice
```cpp
concept InputVectorTransferDescriptor = requires(T t) {
    { t.src_vector_dim } -> std::convertible_to<size_t>;
    { t.src_scalar_per_vector } -> std::convertible_to<size_t>;
    { t.dest_scalar_per_vector_k1 } -> std::convertible_to<size_t>;
```
Maybe lds_dst_scalar_per_vector is a better name.
Indeed more descriptive.
Renamed as suggested.
```cpp
template <typename T>
concept InputVectorTransferDescriptor = requires(T t) {
    { t.src_vector_dim } -> std::convertible_to<size_t>;
    { t.src_scalar_per_vector } -> std::convertible_to<size_t>;
```
Can we add a boolean IsDirectLoad? It will be introduced in #3082.
Added a requirement for the is_direct_load member variable.
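Putting the three suggestions from this thread together (lds_padding, lds_dst_scalar_per_vector, and is_direct_load), the revised concept might look roughly like this sketch; the actual member set in the PR may differ:

```cpp
#include <concepts>
#include <cstddef>

// Sketch after the suggested renames (add_extra -> lds_padding,
// dest_scalar_per_vector_k1 -> lds_dst_scalar_per_vector) plus the
// is_direct_load requirement anticipating #3082.
template <typename T>
concept InputVectorTransferDescriptor = requires(T t) {
    { t.src_vector_dim } -> std::convertible_to<std::size_t>;
    { t.src_scalar_per_vector } -> std::convertible_to<std::size_t>;
    { t.lds_dst_scalar_per_vector } -> std::convertible_to<std::size_t>;
    { t.lds_padding } -> std::convertible_to<bool>;
    { t.is_direct_load } -> std::convertible_to<bool>;
};
```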
```cpp
{
    return ck::tensor_operation::device::ConvolutionForwardSpecialization::Filter1x1Stride1Pad0;
}
else if constexpr(specialization == ConvFwdSpecialization::ODD_C)
```
We can delete this specialization since it duplicates ODD_C. These instances should be deleted in #2281.
```cpp
FP16,
BF16,
FP8,
S8
```
Maybe I8 instead of S8?
I initially had I8, but @shumway was thinking that S8 is more standard for a signed 8-bit integer.
I think the general convention is I for signed, U for unsigned
```cpp
enum class BlockGemmPipelineVersion
{
    V1,
    V3,
```
missed V2
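Presumably the enum should then read (sketch with V2 restored):

```cpp
enum class BlockGemmPipelineVersion
{
    V1,
    V2,
    V3,
};
```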
```cpp
// Fused element-wise operations.
enum class ElementwiseOperation
{
    BIAS,
```
Missed BIAS_BNORM_CLAMP
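With the enumerator the reviewer mentions added back, the enum would read (sketch; any other enumerators elided):

```cpp
// Fused element-wise operations.
enum class ElementwiseOperation
{
    BIAS,
    BIAS_BNORM_CLAMP,
    // ...
};
```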
```cpp
concept ConvSignatureDescriptor = requires(T t) {
    { t.spatial_dim } -> std::convertible_to<unsigned int>;
    { t.direction } -> std::convertible_to<ConvDirection>;
    requires std::convertible_to<decltype(t.layout), GroupConvLayout1D> ||
             std::convertible_to<decltype(t.layout), GroupConvLayout2D> ||
             std::convertible_to<decltype(t.layout), GroupConvLayout3D>;
    { t.data_type } -> std::convertible_to<DataType>;
```
What about the fused op (or actually elementwise op, since it can be applied to the inputs as well)? Should we actually check it?
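One possible way to check it, sketched as an extension of the concept above; the elementwise_op member name is an assumption, while ElementwiseOperation is the enum from earlier in this thread:

```cpp
template <typename T>
concept ConvSignatureDescriptor = requires(T t) {
    { t.spatial_dim } -> std::convertible_to<unsigned int>;
    { t.direction } -> std::convertible_to<ConvDirection>;
    requires std::convertible_to<decltype(t.layout), GroupConvLayout1D> ||
             std::convertible_to<decltype(t.layout), GroupConvLayout2D> ||
             std::convertible_to<decltype(t.layout), GroupConvLayout3D>;
    { t.data_type } -> std::convertible_to<DataType>;
    // Hypothetical addition: require the fused/elementwise op in the signature.
    { t.elementwise_op } -> std::convertible_to<ElementwiseOperation>;
};
```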
Proposed changes
Added concepts and builder functionality for building forward convolutions from the CK library. The limitation of the current implementation is that the convolution specialization is hard-coded to DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3; the implementation will be generalized to other specializations in a later PR. Added unit tests to verify that the builder is building valid instances. Most of the parameter checking is done at compile time. The implementation in this PR builds upon the prototype from this branch: https://github.com/ROCm/composable_kernel/tree/jshumway/convolution-builder
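To illustrate the compile-time checking, a hedged sketch of how a concept-constrained entry point might be used. MyConvSignature, build_conv_fwd, and the ConvDirection::FORWARD enumerator are hypothetical names; ConvSignatureDescriptor, GroupConvLayout2D, and DataType come from the review snippets above:

```cpp
// Hypothetical descriptor whose static members satisfy ConvSignatureDescriptor.
struct MyConvSignature
{
    static constexpr unsigned int spatial_dim = 2;
    static constexpr auto direction = ConvDirection::FORWARD; // enumerator assumed
    static constexpr auto layout    = GroupConvLayout2D::NGCHW_GKYXC_NGKHW;
    static constexpr auto data_type = DataType::FP16;
};

// Hypothetical factory entry point. Because the template parameter is
// constrained by the concept, a descriptor with a missing or wrongly typed
// member is rejected at the call site, at compile time, rather than deep
// inside the device-instance template machinery.
template <ConvSignatureDescriptor Signature>
auto build_conv_fwd(Signature /*sig*/)
{
    // ... select and construct the matching CK device instance ...
}
```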
Checklist

Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

- [ ] I have run `clang-format` on all changed files

Discussion