[WIP] optimize padding with buffer_load_if/buffer_store_if #76
base: fmha_attemp_async_copy_unify
  index_t src_thread_element_offset,
- index_t src_element_space_size)
+ index_t src_element_space_size,
+ index_t is_valid_element = 0)
Is is_valid_element actually a bool parameter? (I guess you only use index_t for POC)
Oh you are right, this should be a bool
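To make the suggestion concrete, here is a minimal host-side sketch of a predicated load whose validity flag is a `bool` rather than an `index_t`. The function name `load_if`, the explicit bounds check, and the zero fill value are assumptions for illustration only; they are not the actual `buffer_load_if` implementation in this PR.

```cpp
#include <cstdint>

using index_t = std::int32_t;

// Hypothetical stand-in for a predicated buffer load (not the PR's real code).
// Returns the element at src_thread_element_offset only when the caller marks
// it valid and the offset lies inside the buffer; otherwise returns zero,
// which is what a padded element would read as in this sketch.
template <typename T>
T load_if(const T* src,
          index_t src_thread_element_offset,
          index_t src_element_space_size,
          bool is_valid_element = false) // bool instead of index_t, as suggested
{
    const bool in_bounds = src_thread_element_offset >= 0 &&
                           src_thread_element_offset < src_element_space_size;

    // The conditional operator evaluates only the selected branch, so src is
    // never dereferenced for invalid or out-of-range elements.
    return (is_valid_element && in_bounds) ? src[src_thread_element_offset] : T{0};
}
```

With a `bool` parameter the intent is visible at the call site, for example `load_if(data, offset, size, true)`, and an offset cannot be passed where the flag was meant without it being noticeable in review.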
template <> struct t2s<ck::bf8_t> { static constexpr const char * name = "bf8"; };
// clang-format on

__host__ static std::string GetName()
I think GetName() can be implemented by calling something like miopen::get_type_name(). And we need another name for this function (maybe GetEncodedName()?)
The purpose of the name inside the C++ code is to print something that helps with debugging. It is not the symbol name (though we can mock a symbol name inside generate.py). The name should carry all the information needed to distinguish between different kinds of kernels, so it may take a lot of code. The advantage of keeping it inside the kernel template is that we can reuse it even when not using our generate.py system.
And yes, using GetEncodedName() is OK.
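As a rough illustration of the idea discussed here, the sketch below builds a debug name from a `t2s`-style type trait plus the template parameters of a kernel. The `ExampleKernel` struct, the `fmha_` prefix, the `fp32`/`int32` strings, and the name layout are all hypothetical; only the `t2s` trait pattern and the `GetEncodedName()` name come from this thread.

```cpp
#include <string>

// Type-to-string trait in the style of the t2s specializations in the diff.
// The specializations below are illustrative, not the ones in the PR.
template <typename T> struct t2s;
template <> struct t2s<float> { static constexpr const char* name = "fp32"; };
template <> struct t2s<int>   { static constexpr const char* name = "int32"; };

// Hypothetical kernel template: the encoded name includes every template
// parameter, so two differently configured kernels never share a name.
template <typename DataType, int kM0, int kN1>
struct ExampleKernel
{
    static std::string GetEncodedName()
    {
        return std::string("fmha_") + t2s<DataType>::name +
               "_m" + std::to_string(kM0) +
               "_n" + std::to_string(kN1);
    }
};
```

Because the name is assembled inside the template itself, it stays available for logging even when the kernel is instantiated by hand rather than through a generator script.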
make_tuple(Number<FmhaPipeline::kM0>{}, Number<FmhaPipeline::kN1>{}),
{i_m0, i_n1});

// o_dram_window.foo();
still in use?
Oh, will remove it :)
No description provided.