Conversation

@olia110 olia110 commented Aug 25, 2025

This pull request:

Adds an optimized code path for the common case where the Softmax axis is the last dimension (axis == size - 1), i.e. where the data along the axis is contiguous in memory.
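A minimal sketch of what such a contiguous fast path looks like, assuming a row-major float tensor; the function and variable names are illustrative only, not the actual SOFIE generated code:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch (not the generated code): when the softmax axis is
// the last dimension, each row of axis_size elements is contiguous in
// memory, so a single linear pass per row suffices, with no strided access.
void SoftmaxLastAxis(const std::vector<float> &in, std::vector<float> &out,
                     std::size_t num_rows, std::size_t axis_size)
{
   for (std::size_t i = 0; i < num_rows; ++i) {
      const float *x = in.data() + i * axis_size;
      float *y = out.data() + i * axis_size;
      // Subtract the row maximum for numerical stability.
      float vmax = x[0];
      for (std::size_t j = 1; j < axis_size; ++j)
         vmax = std::max(vmax, x[j]);
      float sum = 0.f;
      for (std::size_t j = 0; j < axis_size; ++j) {
         y[j] = std::exp(x[j] - vmax);
         sum += y[j];
      }
      for (std::size_t j = 0; j < axis_size; ++j)
         y[j] /= sum;
   }
}
```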

@olia110 olia110 requested a review from lmoneta as a code owner August 25, 2025 15:53
@sanjibansg sanjibansg self-assigned this Aug 25, 2025
@lmoneta lmoneta left a comment

Thank you for the PR.
Just some corrections for the case where the shapes are not fully specified as integer values but are parametrised (dynamic).

auto stride = UTILITY::ComputeStrideFromShape(fShape);
size_t size = fShape.size();
auto length_str = ConvertDimShapeToLength(fShape);
size_t length = std::stoul(length_str);

you don't need to convert to an integer, since in some cases it might fail: e.g. length = "N * 32"

// Check if this is the special case where memory is contiguous.
if (axis == static_cast<int>(size - 1)) {
size_t axis_size = std::stoul(fShape[axis].GetVal());
size_t num_rows = length / axis_size;
In case length is not an integer, this could be written as follows:

std::string axis_size = fShape[axis].GetVal();
std::string num_rows;
if (IsInteger(length_str) && IsInteger(axis_size)) {
  num_rows = std::to_string(std::stoul(length_str) / std::stoul(axis_size));
} else {
  num_rows = "(" + length_str + ") / " + axis_size;
}
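The idea of the suggestion above can be sketched as a self-contained helper: fold the division at code-generation time when both operands are plain integers, otherwise emit a symbolic expression into the generated code. IsInteger and NumRowsExpr are hypothetical names written out here for illustration; the actual SOFIE utility may differ:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if the string is a plain unsigned integer
// literal (so std::stoul is guaranteed to succeed on it).
bool IsInteger(const std::string &s)
{
   if (s.empty()) return false;
   for (char c : s)
      if (!std::isdigit(static_cast<unsigned char>(c))) return false;
   return true;
}

// Build the expression for the number of rows. If both the total length
// and the axis size are known integers, compute the quotient now;
// otherwise emit a symbolic division for the generated code, where a
// parametrised length like "N * 32" is only known at runtime.
std::string NumRowsExpr(const std::string &length_str, const std::string &axis_size)
{
   if (IsInteger(length_str) && IsInteger(axis_size))
      return std::to_string(std::stoul(length_str) / std::stoul(axis_size));
   return "(" + length_str + ") / (" + axis_size + ")";
}
```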
    

@olia110 olia110 force-pushed the feature/contiguous-memory-fast-path branch from c51c2d9 to 5960569 Compare August 26, 2025 15:47
@sanjibansg sanjibansg left a comment

LGTM! Thanks for this!

@lmoneta lmoneta left a comment

LGTM!
Just a couple of small comments.
Thank you for this improvement!

auto length = ConvertDimShapeToLength(fShape);
auto stride = UTILITY::ComputeStrideFromShape(fShape);
size_t size = fShape.size();
auto length_str = ConvertDimShapeToLength(fShape);
int axis = fAttrAxis < 0 ? size + fAttrAxis : fAttrAxis;
Here we could use size_t for axis instead of int

out << SP << SP << "tensor_" << fNY << "[i] /= sum;\n";

// Check if this is the special case where memory is contiguous.
if (axis == static_cast<int>(size - 1)) {
if we define axis as size_t we don't need the cast here

}

out << "\n" << SP << "//------ SOFTMAX - " << size << " " << length_str << " " << axis << "\n";
out << SP << "for (size_t i = 0; i < " << num_rows << "; ++i) {\n";
we could use int for the loop index here in the generated code to save memory, as is done in the general case below
