@poznano-amd
Contributor

When looking at perf reports of vLLM workloads, we found it hard to tell which GPU kernels mattered most. The GPU kernels were only available as lists attached to our cpu_ops, which made them hard to view in the spreadsheet.

Therefore, we added an optional flag:

  --enable_kernel_summary
                        Enable kernel summary sheet in the report. Disabled by default.
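
For reviewers unfamiliar with opt-in flags, here is a minimal sketch of how a flag like this is typically declared with `argparse`. The `build_parser` helper and the parser description are hypothetical and not taken from this PR; only the flag name and help text come from the description above.

```python
import argparse


def build_parser():
    # Hypothetical parser; the real report generator defines its own.
    parser = argparse.ArgumentParser(description="Perf report generator (sketch)")
    parser.add_argument(
        "--enable_kernel_summary",
        action="store_true",  # off unless the flag is passed
        help="Enable kernel summary sheet in the report. Disabled by default.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("kernel summary enabled:", args.enable_kernel_summary)
```

With `action="store_true"`, the attribute is `False` when the flag is omitted, which matches the "Disabled by default" behavior.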

The new kernel_summary sheet lists kernels sorted by their percentage of total time.
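
To illustrate the idea, the following is a rough sketch of how such a summary could be computed, not the code in this PR. It assumes kernel events arrive as `(name, duration_us)` pairs; the `summarize_kernels` function, the input shape, and the column names are all hypothetical.

```python
from collections import defaultdict


def summarize_kernels(events):
    """Aggregate (kernel_name, duration_us) pairs into per-kernel rows,
    sorted by share of total kernel time, largest first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, duration_us in events:
        totals[name] += duration_us
        counts[name] += 1

    grand_total = sum(totals.values())
    rows = []
    for name, total in totals.items():
        # Guard against an empty or all-zero trace.
        pct = 100.0 * total / grand_total if grand_total else 0.0
        rows.append(
            {
                "kernel": name,
                "count": counts[name],
                "total_us": total,
                "pct_of_total": pct,
            }
        )
    rows.sort(key=lambda row: row["pct_of_total"], reverse=True)
    return rows


if __name__ == "__main__":
    # Made-up durations, for illustration only.
    sample = [
        ("gemm_kernel", 30.0),
        ("attention_kernel", 50.0),
        ("gemm_kernel", 10.0),
        ("copy_kernel", 10.0),
    ]
    for row in summarize_kernels(sample):
        print(row)
```

Grouping by kernel name rather than by the launching cpu_op is what lets the most expensive kernels surface at the top of the sheet.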

Here's an example spreadsheet: [available by request]

@poznano-amd poznano-amd requested a review from ajassani October 15, 2025 20:17