
Conversation

@corneliusroemer
Member

With LTO on, I didn't get any insight into what was going on inside `sort`; everything was apparently inlined by LTO.

With LTO off, I now see what's going on.
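
For example, this could be made reproducible with a dedicated Cargo profile instead of editing the release profile back and forth - a minimal sketch, assuming a standard Cargo setup (the `profiling` name is just a placeholder, not something this PR adds):

```toml
# Hypothetical Cargo profile for profiling runs: release-level optimization,
# but with debug symbols kept and cross-crate LTO off, so that frames such
# as `sort` remain visible in the profiler's call stacks.
[profile.profiling]
inherits = "release"
debug = true   # emit debug info so stack traces map back to source
lto = false    # no cross-crate LTO, so inlining across crates doesn't erase frames
```

Built with `cargo build --profile profiling`, this leaves the real release profile untouched.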

@ivan-aksamentov
Member

ivan-aksamentov commented Jun 10, 2025

> I now see

Profiling a build that is not built the same way as the binary the end user runs (which is a release binary with LTO) means that you might "see things" that don't exist. Enabling debug symbols, reducing the optimization level, and disabling LTO gives an approximation. It can be useful sometimes. But you'd typically profile and benchmark both LTO and non-LTO builds, whichever is more convenient at the time, so I don't see much value in PR'ing this toggle back and forth. But it's not hard to turn it back on, so feel free to merge if you need it.

P.S. LTO does way more than inlining. Inlining is done by the compiler anyway; LTO enables optimization across module and library boundaries, which is far more intrusive. It can rearrange large chunks of code in very surprising ways and then layer on top of that optimizations which were not possible in the non-LTO build. Though if there are blatant perf oversights (of which I have committed plenty :), especially memory- and threading-related ones, then LTO won't change much. Those are the kinds of problems where turning off LTO can be useful to see the call stack more clearly, but it's important to understand the limits of this approximation and to always verify your optimizations in a release build.
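
If only symbol names are missing, debug info can also be added to the exact configuration users get, since debug info does not change codegen decisions - a sketch, assuming the release profile currently enables fat LTO:

```toml
# Sketch: profile the release build itself, only adding debug info so the
# profiler can symbolize the frames that survive LTO. Inlined frames will
# still be gone, but what you measure is what users actually run.
[profile.release]
lto = "fat"    # assumed existing setting, unchanged
debug = true   # debug symbols only; does not affect optimization or LTO
```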

@corneliusroemer
Member Author

You just don't see useful things in the stack trace with LTO on - maybe we could have two profiles, one with LTO and one without.

I also noticed we might actually get better performance with thin LTO rather than fat!
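
A rough sketch of how the two could sit side by side for comparison (profile names are placeholders, nothing this PR adds):

```toml
# Hypothetical profiles for comparing LTO flavours on otherwise identical builds.
[profile.release-fat]
inherits = "release"
lto = "fat"     # whole-program LTO: slowest to link, sometimes fastest code

[profile.release-thin]
inherits = "release"
lto = "thin"    # parallelizable LTO: much faster to link, often close to fat
```

Then benchmark each with `cargo build --profile release-fat` / `--profile release-thin`.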

@ivan-aksamentov
Member

Yep, feel free to set it up the way you like. I don't feel strongly.

Thin LTO, PGO, O3 vs Os, and other tunables are also fun to explore. Multiply this by the number of platforms and architectures and it's now a full-time job :)

But in all cases it's important to make sure that we optimize for what the user is getting and aren't just chasing pink elephants that don't exist in prod (e.g. a useful call stack - it's just not there anymore).

@corneliusroemer
Member Author

> But in all cases it's important to make sure that we optimize for what the user is getting and aren't just chasing pink elephants that don't exist in prod (e.g. a useful call stack - it's just not there anymore).

I'm not chasing pink elephants :) - I just use profiling to find likely culprits, and test with release to confirm the speedups work.

@ivan-aksamentov
Member

ivan-aksamentov commented Jun 10, 2025

> I'm not chasing pink elephants :) - I just use profiling to find likely culprits, and test with release to confirm the speedups work.

This is a correct and effective approach! 🐘
