chore: switch off LTO for profiling profile #1648
base: master
With LTO on, I didn't get any insight into what was going on inside `sort`; everything was apparently inlined by LTO. With LTO off, I can now see what's going on.
Profiling a build that isn't built the same way as the binary the end user runs (a release binary with LTO) means you might "see things" that don't exist. Enabling debug symbols, reducing the optimization level, and disabling LTO gives you an approximation, which can sometimes be useful. But you'd typically profile and benchmark both LTO and non-LTO builds, whichever is more convenient at the time, so I don't see much value in PR'ing this toggle back and forth. That said, it's not hard to turn it back on, so feel free to merge if you need it.

P.S. LTO does far more than inlining. Inlining is done by the compiler. LTO enables optimization across module and library boundaries, which is much more intrusive. It can rearrange large chunks of code in very surprising ways and then pile on optimizations that were not possible in the non-LTO build. However, if there are blatant performance oversights (I have committed plenty :), especially memory- or threading-related ones, LTO won't change much. Those are the kinds of problems where turning off LTO can help you see the call stack more clearly, but it's important to understand the limits of this approximation and to always verify your optimizations in a release build.
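A minimal sketch of such an approximation profile, assuming a Cargo project whose `release` profile enables LTO. The profile name `profiling` and the exact values are illustrative assumptions, not the contents of this PR:

```toml
# Hypothetical profiling profile: starts from release, then trades fidelity for readable stacks.
[profile.profiling]
inherits = "release"  # start from the release settings
debug = true          # full debug info so the profiler can resolve symbols and source lines
strip = "none"        # keep symbols even if the release profile strips them
opt-level = 2         # optional: a notch below release's default of 3
lto = "off"           # disables LTO entirely; `lto = false` may still do thin local LTO across codegen units
```

Build it with `cargo build --profile profiling`; the binary ends up under `target/profiling/`.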
You just don't see anything useful in the stack trace with LTO on. Maybe we could have two profiles, one with LTO and one without. I also noticed we might actually get better performance with thin LTO rather than fat!
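One way to keep an LTO variant next to the non-LTO one, again assuming Cargo and a hypothetical profile name: a profile that keeps release's LTO but adds debug info, so the same code can be profiled with and without LTO, and thin LTO can be compared against fat LTO (`lto = true` or `lto = "fat"`).

```toml
# Hypothetical: profile the LTO build itself, with symbols kept.
[profile.profiling-lto]
inherits = "release"  # same optimization settings as the shipped binary
debug = true          # add debug info on top of release
strip = "none"        # keep symbols even if the release profile strips them
lto = "thin"          # set to "fat" (or true) to compare against fat LTO
```

`cargo build --profile profiling-lto` puts the binary under `target/profiling-lto/`. Expect its stacks to be less readable than the non-LTO build's, since cross-crate optimization may have merged or inlined many functions.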
Yep, feel free to set it up the way you like. I don't feel strongly. Thin LTO, PGO, O3 vs Os, and other tunables are also fun to explore. Multiply that by the number of platforms and architectures and it's a full-time job :) But in all cases it's important to make sure we optimize for what the user is getting and don't chase pink elephants that don't exist in prod (e.g. a useful call stack: it's just not there anymore).
I'm not chasing pink elephants :) - I just use profiling to find likely culprits, and test with release to confirm the speedups work. |
This is a correct and effective approach! 🐘 |