cmake: use -flto=auto compiler flag when supported, rework fast-math disablement #80

illwieckz · 2025-03-05T15:06:56Z

cmake: use -flto=auto compiler flag when supported

Use -flto=auto compiler flag when supported, this silence this
GCC warning:

lto-wrapper: warning: using serial compilation of # LTRANS jobs

This also greatly speeds-up the linkage time as it enables LTO
multithreading in GCC (either by using Make jobserver if detected,
either by detecting CPU cores).

Also always set LTO if enabled.

Similar to:

cmake: use -flto=auto compiler flag when supported Daemon#1571

cmake: rework the fast-math enablement and force the disablement

Some compilers may enable fast-math by default (example: ICC).
Some contractions are still safe and can be enabled.

cmake: fix typo

Use -flto=auto compiler flag when supported, this silence this GCC warning: > lto-wrapper: warning: using serial compilation of # LTRANS jobs This also greatly speeds-up the linkage time as it enables LTO multithreading in GCC (either by using Make jobserver if detected, either by detecting CPU cores). Also always set LTO if enabled.

Some compilers may enable fast-math by default (example: ICC). Some contractions are still safe and can be enabled.

slipher

Any numbers on performance impact of the strict math flags? Reproducible floating-point operations are of somewhat niche interest and have limited chance to be reliably achieved between differing compilers and architectures, so I think that would be a bad default if it costs more than a few percent of speed.

slipher · 2025-03-19T10:11:10Z

CMakeLists.txt

@@ -88,12 +98,8 @@ if (MSVC)
 	endif()

 	if (USE_LTO)
-		set_cxx_flag("/GL" RELEASE)
-		set_cxx_flag("/GL" RELWITHDEBINFO)


The point of these is to have different flags per configuration in a multi-configuration build, so that you can test a Release build with LTO and a Debug build without it from the same IDE instance.

It's OK to remove this for the GCC-style options since it gets in the way of flag tests, but there is no need to remove it for MSVC.

OK, this looked overly verbose to me but I didn't know MSVC required such explicit boilerplate.
Now reverted.

Forgot to push, now pushed. 😆

illwieckz · 2025-07-17T00:09:02Z

Any numbers on performance impact of the strict math flags? Reproducible floating-point operations are of somewhat niche interest and have limited chance to be reliably achieved between differing compilers and architectures, so I think that would be a bad default if it costs more than a few percent of speed.

It looks very niche to me to care about fast math in an offline conversion tool, the option can still be enabled for those wanting to feature it as a lib. In fact it's even safer for those building against crnlib to not have the fast math being enabled for their whole program in their back.

Speed difference between fast math or not is negligible (see below).

While reproducibility is fragile when fast math is disabled, it makes it impossible when it is enabled, I don't see the benefit of it.

The time difference may be exaggerated by slower image decoding (like slower PNG decoding).

Benchmarks

Environment

Test machine: 16 cores, 32 threads AMD Zen 2 CPU, DDR4 RAM with enough space to fit the whole benchmark corpus in cache.

Protocol

Images are put in memory cache before the runs and during the run images are read from memory cache and written to /dev/null. Crunch is a release build built with LTO enabled.

32 crunch executions are done in parallel (one crunch per CPU thread, one image per crunch execution), and each crunch execution uses 15 threads (because it uses max((nproc - 1), 15) by default).

About 10 runs are done sequentially for every configuration, the 5 best ones are kept (to filter out runs that may be disturbed by random unrelated tasks spawning in the background).

# Print number of images to convert.
find *.???dir \( -name '*.jpg' -o -name '*.png' -o -name '*.tga' \) | wc -l

# Preload image files into memory.
find *.???dir \( -name '*.jpg' -o -name '*.png' -o -name '*.tga' \) -print0 \
| parallel --will-cite -0 -I{} -P"$(nproc)" dd if={} of=/dev/null bs=1M status=none

# Convert images files.
time find *.???dir \( -name '*.jpg' -o -name '*.png' -o -name '*.tga' \) -print0 \
| parallel --will-cite --halt 0 -0 -I{} -P"$(nproc)" crunch -quiet -file {} -fileformat crn -out /dev/null

I only added --halt 0 to skip a Xonotic jpg that is not accepted by crunch (I will report), no image is skipped in other corpus.

Results

Corpus: UnvanquishedAssets

Files: 2656 images; 153 jpg, 2486 png, 17 tga.

Fast math times:

real	2m40,024s
real	2m40,504s
real	2m40,575s
real	2m40,610s
real	2m40,692s

Strict math times:

real	2m46,021s
real	2m46,026s
real	2m46,105s
real	2m46,517s
real	2m46,680s

Average time difference per image: 0.002 s

Corpus: InterstellarOasis

Files: 5160 images; 3097 jpg, 2063 png, 0 tga.

Fast math times:

real	1m20,441s
real	1m20,481s
real	1m20,557s
real	1m20,617s
real	1m20,531s

Strict math times:

real	1m23,275s
real	1m23,369s
real	1m23,084s
real	1m23,050s
real	1m22,908s

Average time difference per image: 0.0005 s

Corpus: Xonotic

Files: 10041 images; 2351 jpg, 101 png, 7589 tga.

Fast math times:

real	5m48,586s
real	5m51,403s
real	5m51,649s
real	5m51,784s
real	5m51,985s

Strict math times:

real	5m57,966s
real	5m58,812s
real	5m59.601s
real	6m0,113s
real	6m0,327s

Average time difference per image: 0.0008 s

slipher · 2025-07-26T01:48:59Z

CMakeLists.txt

 		# By default, GCC uses -std=gnu* and then enables -ffp-contract=fast even if -ffast-math is not enabled.
-		set_cxx_flag("-ffp-contract=off")
+		# See https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
+		# GCC fast contractions (-ffp-contract=fast) should be safe, but aren't on arm64 with GCC 12.


What do you mean by "aren't safe"?

I should have been more verbose at the time because I forgot the exact issue.

I guess it meant it doesn't produce the same result as compiling without any fast stuff.

Given the purpose of my patch is to maximize the chance the files are reproducible, I probably noticed that on ARM such option broke the reproducibility. It's probably a similar problem than using x87 instead of SSE on x86, maybe some ARM fused operations break IEEE compliance.

For sure, since I mentioned a specific GCC version, that was the result of me testing that specific compiler on the said hardware, and I was testing the reproducibility of converted files. GCC 12 is the Debian Bookworm GCC, and I use Debian Bookworm on my Arm boards.

That's unobvious so it should be explained in the comment.

As we've discussed in the past, I don't think floating point reproducibility is a good goal -- the language and compilers don't make any attempt to provide such guarantees. Platforms using x87 floating point are an easy example where you're not going to achieve it. But for GCC/Clang those options to disable fast math are good in any case, since fast math makes the software too unreliable.

slipher · 2025-07-26T01:56:56Z

CMakeLists.txt

@@ -95,6 +95,9 @@ if (MSVC)
 		# and https://devblogs.microsoft.com/cppblog/the-fpcontract-flag-and-changes-to-fp-modes-in-vs2022/
 		# By default, MSVC doesn't enable the /fp:fast option.
 		set_cxx_flag("/fp:fast")
+	else()
+		# Precise model (/fp:precise) should do safe contractions, but we should not trust that (see below).
+		set_cxx_flag("/fp:strict")


/fp:precise would be a better default for MSVC. The documentation says /fp:strict is mostly needed if you want floating point exceptions. I did a little test with 57 PNG files and got 246.162s (self-reported) with master and 259.846s with this branch.

illwieckz added 3 commits March 5, 2025 14:45

cmake: rework the fast-math enablement and force the disablement

59c53e9

Some compilers may enable fast-math by default (example: ICC). Some contractions are still safe and can be enabled.

cmake: fix typo

1baede8

illwieckz force-pushed the illwieckz/cmake branch from 166155c to 867022a Compare March 5, 2025 15:09

illwieckz changed the title ~~Illwieckz/cmake~~ cmake: use -flto=auto compiler flag when supported Mar 17, 2025

illwieckz force-pushed the illwieckz/cmake branch from 867022a to 160640b Compare March 17, 2025 12:21

illwieckz changed the title ~~cmake: use -flto=auto compiler flag when supported~~ cmake: use -flto=auto compiler flag when supported, rework fast-math disablement Mar 17, 2025

illwieckz force-pushed the illwieckz/cmake branch from 867022a to 629da09 Compare March 17, 2025 17:01

ci: fast-math is now disabled by default

9174b64

illwieckz force-pushed the illwieckz/cmake branch 3 times, most recently from 2abc19b to 27c13ac Compare March 17, 2025 17:45

slipher reviewed Mar 19, 2025

View reviewed changes

illwieckz force-pushed the illwieckz/cmake branch from 27c13ac to 9174b64 Compare July 18, 2025 09:12

slipher reviewed Jul 26, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

cmake: use -flto=auto compiler flag when supported, rework fast-math disablement #80

cmake: use -flto=auto compiler flag when supported, rework fast-math disablement #80

Uh oh!

illwieckz commented Mar 5, 2025 •

edited

Loading

Uh oh!

slipher left a comment

Uh oh!

slipher Mar 19, 2025

Uh oh!

illwieckz Jul 16, 2025

Uh oh!

illwieckz Jul 18, 2025

Uh oh!

illwieckz commented Jul 17, 2025

Uh oh!

slipher Jul 26, 2025

Uh oh!

illwieckz Jul 26, 2025

Uh oh!

illwieckz Jul 26, 2025

Uh oh!

illwieckz Jul 26, 2025

Uh oh!

slipher Jul 26, 2025 •

edited

Loading

Uh oh!

slipher Jul 26, 2025

Uh oh!

Uh oh!

cmake: use -flto=auto compiler flag when supported, rework fast-math disablement #80

Are you sure you want to change the base?

cmake: use -flto=auto compiler flag when supported, rework fast-math disablement #80

Uh oh!

Conversation

illwieckz commented Mar 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!