Description
Describe the bug
Hello, I noticed that Wasmer's LLVM backend performs poorly when executing the f64.min instruction.
The specific timing data is as follows:
| runtime | time (s) |
|---|---|
| wasmer_llvm | 1.85466 |
| wasmedge_jit | 0.018681 |
| wamr_llvm_jit | 0.01863 |
Times are in seconds; each value is the average of ten executions.
Environment
All runtimes were built in release mode and run in JIT mode.
- wasmer: 6.0.1
- WAMR: iwasm 2.4.0
- wasmedge: 0.15.0
- wabt: 1.0.27
- llvm: 18.1.8
- Host OS: Ubuntu 22.04.5 LTS x64
- CPU: 11th Gen Intel® Core™ i7-11700 @ 2.50GHz × 16
- rustc: rustc 1.87.0 (17067e9ac 2025-05-09)
binary: rustc
commit-hash: 17067e9ac6d7ecb70e50f92c1944e545188d2359
commit-date: 2025-05-09
host: x86_64-unknown-linux-gnu
release: 1.87.0
LLVM version: 20.1.1
Steps to reproduce
test_case.wat
(module
  (type (;0;) (func (param i32)))
  (type (;1;) (func))
  (import "wasi_snapshot_preview1" "proc_exit" (func (;0;) (type 0)))
  (func (;1;) (type 1)
    (local i32)
    (local.set 0
      (i32.const 0))
    (loop
      (drop
        (f64.min
          (f64.const 0x1.1ff8b184f99c5p+1020 (;=1.26388e+307;))
          (f64.const 0x1.1ff8b184f99c5p+1020 (;=1.26388e+307;))))
      (local.set 0
        (i32.add
          (local.get 0)
          (i32.const 1)))
      (br_if 0
        (i32.ne
          (local.get 0)
          (i32.const 0))))
    (call 0
      (i32.const 0))
    (unreachable))
  (export "_start" (func 1))
  (memory (;0;) 1)
  (export "memory" (memory 0)))
wat2wasm test_case.wat -o test_case.wasm
# Execute the wasm file and collect data
perf stat -r 10 -e 'task-clock' /path/to/wasmer run test_case.wasm --llvm
perf stat -r 10 -e 'task-clock' /path/to/wasmedge --enable-jit test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/build_llvm_jit/iwasm test_case.wasm
Expected behavior
In the test case, I placed the f64.min instruction inside a loop to amplify performance differences.
Wasmer's LLVM backend may implement or optimize the f64.min instruction differently from the other runtimes, which would explain its slower execution.
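For context, f64.min in WebAssembly is not a plain "pick the smaller value" operation: per the spec it returns NaN if either operand is NaN and treats -0.0 as smaller than +0.0, so a backend typically needs extra handling around the native min instruction. The following is a minimal Python sketch of those semantics for reference only (the function name `wasm_f64_min` is my own, and NaN payload details are ignored); it does not show how any of the runtimes above actually lower the instruction, and whether this is related to the slowdown is not established.

```python
import math


def wasm_f64_min(a: float, b: float) -> float:
    """Reference model of WebAssembly f64.min (NaN payloads not modeled)."""
    # If either operand is NaN, the result is NaN.
    if math.isnan(a) or math.isnan(b):
        return math.nan
    # Zeros compare equal in Python, but Wasm orders -0.0 below +0.0.
    if a == 0.0 and b == 0.0:
        either_negative = math.copysign(1.0, a) < 0 or math.copysign(1.0, b) < 0
        return -0.0 if either_negative else 0.0
    # Ordinary case: the smaller operand.
    return a if a < b else b


# The constant used in test_case.wat, applied to itself.
x = float.fromhex("0x1.1ff8b184f99c5p+1020")
print(wasm_f64_min(x, x) == x)
```

In the test case both operands are the same finite constant, so none of the special cases is ever taken at run time.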
Actual behavior
By the way, I also tested the Cranelift backend with this test case, and its timing was comparable to that of wasmtime.
The specific timing data is as follows:
| runtime | time (s) |
|---|---|
| wasmer_cranelift | 0.93366 |
| wasmtime | 0.98707 |
The LLVM backend is also roughly twice as slow as the Cranelift backend (1.85 s vs. 0.93 s), which reinforces my suspicion that the LLVM backend handles the f64.min instruction poorly.
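To put the totals in perspective, they can be converted to an approximate cost per loop iteration. The loop counter in test_case.wat is an i32 that starts at 0 and the loop exits when it wraps back to 0, so the body runs 2^32 times if the loop is executed literally. The sketch below does that arithmetic; it assumes the whole measured task-clock time is spent in the loop, which overstates the per-iteration cost because startup and compilation time are included.

```python
# The i32 counter wraps around to 0 after 2**32 increments.
ITERATIONS = 2 ** 32

# Average task-clock times in seconds, copied from the tables in this report.
timings_s = {
    "wasmer_llvm": 1.85466,
    "wasmedge_jit": 0.018681,
    "wamr_llvm_jit": 0.01863,
    "wasmer_cranelift": 0.93366,
    "wasmtime": 0.98707,
}


def ns_per_iteration(total_seconds: float, iterations: int = ITERATIONS) -> float:
    """Upper bound on nanoseconds per loop iteration for a given total time."""
    return total_seconds * 1e9 / iterations


for name, seconds in timings_s.items():
    print(f"{name:18s} {ns_per_iteration(seconds):.4f} ns/iteration")
```

This gives about 0.43 ns per iteration for wasmer_llvm, roughly 0.22 to 0.23 ns for wasmer_cranelift and wasmtime, and about 0.004 ns for wasmedge and WAMR. The last figure is far below one CPU clock cycle, which suggests (but does not prove) that those two runtimes did not execute all 2^32 iterations literally, for example because the loop body was optimized away.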
Additional context
If you need any other relevant information, please let me know and I will do my best to provide it. Looking forward to your reply! Thank you!