The f64.min instruction causes a performance degradation in wasmer's LLVM backend #5771

@gaaraw

Describe the bug

Hello, I noticed that Wasmer's LLVM backend is roughly two orders of magnitude slower than WasmEdge and WAMR on a test case that executes the f64.min instruction in a loop.

The timing data is as follows:

Runtime          Time (s)
wasmer_llvm      1.85466
wasmedge_jit     0.018681
wamr_llvm_jit    0.01863

Times are in seconds; each value is the average of ten runs.
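
To quantify the gap, the ratios can be computed from the averages above. This is a minimal sketch using awk; the `ratio` helper is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# ratio SLOW FAST: print how many times longer SLOW took than FAST.
# (hypothetical helper, not part of any runtime's tooling)
ratio() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

ratio 1.85466 0.018681   # wasmer_llvm vs. wasmedge_jit
ratio 1.85466 0.01863    # wasmer_llvm vs. wamr_llvm_jit
```

With the numbers above this prints 99.3 and 99.6, so the LLVM backend takes roughly 100 times longer on this test case.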

Environment

All runtimes are release builds and run in JIT mode.

  • wasmer: 6.0.1
  • WAMR: iwasm 2.4.0
  • wasmedge: 0.15.0
  • wabt: 1.0.27
  • llvm: 18.1.8
  • Host OS: Ubuntu 22.04.5 LTS x64
  • CPU: 11th Gen Intel® Core™ i7-11700 @ 2.50GHz × 16
  • rustc: rustc 1.87.0 (17067e9ac 2025-05-09)
    binary: rustc
    commit-hash: 17067e9ac6d7ecb70e50f92c1944e545188d2359
    commit-date: 2025-05-09
    host: x86_64-unknown-linux-gnu
    release: 1.87.0
    LLVM version: 20.1.1

Steps to reproduce

test_case.wat
(module
  (type (;0;) (func (param i32)))
  (type (;1;) (func))
  (import "wasi_snapshot_preview1" "proc_exit" (func (;0;) (type 0)))
  (func (;1;) (type 1)
    (local i32)
    (local.set 0
      (i32.const 0))
    (loop

      (drop
        (f64.min
          (f64.const 0x1.1ff8b184f99c5p+1020 (;=1.26388e+307;))
          (f64.const 0x1.1ff8b184f99c5p+1020 (;=1.26388e+307;))))
          
      (local.set 0
        (i32.add
          (local.get 0)
          (i32.const 1)))
      (br_if 0
        (i32.ne
          (local.get 0)
          (i32.const 0))))
    (call 0
      (i32.const 0))
    (unreachable))
  (export "_start" (func 1))
  (memory (;0;) 1)
  (export "memory" (memory 0)))

# Compile the wat file to wasm
wat2wasm test_case.wat -o test_case.wasm

# Execute the wasm file and collect data
perf stat -r 10 -e 'task-clock' /path/to/wasmer run test_case.wasm --llvm
perf stat -r 10 -e 'task-clock' /path/to/wasmedge --enable-jit test_case.wasm
perf stat -r 10 -e 'task-clock' /path/to/build_llvm_jit/iwasm test_case.wasm

Expected behavior

In the test case, I placed the f64.min instruction inside a loop to amplify the performance difference.
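
For scale: the loop counter is an i32 that starts at 0, and the loop exits only when the counter wraps back to 0, so the body runs 2^32 times. The sketch below converts the averaged times into a per-iteration cost. `ns_per_iter` is a hypothetical helper, and the figures include process startup and compilation time, so they are only rough.

```shell
#!/bin/sh
# The i32 counter starts at 0 and the loop exits when it wraps back to 0,
# so the loop body runs 2^32 times.
ITERATIONS=4294967296

# ns_per_iter SECONDS: average nanoseconds per loop iteration.
# (hypothetical helper, for illustration only)
ns_per_iter() {
    awk -v t="$1" -v n="$ITERATIONS" 'BEGIN { printf "%.3f\n", t * 1e9 / n }'
}

ns_per_iter 1.85466    # wasmer_llvm
ns_per_iter 0.018681   # wasmedge_jit
ns_per_iter 0.01863    # wamr_llvm_jit
```

This gives about 0.43 ns per iteration for wasmer_llvm and about 0.004 ns for the other two. The latter is far below one CPU cycle, so my guess (unverified) is that those runtimes remove the f64.min or the whole loop as dead code, since its operands are constants and its result is dropped, while Wasmer's LLVM backend still executes the loop.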

Perhaps Wasmer's LLVM backend implements or optimizes the f64.min instruction differently from the other runtimes, resulting in suboptimal performance.

Actual behavior

I also ran this test case on Wasmer's Cranelift backend, and the results showed no significant difference from wasmtime.

The timing data is as follows:

Runtime             Time (s)
wasmer_cranelift    0.93366
wasmtime            0.98707

The LLVM backend is also slower than the Cranelift backend (1.85466 s vs. 0.93366 s), which further deepens my doubts about how the LLVM backend handles the f64.min instruction.

Additional context

If you need any other relevant information, please let me know and I will do my best to provide it. Looking forward to your reply! Thank you!
