Describe the bug
After upgrading the image to 0.12.1, running Qwen1.5-14B-Chat-GPTQ-Int4 is much slower than on 0.11.0.
To Reproduce
Upgrade the Docker image from 0.11.0 to 0.12.1, then run Qwen1.5-14B-Chat-GPTQ-Int4. Generation is much slower than on 0.11.0.
Expected behavior
The number of tokens per second after the upgrade should be the same as before the upgrade.
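For anyone trying to quantify the regression, here is a minimal sketch of how the two versions could be compared. It assumes you have already measured, for the same prompt on each version, how many tokens were generated and how long generation took. The function names `tokens_per_second` and `slowdown_factor` are hypothetical helpers, not part of any library, and the numbers in the usage note are placeholders rather than measurements from this report.

```python
def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Return generation throughput in tokens per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds


def slowdown_factor(old_tps: float, new_tps: float) -> float:
    """Return how many times slower the new version is than the old one.

    A value of 1.0 means no change; values above 1.0 mean a regression.
    """
    if new_tps <= 0:
        raise ValueError("new_tps must be positive")
    return old_tps / new_tps
```

Usage: compute `tokens_per_second(generated_tokens, seconds)` once per version with identical prompts and startup parameters, then pass both results to `slowdown_factor(old_tps, new_tps)`.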
Additional context
Our startup parameter configuration:
