Image upgrade to 0.12.1, running Qwen1.5-14B-Chat-GPTQ-Int4 is much slower compared to 0.11.0 #1650

@WholeWorld-Timothy

Describe the bug

After upgrading the Docker image from 0.11.0 to 0.12.1, Qwen1.5-14B-Chat-GPTQ-Int4 runs much slower than it did on 0.11.0.

To Reproduce

Upgrade the Docker image to 0.12.1 and run Qwen1.5-14B-Chat-GPTQ-Int4. Generation is much slower than on 0.11.0.

Expected behavior

The number of tokens per second after the upgrade should be the same as before the upgrade.
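
To put a number on the slowdown, tokens per second could be measured the same way on both image versions. The following is only a rough sketch and not part of the original report: `tokens_per_second`, `slowdown`, and the `generate` callable are hypothetical names, and `generate` is assumed to wrap whatever client call sends one request to the model and returns the number of tokens it produced.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[], int], runs: int = 3) -> float:
    """Average throughput over several runs.

    `generate` is a hypothetical callable that performs one generation
    request and returns the number of tokens produced.
    """
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate()
        total_seconds += time.perf_counter() - start
    return total_tokens / total_seconds


def slowdown(before_tps: float, after_tps: float) -> float:
    """Ratio of the old throughput to the new one (2.0 means twice as slow)."""
    return before_tps / after_tps
```

Running `tokens_per_second` with the same prompt and the same startup parameters against 0.11.0 and then 0.12.1, and passing both results to `slowdown`, would give one comparable figure for the regression.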

Additional context

Our startup parameter configuration:
[image: screenshot of the startup parameter configuration]
