
Commit e1d820f

add qwen2.5 recipe and refine readme (#338)

* edit profile readme
* add Qwen2.5 model recipes
* refine doc
* fix typo

Signed-off-by: Zhang, Weiwei1 <[email protected]>
1 parent 7a28ab3 commit e1d820f

5 files changed: +635 -30 lines changed

README.md

Lines changed: 8 additions & 4 deletions
@@ -26,14 +26,13 @@ more accuracy data and recipes across various models.
 <div align="left">
 
 ## What's New
-* [2024/11] We provide some tips and tricks for LLM&VLM quantization, please check out [this file](./docs/tips_and_tricks.md).
+* [2024/11] We provide experimental support for VLM quantization, please check out [MLLM README](./auto_round/mllm/README.md)
+* [2024/11] We provide some tips and tricks for LLM&VLM quantization, please check out [this file](./docs/tips_and_tricks.md)
 * [2024/10] AutoRound has been integrated to [torch/ao](https://github.com/pytorch/ao), check out
   their [release note](https://github.com/pytorch/ao/releases/tag/v0.6.1)
 * [2024/10] Important update: We now support full-range symmetric quantization and have made it the default
   configuration. This configuration is typically better or comparable to asymmetric quantization and significantly
   outperforms other symmetric variants, especially at low bit-widths like 2-bit, check out [some accuracy data](./docs/full_range_sym.md).
-* [2024/09] AutoRound format supports several LVM models, check out the
-  examples [Qwen2-Vl](./examples/multimodal-modeling/Qwen-VL), [Phi-3-vision](./examples/multimodal-modeling/Phi-3-vision), [Llava](./examples/multimodal-modeling/Llava)
 * [2024/08] AutoRound format supports Intel Gaudi2 devices. Please refer
   to [Intel/Qwen2-7B-int4-inc](https://huggingface.co/Intel/Qwen2-7B-int4-inc).
 * [2024/08] AutoRound introduces several experimental features, including fast tuning of norm/bias parameters (for 2-bit
@@ -317,7 +316,11 @@ release most of the models ourselves.
 | meta-llama/Llama-3.2-11B-Vision | [recipe](./docs/Llama-3.2-11B-Vision-Instruct_sym.md) |
 | microsoft/Phi-3.5-vision-instruct | [recipe](./docs/Phi-3.5-vision-instruct_sym.md) |
 | liuhaotian/llava-v1.5-7b | [recipe](./docs/llava-v1.5-7b_sym.md) |
-| Qwen/Qwen2.5-7B-Instruct | [recipe](./docs/Qwen2.5-7B-Instruct_sym.md) |
+| Qwen/Qwen2.5-7B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-7B-Instruct-AutoRound-GPTQ-asym-4bit), [recipe](./docs/Qwen2.5-7B-Instruct_sym.md) |
+| Qwen/Qwen2.5-14B-Instruct | [recipe](./docs/Qwen2.5-14B-Instruct_sym.md) |
+| Qwen/Qwen2.5-32B-Instruct | [recipe](./docs/Qwen2.5-32B-Instruct_sym.md) |
+| Qwen/Qwen2.5-Coder-32B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-Coder-32B-Instruct-AutoRound-GPTQ-4bit) |
+| Qwen/Qwen2.5-72B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-4bit), [model-kaitchup-autogptq-int2*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-2bit), [recipe](./docs/Qwen2.5-72B-Instruct_sym.md) |
 | meta-llama/Meta-Llama-3.1-70B-Instruct | [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-70B-Instruct-int4-inc) |
 | meta-llama/Meta-Llama-3.1-8B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-asym), [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-sym), [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-8B-Instruct-int4-inc) |
 | meta-llama/Meta-Llama-3.1-8B | [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-autoround-gptq-4bit-sym) |
@@ -373,3 +376,4 @@ If you find AutoRound useful for your research, please cite our paper:
 }
 ```
+

auto_round/mllm/README.md

Lines changed: 19 additions & 26 deletions
@@ -25,6 +25,14 @@ AutoRound uses the text module of MLLM (LLM component) as the main quantization
 autoround.save_quantized(output_dir, format='auto_round', inplace=True)
 ```
 
+- `dataset`: the dataset for quantization training. NeelNanda/pile-10k, llava_conv_58k, llava_instruct_80k and llava_instruct_150k are currently supported, and a custom dataset can also be used. Please note that the effectiveness of the Llava calibration datasets has so far been validated on only five models.
+- `quant_nontext_module`: whether to quantize non-text modules, e.g. the vision component.
+- `extra_data_dir`: the directory for storing images/audio/videos, default None. It can be a single directory path or multiple paths in the format 'image=path_to_image,video=path_to_video,audio=path_to_audio'. By default, files are searched for at their relative paths and, if not found, are downloaded automatically.
+
+For more hyperparameters, please refer to [Homepage Detailed Hyperparameters](../../README.md#api-usage-gaudi2cpugpu) (see also the sketch below).
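To make the arguments above concrete, here is a minimal, illustrative sketch (not part of this commit). The `AutoRoundMLLM` entry point, its constructor signature, and the example model ID are assumptions based on the descriptions above; consult the repository's current API documentation before relying on them.

```python
# Illustrative sketch only -- `AutoRoundMLLM`, its constructor arguments, and the example
# model ID below are assumptions based on the argument descriptions above, not this commit.
from transformers import AutoProcessor, AutoTokenizer, Qwen2VLForConditionalGeneration

from auto_round import AutoRoundMLLM  # assumed MLLM quantization entry point

model_name = "Qwen/Qwen2-VL-2B-Instruct"  # hypothetical example model
model = Qwen2VLForConditionalGeneration.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

autoround = AutoRoundMLLM(
    model,
    tokenizer,
    processor=processor,
    bits=4,
    group_size=128,
    dataset="NeelNanda/pile-10k",   # or llava_conv_58k / llava_instruct_80k / llava_instruct_150k / a local file
    quant_nontext_module=False,     # keep the vision component unquantized (the default behavior described above)
    extra_data_dir=None,            # e.g. "image=/path/to/images" when the calibration data needs local media
)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_round", inplace=True)
```

By default the vision component is left unquantized; as the support matrix later in this README notes, only some models accept `quant_nontext_module=True`.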
 <details>
 <summary style="font-size:17px;">Basic Usage (Gaudi2/CPU/GPU)</summary>
 A user guide detailing the full list of supported arguments is provided by calling ```auto-round-mllm -h``` on the terminal. Alternatively, you can use ```auto_round_mllm``` instead of ```auto-round-mllm```. Set the format you want in `format` and
@@ -40,11 +48,6 @@ AutoRound uses the text module of MLLM (LLM component) as the main quantization
 --output_dir ./tmp_autoround
 ```
 
-- `dataset`: the dataset for quantization training. current support NeelNanda/pile-10k,llava_conv_58k,llava_instruct_80k. It can be a custom one.
-
-- `quant_nontext_module`: whether to quantize non-text module, e.g. vision component.
-
-- `extra_data_dir`:dataset dir for storing images/audio/videos, default to None. Can be a dir path or multiple dir path with format as 'image=path_to_image,video=path_to_video,audio=path_to_audio' By default, it will search in the relative path, and if not find, will automatic download.
 
 </details>
 
@@ -56,19 +59,6 @@ For mllm, we used **text-only** calibration dataset (NeelNanda/pile-10k) as our
 
 Through the argument --dataset (text file), users can use other datasets such as "liuhaotian/llava_conv_58k", "liuhaotian/llava_instruct_80k", "liuhaotian/llava_instruct_150k", or a local file path.
 
-
-### Support List
-
-The llava calibration dataset supports the five existing MLLMs.
-
-|Model |Eval Lib |calibration dataset|Feasibility of quantification|
-|---------------|-----------|-------------------|--------------------|
-|Qwen/Qwen2-VL-Instruct |vlmeval |llava |✅|
-|meta-llama/Llama-3.2-11B-Vision |vlmeval/lmms_eval |llava |✅|
-|microsoft/Phi-3.5-vision-instruct |vlmeval |llava |✅|
-|liuhaotian/llava-v1.5-7b |lmms_eval |llava |✅|
-|THUDM/cogvlm2-llama3-chat-19B |lmms_eval |llava |✅|
-
 </details>
 
@@ -78,15 +68,17 @@ The llava calibration dataset supports the five existing MLLMs.
 
 ### Support Matrix
 
-The design of the MLLM model API is not uniform, and some models do not support the quantization nontext module. Quantization of the vision components of Llama-3.2-11B-Vision, Phi-3.5-vision-instruct and llava-v1.5-7b is currently supported.
+For typical VLMs, the default quantization, which leaves the visual component unquantized, is assumed to be supported. The design of vision components in MLLM model APIs is not standardized, and some models do not support the quantization of non-text modules.
 
-|Model |Eval Lib |quant nontext module|
-|---------------|-----------|-------------------|
-|Qwen/Qwen2-VL-Instruct |vlmeval |- |
-|meta-llama/Llama-3.2-11B-Vision |lmms_eval |✅|
-|microsoft/Phi-3.5-vision-instruct |vlmeval |✅|
-|liuhaotian/llava-v1.5-7b |lmms_eval |- |
-|THUDM/cogvlm2-llama3-chat-19B |lmms_eval |✅|
+Currently, the quantization of vision components is supported for Llama-3.2-11B-Vision, Phi-3.5-Vision-Instruct, and Llava-v1.5-7B (an illustrative sketch follows the table).
+
+| Model        | Eval Lib  | calibration dataset | quant nontext module |
+|--------------|-----------|---------------------|----------------------|
+| Qwen2-VL     | vlmeval   | pile/llava          | -                    |
+| Llama-Vision | lmms_eval | llava               | ✅                   |
+| Phi3-Vision  | vlmeval   | pile/llava          | ✅                   |
+| Llava-v1.5   | lmms_eval | pile/llava          | -                    |
+| CogVLM2      | lmms_eval | pile/llava          | ✅                   |
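To illustrate the `quant nontext module` column, below is a hedged sketch of enabling it for one of the models marked as supported. As in the earlier sketch, the `AutoRoundMLLM` signature, the loading class, and the model ID are assumptions rather than content of this commit.

```python
# Hypothetical sketch: also quantizing the vision component for a model the table above
# marks as supported. Class names and the constructor signature are assumptions.
from transformers import AutoProcessor, AutoTokenizer, MllamaForConditionalGeneration

from auto_round import AutoRoundMLLM  # assumed entry point, as in the earlier sketch

model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # listed with quant nontext module support
model = MllamaForConditionalGeneration.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

autoround = AutoRoundMLLM(
    model,
    tokenizer,
    processor=processor,
    bits=4,
    group_size=128,
    dataset="liuhaotian/llava_conv_58k",  # llava-style calibration data, per the table
    quant_nontext_module=True,            # quantize the vision component as well
)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_round", inplace=True)
```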
@@ -140,3 +132,4 @@ For more details on quantization, inference, evaluation, and environment, see th
+