- `dataset`: the dataset for quantization training. `NeelNanda/pile-10k`, `llava_conv_58k`, `llava_instruct_80k`, and `llava_instruct_150k` are currently supported, and a custom dataset can also be used (see the example command after this list). Please note that the effectiveness of the Llava calibration datasets has only been validated on five models so far.
- `quant_nontext_module`: whether to quantize non-text modules, e.g. the vision component.
- `extra_data_dir`: directory for storing images/audio/videos, defaults to None. It can be a single directory path or multiple paths given in the format 'image=path_to_image,video=path_to_video,audio=path_to_audio'. By default, the relative path is searched first; if the data is not found there, it is downloaded automatically.
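
The sketch below shows how these arguments might be combined in a single `auto-round-mllm` call. Only `--dataset`, `--extra_data_dir`, and `--output_dir ./tmp_autoround` come from this document; the `--model` flag and the concrete paths are assumptions, so confirm the exact argument names with `auto-round-mllm -h`.

```bash
# Illustrative sketch: quantize an MLLM with a Llava-style calibration set.
# The model name and image directory are placeholders, not values prescribed
# by this document; check `auto-round-mllm -h` for the authoritative flags.
auto-round-mllm \
    --model liuhaotian/llava-v1.5-7b \
    --dataset liuhaotian/llava_conv_58k \
    --extra_data_dir "image=/path/to/images" \
    --output_dir ./tmp_autoround
```
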
For an introduction to more hyperparameters, please refer to [Homepage Detailed Hyperparameters](../../README.md#api-usage-gaudi2cpugpu).

A user guide detailing the full list of supported arguments is provided by calling `auto-round-mllm -h` in the terminal. Alternatively, you can use `auto_round_mllm` instead of `auto-round-mllm`. Set the format you want to export in `format`.

AutoRound uses the text module of the MLLM (the LLM component) as the main quantization target. A minimal command looks like this:

```bash
auto-round-mllm --model <model_name> --output_dir ./tmp_autoround
```

For MLLMs, we use the **text-only** calibration dataset (NeelNanda/pile-10k) as our default.

Through the `--dataset` argument (which also accepts a text file), users can select other datasets such as `liuhaotian/llava_conv_58k`, `liuhaotian/llava_instruct_80k`, and `liuhaotian/llava_instruct_150k`, or pass a file path to use a local file.
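
As a sketch (the local path below is a placeholder, and the `--model` flag is assumed from the CLI described above), pointing `--dataset` at a local file could look like this:

```bash
# Sketch: --dataset also accepts a file path (described above as a text file)
# instead of a Hugging Face dataset name; the path here is a placeholder.
auto-round-mllm \
    --model liuhaotian/llava-v1.5-7b \
    --dataset /path/to/local_calibration_file \
    --output_dir ./tmp_autoround
```
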
### Support Matrix
For typical VLLMs, we assume that the default quantization, which excludes quantizing the visual component, is supported. The design of vision components in MLLM model APIs is not standardized, and some models do not support the quantization of non-text modules.
Currently, the quantization of vision components is supported for Llama-3.2-11B-Vision, Phi-3.5-Vision-Instruct, and Llava-v1.5-7B.
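
As a sketch only, assuming the `quant_nontext_module` option is exposed as a CLI switch of the same name, enabling vision-component quantization for one of the supported models could look like this:

```bash
# Sketch: --quant_nontext_module is assumed to mirror the quant_nontext_module
# argument described earlier; enable it only for models whose vision component
# is listed as supported (e.g. Llama-3.2-11B-Vision).
auto-round-mllm \
    --model meta-llama/Llama-3.2-11B-Vision \
    --quant_nontext_module \
    --output_dir ./tmp_autoround
```
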