I've made over 20 tests and GPTQv2 is always worse than BitsandBytes 4bit even the 8bit GTPQv2 is worse. #1608

TByte007 · 2025-05-09T02:27:36Z

TByte007
May 9, 2025

I comparing GPTQv2 vs BnB using vLLM. For BnB I just use the on the fly quantization. I tried all kinds of combination of options for GPTQv2 and it always comes out worse at accuracy. Even the 8bit is worse then 4bit BnB.
I tested it using "anli" and "hellaswag" on "Nitral-AI/Violet_Magcap-12B" model.
This is part of the results:
ANLI:

GPTQv2 4bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 128, damp_percent: 0.1, mse: false
9600/9600 [07:25<00:00, 21.55it/s]
2025-05-09:04:47:54
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=7200), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|-------|------:|------|-----:|------|---|-----:|---|-----:|
|anli_r1|      1|none  |     0|acc   |↑  |0.4830|±  |0.0158|
|anli_r2|      1|none  |     0|acc   |↑  |0.4710|±  |0.0158|
|anli_r3|      1|none  |     0|acc   |↑  |0.4583|±  |0.0144|

GPTQv2 4bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64, damp_percent: 0.1, mse: true
9600/9600 [08:39<00:00, 18.47it/s]
2025-05-09:05:10:15
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=7200), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|-------|------:|------|-----:|------|---|-----:|---|-----:|
|anli_r1|      1|none  |     0|acc   |↑  |0.5160|±  |0.0158|
|anli_r2|      1|none  |     0|acc   |↑  |0.4800|±  |0.0158|
|anli_r3|      1|none  |     0|acc   |↑  |0.4858|±  |0.0144|

Bitsandbytes 4bit
9600/9600 [07:27<00:00, 21.45it/s]
2025-05-09:04:58:01
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=7200), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|-------|------:|------|-----:|------|---|-----:|---|-----:|
|anli_r1|      1|none  |     0|acc   |↑  |0.5220|±  |0.0158|
|anli_r2|      1|none  |     0|acc   |↑  |0.5050|±  |0.0158|
|anli_r3|      1|none  |     0|acc   |↑  |0.5017|±  |0.0144|

Hellswag:

BitsAndBytes 4bit
40168/40168 [31:39<00:00, 21.15it/s]
2025-05-02:05:24:22
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=20,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.6262|±  |0.0048|
|         |       |none  |     0|acc_norm|↑  |0.8091|±  |0.0039|

GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText
40168/40168 [27:57<00:00, 23.94it/s]
2025-05-08:14:41:55
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.6061|±  |0.0049|
|         |       |none  |     0|acc_norm|↑  |0.7971|±  |0.0040|

GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64
40168/40168 [28:17<00:00, 23.66it/s]
2025-05-08:19:10:18
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.6067|±  |0.0049|
|         |       |none  |     0|acc_norm|↑  |0.7977|±  |0.0040|

GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: -1
40168/40168 [21:51<00:00, 30.64it/s]
2025-05-08:20:30:24
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.6072|±  |0.0049|
|         |       |none  |     0|acc_norm|↑  |0.7973|±  |0.0040|

GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64, damp_percent: 0.1
40168/40168 [28:09<00:00, 23.78it/s]
2025-05-08:22:01:39
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.6061|±  |0.0049|
|         |       |none  |     0|acc_norm|↑  |0.7976|±  |0.0040|

GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64, damp_percent: 0.1, mse: true
40168/40168 [28:15<00:00, 23.69it/s]
2025-05-09:02:34:33
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.6062|±  |0.0049|
|         |       |none  |     0|acc_norm|↑  |0.7976|±  |0.0040|

GPTQv2 4bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 128, damp_percent: 0.1, mse: false
40168/40168 [24:10<00:00, 27.69it/s]
2025-05-09:04:27:41
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.5946|±  |0.0049|
|         |       |none  |     0|acc_norm|↑  |0.7872|±  |0.0041|

Those are just some of the tests. I basically tested all the combinations. Non-symmetrical was even worse. I know that I'm probably doing something very wrong but I have no idea what.
Any ideas are be appreciated! (I'm using the GPTQmodel 3.1.0.dev0 from the github from few days ago.)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

I've made over 20 tests and GPTQv2 is always worse than BitsandBytes 4bit even the 8bit GTPQv2 is worse. #1608

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

I've made over 20 tests and GPTQv2 is always worse than BitsandBytes 4bit even the 8bit GTPQv2 is worse. #1608

Uh oh!

Uh oh!

TByte007 May 9, 2025

Replies: 0 comments

TByte007
May 9, 2025