Add conversion from ModelCloud Quantizations (GPTQ, GPTQ-v2, QQQ + Rotation) to GGUF #1544
I really like what you guys have done with this project and your quantization schemes, and I am happy we can convert ModelCloud GPTQ to MLX. But is it at all possible to get a conversion from ModelCloud quants (GPTQ, GPTQ-v2, and QQQ + Rotation) to GGUF? Please 🙏, thank you in advance.
@joseph777111 It is entirely possible, but the conversion is not native and, if not aligned, may incur secondary loss. We only provided this feature for MLX for academic reasons and have not actually benchmarked whether there is a loss. The concept is simple: we dequantize the weights back to bfloat16 and let gguf go at it. Some people might think this is naive, but those who know how quantization works know that this is exactly how quantization works. Quantization is not about directly going from bfloat16 to 4 bits; the primary pass is actually to make sure the bfloat16 is …
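For illustration, here is a minimal sketch of the dequantize-then-requantize path described above. The group-wise layout and variable names are assumptions for the example, not GPTQModel's exact storage format:

```python
import torch

def dequantize_group(q: torch.Tensor, scale: torch.Tensor, zero: torch.Tensor) -> torch.Tensor:
    # Recover approximate weights from integer codes: w ≈ scale * (q - zero),
    # cast to bfloat16 so a GGUF converter can requantize from there.
    return (scale * (q.to(torch.float32) - zero)).to(torch.bfloat16)

# Toy example: quantize one group of 8 weights to 4 bits, then dequantize.
w = torch.randn(8)
scale = (w.max() - w.min()) / 15        # 4-bit codes span 0..15
zero = torch.round(-w.min() / scale)    # zero-point maps the group minimum to code 0
q = torch.clamp(torch.round(w / scale + zero), 0, 15)
w_hat = dequantize_group(q, scale, zero)
print((w - w_hat.float()).abs().max())  # reconstruction error for this group
```

Once the dequantized weights are written back as a bfloat16 HF checkpoint, llama.cpp's converter can produce the GGUF. Assuming a checkpoint directory `dequantized-bf16/` and a local llama.cpp checkout (both placeholders), something like:

```python
import subprocess

# convert_hf_to_gguf.py ships with llama.cpp; paths here are placeholders.
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", "dequantized-bf16",
     "--outfile", "model-bf16.gguf", "--outtype", "bf16"],
    check=True,
)
```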