Add conversion from ModelCloud Quantizations (GPTQ, GPTQ-v2, QQQ + Rotation) to GGUF #1544
I really like what you guys have done with this project and your quantization schemes, and I am happy we can convert ModelCloud GPTQ to MLX. But is it at all possible to get a conversion from ModelCloud quants (GPTQ, GPTQ-v2, and QQQ + Rotation) to GGUF? Please 🙏, thank you in advance.
@joseph777111 It is entirely possible, but the conversion is not native and, if not aligned, may incur secondary loss. We only provided this feature for MLX for academic reasons and have not actually benchmarked whether there is a loss. The concept is simple: we dequantize the weights back to bfloat16 and let gguf go at it. Some people might think this is naive, but those who know how quantization works know that this is exactly how quantization works. Quantization is not about directly going from bfloat16 to 4 bits; the primary pass is actually to make sure the bfloat16 is …
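For illustration, here is a minimal sketch of the dequantize-then-requantize path described above. The group-wise layout and variable names are assumptions for the example, not GPTQModel's exact storage format:

```python
import torch

def dequantize_group(q: torch.Tensor, scale: torch.Tensor, zero: torch.Tensor) -> torch.Tensor:
    # Recover approximate weights from integer codes: w ≈ scale * (q - zero),
    # cast to bfloat16 so a GGUF converter can requantize from there.
    return (scale * (q.to(torch.float32) - zero)).to(torch.bfloat16)

# Toy example: quantize one group of 8 weights to 4 bits, then dequantize.
w = torch.randn(8)
scale = (w.max() - w.min()) / 15        # 4-bit codes span 0..15
zero = torch.round(-w.min() / scale)    # zero-point maps the group minimum to code 0
q = torch.clamp(torch.round(w / scale + zero), 0, 15)
w_hat = dequantize_group(q, scale, zero)
print((w - w_hat.float()).abs().max())  # reconstruction error for this group
```

Once the dequantized weights are written back as a bfloat16 HF checkpoint, llama.cpp's converter can produce the GGUF. Assuming a checkpoint directory `dequantized-bf16/` and a local llama.cpp checkout (both placeholders), something like:

```python
import subprocess

# convert_hf_to_gguf.py ships with llama.cpp; paths here are placeholders.
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", "dequantized-bf16",
     "--outfile", "model-bf16.gguf", "--outtype", "bf16"],
    check=True,
)
```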