
Releases: ModelCloud/GPTQModel

GPT-QModel v5.4.2

15 Nov 07:58
6c3e279



Full Changelog: v5.4.0...v5.4.2

GPT-QModel v5.4.0

09 Nov 02:28
e0da12a



Full Changelog: v5.2.0...v5.4.0

GPT-QModel v5.2.0

02 Nov 17:14
baf9674


Notable Changes:

  • Minimax M2, Granite Nano, Qwen3-VL, and Brumby model support
  • AWQ quantization is now out of beta and fully integrated into the quantization life cycle
  • New VramStrategy.Balanced property to spread MoE modules across multiple GPUs
  • New pure-Torch AWQ kernel
  • New calibration_concat_separator property (see the sketch after this list)
  • Fixed an HF bug where MTP layers were not saved for GLM 4.5/4.6 (Air) models
  • Fixed multi-GPU CUDA asserts caused by stream/sync issues
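For reference, a minimal sketch of how these two options could fit into a standard quantization flow. The VramStrategy import path, the vram_strategy load parameter, and the placement of calibration_concat_separator on QuantizeConfig are assumptions drawn from the notes above, not confirmed API; check the repo README for exact signatures.

```python
# Hypothetical usage sketch: names marked "assumed" below are not confirmed API.
from gptqmodel import GPTQModel, QuantizeConfig
from gptqmodel import VramStrategy  # assumed import path

calibration_dataset = [
    "Example calibration text one.",
    "Example calibration text two.",
]

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    # Assumed field: separator inserted between concatenated calibration rows.
    calibration_concat_separator="\n\n",
)

model = GPTQModel.load(
    "Qwen/Qwen3-VL-8B-Instruct",          # any supported model id
    quant_config,
    vram_strategy=VramStrategy.Balanced,  # assumed: spread MoE modules over GPUs
)
model.quantize(calibration_dataset)
model.save("Qwen3-VL-8B-Instruct-gptq-4bit")
```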


Full Changelog: v5.0.0...v5.2.0

GPT-QModel v5.0.0

24 Oct 04:31
45ab616


Notable Changes:

  • New data-parallel quantization support for MoE models on multi-GPU using no-GIL Python (Python >= 3.13t with the PYTHON_GIL=0 environment variable).
  • New offload_to_disk support, enabled by default, to massively reduce CPU RAM usage (see the sketch after this list).
  • New Intel-optimized, AMD-compatible, CPU hardware-accelerated TorchFused kernel.
  • Packing stage is now 4x faster and inlined with quantization.
  • VRAM pressure for large models reduced during quantization.
  • act_group_aware is now 16k+ times faster and is the default when desc_act=False, giving higher quality recovery without the inference penalty of desc_act=True.
  • New beta-quality AWQ support with full GEMM, GEMM_Fast, and Marlin kernel support.
  • New LFM, Ling, and Qwen3 Omni model support.
  • Bitblas kernel updated to support the Bitblas 0.1.0.post1 release.
  • Quantization is now faster with reduced VRAM usage. Enhanced logging support with LogBar.
  • And much much more...
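A sketch of a quantization run with these defaults made explicit. The offload_to_disk and act_group_aware names come from the notes above, but their placement as QuantizeConfig fields is an assumption; the no-GIL invocation follows the first bullet.

```python
# Run under no-GIL Python for data-parallel MoE quantization, per the notes:
#   PYTHON_GIL=0 python3.13t quantize.py
from gptqmodel import GPTQModel, QuantizeConfig

calibration_dataset = [
    "Example calibration text one.",
    "Example calibration text two.",
]

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,          # act_group_aware becomes the default in this mode
    # Assumed fields: both are described as default-on above; shown here
    # only to make the behavior explicit.
    offload_to_disk=True,    # stage weights on disk to cut CPU RAM usage
    act_group_aware=True,
)

model = GPTQModel.load("meta-llama/Llama-3.1-8B-Instruct", quant_config)
model.quantize(calibration_dataset)
model.save("Llama-3.1-8B-Instruct-gptq-4bit")
```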


GPT-QModel v4.2.5

16 Sep 13:19
db41ae4



Full Changelog: v4.2.0...v4.2.5

GPT-QModel v4.2.0

12 Sep 08:02
c0c3569



Full Changelog: v4.1.0...v4.2.0

GPT-QModel v4.1.0

04 Sep 20:18
4ab07b5



Full Changelog: v4.0.0...v4.1.0

GPT-QModel v4.0.0

22 Aug 14:25
40759cd



GPT-QModel v3.0.0

14 Apr 14:15
a0c7753


🎉 New ground-breaking GPTQ v2 quantization option for improved quantization accuracy, validated by GSM8K_PLATINUM benchmarks against the original GPTQ (see the sketch below).
✨ New Phi4-MultiModal model support.
✨ New Nvidia Nemotron Ultra model support.
✨ New Dream model support.
✨ New experimental multi-GPU quantization support. Reduced VRAM usage. Faster quantization.
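A sketch of opting into the GPTQ v2 path. The v2 flag name on QuantizeConfig is an assumption based on the release naming; verify against the repo README.

```python
from gptqmodel import GPTQModel, QuantizeConfig

calibration_dataset = ["Example calibration text."]

# Assumed flag name: opt into the GPTQ v2 quantization path.
quant_config = QuantizeConfig(bits=4, group_size=128, v2=True)

model = GPTQModel.load("microsoft/Phi-4-multimodal-instruct", quant_config)
model.quantize(calibration_dataset)  # same calibration flow as GPTQ v1
model.save("Phi-4-multimodal-gptq-4bit-v2")
```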


Full Changelog: v2.2.0...v3.0.0

GPTQModel v2.2.0

03 Apr 02:18
ca9d634


What's Changed

✨ New Qwen 2.5 VL model support. Preliminary Qwen 3 model support.
✨ New samples log column during quantization to track module activation in MoE models.
✨ Loss log column now color-coded to highlight modules that are friendly/resistant to quantization.
✨ Per-step progress stats during quantization now streamed to the log file.
✨ Auto bfloat16 dtype loading for models based on model config.
✨ Fixed kernel compilation for PyTorch/ROCm.
✨ Slightly faster quantization, with auto-resolution of some low-level OOM issues on smaller-VRAM GPUs.

Full Changelog: v2.1.0...v2.2.0