GPT-QModel v5.0.0

Released by @Qubitium on 24 Oct 04:31 · 45ab616

Notable Changes:

  • New data-parallel quantization support for MoE models on multi-GPU setups using no-GIL Python (Python >= 3.13t with the PYTHON_GIL=0 environment variable); see the first sketch after this list.
  • New offload_to_disk support, enabled by default, to massively reduce CPU RAM usage.
  • New Intel-optimized, AMD-compatible, hardware-accelerated TorchFused CPU kernel.
  • Packing stage is now 4x faster and is inlined with quantization.
  • Reduced VRAM pressure during quantization of large models.
  • act_group_aware is now 16k+ times faster and is the default when desc_act=False, giving higher-quality recovery without the inference penalty of desc_act=True; see the config sketch after this list.
  • New beta-quality AWQ support with full GEMM, GEMM_Fast, and Marlin kernel support; see the AWQ sketch after this list.
  • New LFM, Ling, and Qwen3 Omni model support.
  • BitBLAS kernel updated to support the BitBLAS 0.1.0.post1 release.
  • Quantization is now faster with reduced VRAM usage, and logging is enhanced via LogBar.
  • And much much more...
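
The data-parallel MoE path above requires a free-threaded (no-GIL) Python build. Below is a minimal quantization sketch using the load/quantize/save API from the project README; the model id and calibration strings are illustrative. Since offload_to_disk is on by default in this release, no extra flag is needed for the CPU RAM savings.

```python
# Launch under free-threaded Python to enable data-parallel MoE quantization:
#   PYTHON_GIL=0 python3.13t quantize.py
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative model choice
calibration_dataset = [
    "GPT-QModel is an LLM quantization toolkit.",
    "Quantization reduces model size with minimal quality loss.",
]  # use a real calibration set in practice

quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset, batch_size=2)
model.save("Qwen2.5-0.5B-Instruct-gptq-4bit")
```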
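A sketch of the new default ordering behavior: desc_act is a long-standing QuantizeConfig option, while exposing act_group_aware as a same-named flag is an assumption based on the release note wording, not confirmed API.

```python
from gptqmodel import QuantizeConfig

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,        # avoids the inference penalty of activation reordering
    act_group_aware=True,  # assumed flag name; per the notes, this is now the
                           # default whenever desc_act=False
)
```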
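And a hedged sketch of opting into the beta AWQ path; the quant_method value and the model id shown are assumptions for illustration, not confirmed API.

```python
from gptqmodel import GPTQModel, QuantizeConfig

# "awq" as a quant_method string is an assumed selector for the new beta path;
# kernel choice (GEMM, GEMM_Fast, Marlin) is reported to be handled per-backend.
quant_config = QuantizeConfig(bits=4, group_size=128, quant_method="awq")
model = GPTQModel.load("meta-llama/Llama-3.2-1B", quant_config)  # illustrative model
```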

What's Changed

Full Changelog: v4.2.5...v5.0.0