Releases: ModelCloud/GPTQModel
GPT-QModel v5.4.2
Notable Changes:
- Fix double fwd regression by @Qubitium in #2198
- Add cli: gptqmodel env by @ZX-ModelCloud in #2192
- [CI] compile wheel with python -m build by @CSY-ModelCloud in #2193
What's Changed
- Start v5.5.0 devel branch (odd version) by @Qubitium in #2191
- Update version from 5.5.0 to 5.4.2 patch release by @Qubitium in #2199
- [CI] copy wheel to local dir instead of using http server by @CSY-ModelCloud in #2200
Full Changelog: v5.4.0...v5.4.2
GPT-QModel v5.4.0
Notable Changes:
- AWQ Torch Fused Kernel by @Qubitium in #2190
- Make torch fused op compilable by @jiqing-feng in #2182
- [FIX] AWQ MoE by @ZX-ModelCloud in #2171
- add :? capture only syntax by @Qubitium in #2173
What's Changed
- Update latest news section in README.md by @Qubitium in #2166
- run forward pass even for empty subset to produce correct layer outputs by @avtc in #2161
- Reduce AWQ memory usage by @Qubitium in #2167
- Awq update by @Qubitium in #2168
- Retry partial.to to fix accelerate invalid argument for first moe layer (reapply) by @avtc in #2169
- Awq update by @Qubitium in #2172
- adjust retry partial.to by @avtc in #2175
- cleanup awq_get_modules_for_scaling() by @ZX-ModelCloud in #2179
- [FIX] qwen3 moe sparse moe block by @ZX-ModelCloud in #2184
- Add module convert by @LRL2-ModelCloud in #2183
- Cleanup by @Qubitium in #2185
- Update pypcre version to 0.2.5 by @LRL2-ModelCloud in #2186
- Update pypcre version to 0.2.5 by @Qubitium in #2189
- [FIX] version("triton") crash on torch+xpu by @ZX-ModelCloud in #2188
Full Changelog: v5.2.0...v5.4.0
GPT-QModel v5.2.0
Notable Changes:
- Minimax M2, Granite Nano, Qwen3-VL, Brumby model support
- AWQ quantization is now out of beta and fully integrated into the quantization life cycle
- New `VramStrategy.Balanced` property to spread MoE modules across different GPUs (see the sketch after this list)
- New pure Torch AWQ kernel
- New `calibration_concat_separator` property (see the sketch after this list)
- Fixed HF bug that did not save `mtp` layers for GLM 4.5/4.6 (Air) models
- Fixed multi-gpu CUDA asserts due to stream/sync
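A minimal sketch of how the two new options above might be wired together. The exact placement of `calibration_concat_separator` (shown here as a `quantize()` kwarg) and the import path and spelling of a `VramStrategy` knob are assumptions drawn from the notes, not confirmed API:

```python
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(
    "Qwen/Qwen3-VL-8B-Instruct",  # placeholder: any newly supported model id
    quant_config,
    # Assumption: a load-time knob selects VramStrategy.Balanced to spread
    # MoE modules across GPUs, e.g. vram_strategy=VramStrategy.Balanced
)

calibration = ["first calibration sample", "second calibration sample"]

model.quantize(
    calibration,
    # Assumption: separator string used when concatenating calibration rows.
    calibration_concat_separator="\n\n",
)
model.save("Qwen3-VL-8B-GPTQ-4bit")
```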
What's Changed
- try not adding mem guards for marlin kernel launch protection by @Qubitium in #2108
- MoE vram by @Qubitium in #2110
- Fix GLM 4.5/4.6 and Air not saving mtp layer after save (HF bug) by @LRL2-ModelCloud in #2109
- torchao 0.14.1 update by @Qubitium in #2111
- Test refactor by @Qubitium in #2113
- Bump the github-actions group with 2 updates by @dependabot[bot] in #2120
- [FIX] xpu unit test by @ZX-ModelCloud in #2122
- modular by @Qubitium in #2123
- update scores by @Qubitium in #2124
- Fp8 dequant by @Qubitium in #2125
- Model dequant by @Qubitium in #2126
- Fp4 e2m1 by @Qubitium in #2127
- [FIX] ovis2, compatible with transformers v4.57.1 by @ZX-ModelCloud in #2129
- fix cols padding by @LRL2-ModelCloud in #2130
- [FIX] ovis_1_6 quantization by @ZX-ModelCloud in #2131
- Minimax m2 by @Qubitium in #2128
- Fix awq marlin kernel for bf16 by @Qubitium in #2135
- [FIX] incorrect AWQ NODES by @ZX-ModelCloud in #2133
- add support_offload_to_disk check by @LRL2-ModelCloud in #2134
- Add Awq torch kernel by @Qubitium in #2137
- Marin by @Qubitium in #2139
- Marin scores by @Qubitium in #2141
- Fix triton version detection in nogil patcher by @amd-vlarakic in #2144
- Fix qwen2 omni by @LRL2-ModelCloud in #2140
- [MODEL] Add GraniteMoEHybrid by @ZX-ModelCloud in #2142
- Fold AWQ into proper Looper/Layer/Subset Lifecycle by @Qubitium in #2138
- Refine GPT-QModel description in README by @Qubitium in #2145
- fix device_map by @LRL2-ModelCloud in #2146
- [MODEL] Add Qwen3-VL by @techshoww in #2136
- Add calibration_concat_separator by @Qubitium in #2148
- add test_qwen3_vl.py by @LRL2-ModelCloud in #2147
- Fix triton monkeypatch by @Qubitium in #2149
- [MODEL] Add Brumby by @Qubitium in #2150
- Dedup/Cleanup by @Qubitium in #2151
- Prep for 5.2 release by @Qubitium in #2152
- Dedup3 by @Qubitium in #2153
- add missing file by @Qubitium in #2154
- GPTAQ rename by @Qubitium in #2155
- fix ci test by @Qubitium in #2158
- fix setup license by @Qubitium in #2160
- Fix snapshot_download receiving unsupported kwargs by @Qubitium in #2162
- Retry partial.to to fix accelerate invalid argument error for first moe layer for >4 GPU setups by @avtc in #2163
- Comments + Sync by @Qubitium in #2164
- Stats/Logs by @Qubitium in #2165
New Contributors
- @amd-vlarakic made their first contribution in #2144
- @techshoww made their first contribution in #2136
Full Changelog: v5.0.0...v5.2.0
GPT-QModel v5.0.0
Notable Changes:
- New data-parallel quant support for MoE models on multi-gpu using `nogil` Python (Python >= 3.13t with `PYTHON_GIL=0` env) (see the sketch after this list).
- New `offload_to_disk` support, enabled by default, to massively reduce CPU RAM usage.
- New Intel-optimized and AMD-compatible CPU hardware-accelerated `TorchFused` kernel.
- Packing stage is now 4x faster and inlined with quantization.
- VRAM pressure for large models reduced during quantization.
- `act_group_aware` is now 16k+ times faster and is the default when `desc_act=False`, giving higher quality recovery without the inference penalty of `desc_act=True`.
- New beta-quality AWQ support with full GEMM, GEMM_Fast, and Marlin kernel support.
- New LFM, Ling, Qwen3 Omni model support.
- Bitblas kernel updated to support the Bitblas 0.1.0.post1 release.
- Quantization is now faster with reduced VRAM usage. Enhanced logging support with LogBar.
- And much much more...
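A hedged sketch of a data-parallel quant run under free-threaded Python. `offload_to_disk` is described as enabled by default and `act_group_aware` as the default when `desc_act=False`, so the config below only makes those defaults explicit; kwarg names and placement are assumptions:

```python
# Launch with a free-threaded interpreter (Python >= 3.13t):
#   PYTHON_GIL=0 python quantize_moe.py
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,  # per the notes, act_group_aware is the default in this mode
)

model = GPTQModel.load("mistralai/Mixtral-8x7B-v0.1", quant_config)  # placeholder MoE id
model.quantize(["first calibration sample", "second calibration sample"])
model.save("Mixtral-8x7B-GPTQ-4bit")
```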
What's Changed
- rename `torch_dtype` to `dtype` to sync with hf transformers by @Qubitium in #1804
- drop support for python < 3.11 by @CSY-ModelCloud in #1805
- hard deprecated ipex in favor of torch_fused by @Qubitium in #1807
- update pyproject.toml by @CSY-ModelCloud in #1808
- [CI] release with 3.13t by @CSY-ModelCloud in #1811
- [QUANTIZATION] Add AWQ support by @ZX-ModelCloud in #1703
- find mapping by @LRL-ModelCloud in #1812
- Update README.md by @Qubitium in #1813
- Update version.py by @Qubitium in #1814
- Turtle in a half shell by @Qubitium in #1809
- note about memory saving by @Qubitium in #1817
- move fail_safe by @LRL-ModelCloud in #1818
- rename turtle method by @Qubitium in #1820
- add threads by @Qubitium in #1821
- remove AWQ mod defs by @ZX-ModelCloud in #1822
- [CI] use new docker by @CSY-ModelCloud in #1823
- Fix awq quantize by @LRL-ModelCloud in #1824
- [CI] use new docker for release source by @CSY-ModelCloud in #1825
- fix awq pack by @LRL-ModelCloud in #1826
- fix loading autoawq models and hf/vllm/sglang loading of newly awq qu… by @Qubitium in #1827
- wrong arg check by @Qubitium in #1828
- fix thread task var scoping by @Qubitium in #1829
- fix call param by @Qubitium in #1830
- fix threads > 1 not considered (unsafe) by @Qubitium in #1832
- cleanup by @Qubitium in #1833
- fix gptqmodel offload paths conflict by @Qubitium in #1834
- Ci test by @Qubitium in #1835
- eora: always diff in fp32 + cleanup by @Qubitium in #1836
- add register_buffer/parameter to NamedModule class by @Qubitium in #1837
- typo by @Qubitium in #1839
- add thread safety to all classes by @Qubitium in #1840
- fix fail_safe by @LRL-ModelCloud in #1844
- update marlin kernel by @ZX-ModelCloud in #1838
- fix fp32 reduce on/off by @Qubitium in #1845
- bypass marlin kernel bias issue by @Qubitium in #1846
- disable marlin atomics by default as it failed ci accuracy test by @Qubitium in #1847
- [FIX] awq marlin by @ZX-ModelCloud in #1816
- cleanup var names by @Qubitium in #1849
- pack per module by @LRL-ModelCloud in #1842
- [CI] use new docker by @CSY-ModelCloud in #1850
- tweak eora test by @Qubitium in #1851
- wait for thread tasks only when every module has completed. by @Qubitium in #1852
- [FIX] Compatible with vllm v0.10.2 by @ZX-ModelCloud in #1855
- move req.txt into toml by @CSY-ModelCloud in #1858
- do not create buffers only to overwrite them by @Qubitium in #1857
- pop states after use by @Qubitium in #1859
- [FIX] multiple "register_buffers" parameters by @ZX-ModelCloud in #1860
- Low memory pack by @Qubitium in #1861
- fix packing ci test by @Qubitium in #1862
- simplify by @Qubitium in #1853
- Fix 3bit packing regression in previous commit by @Qubitium in #1863
- remove deprecated `parallel_packing` property by @Qubitium in #1864
- Fix qqq quant/offloading by @Qubitium in #1866
- temp disable awq gemm kernel due to failing ci by @Qubitium in #1867
- update vllm compat by @Qubitium in #1869
- fix regression by @Qubitium in #1870
- fix setup.py crashed because torch may not support float8_e8m0fnu by @CSY-ModelCloud in #1871
- [FIX] AwqGEMMQuantLinear skip gptq_v1 convert to v2 by @ZX-ModelCloud in #1872
- Fix awq gemm auto kernel selection order by @Qubitium in #1873
- Update README.md by @Qubitium in #1874
- reduce forwarding to minimal by @Qubitium in #1876
- Update README.md by @Qubitium in #1877
- fix exllama tests by @Qubitium in #1879
- debug print all params/buffers by @Qubitium in #1880
- skip internal loading of non-pkg compatible quantization models, i.e.… by @Qubitium in #1881
- Loader by @Qubitium in #1882
- Cleanup awq by @Qubitium in #1883
- remove broken test by @Qubitium in #1884
- [CI] remove old cuda/torch support for release by @CSY-ModelCloud in #1885
- fix loader by @LRL-ModelCloud in #1886
- fix nvcc warnings about pending cuda > 13.x compat by @Qubitium in #1887
- fix packing speed test by @Qubitium in #1889
- fix licenses warning by @CSY-ModelCloud in #1888
- set licenses to apache by @CSY-ModelCloud in #1890
- [FIX] AwqGEMMQuantLinear should be PackableQuantLinear by @ZX-ModelCloud in #1891
- skip modules that have no parameters and no buffers since they can't be offloaded by @LRL-ModelCloud in #1892
- skip modules that have no parameters and no buffers since they can't offload by @LRL-ModelCloud in #1894
- Fix device check by @Qubitium in #1896
- [CI] disable test install by @CSY-ModelCloud in #1895
- remove hash feature by @Qubitium in #1897
- fix cuda ext cannot be loaded by @Qubitium in #1898
- lock numpy to 2.2.6 by @CSY-ModelCloud in #1899
- [FIX] test_lm_eval.py by @ZX-ModelCloud in #1900
- Patch fix model save by @Qubitium in #1901
- Ugly patch save 2 by @Qubitium in #1902
- fix potential leak by @Qubitium in #1904
- [FIX] test_integration by @ZX-ModelCloud in #1903
- fix build uploading an empty wheel by @CSY-ModelCloud in #1905
- fix lm_head quant by @LRL-ModelCloud in #1906
- batch tweaks by @Qubitium in #1907
- [FIX] test_kernel_output_torch_fused by @ZX-ModelCloud in ...
GPT-QModel v4.2.5
What's Changed
- Cleanup hyb_act by @Qubitium in #1791
- Remove torch import in setup.py by @Qubitium in #1729
- Refactor: rename `hyb_act` to `act_group_aware` by @Qubitium in #1794
- Cleanup by @Qubitium in #1795, #1796
- [CI] Add torch 2.8.0 by @CSY-ModelCloud in #1797
- [CI] torch-2.6.0+cu128-python-3.9 does not exist by @CSY-ModelCloud in #1798
- Fix wf_unsqueeze_zero and wf_unsqueeze_neg_one by @LRL-ModelCloud in #1799
- GAR field save to meta on quant save by @Qubitium in #1800
- Add pyproject.toml by @CSY-ModelCloud in #1801
- [CI] Don't detect arch list when it has already been set & fix build-system requirements by @CSY-ModelCloud in #1802
Full Changelog: v4.2.0...v4.2.5
GPT-QModel v4.2.0
Notable Changes
- Add Qwen3-Next by @Qubitium and @LRL-ModelCloud in #1787
- Add Apertus support by @LRL-ModelCloud in #1767
- Add Kimi k2 support by @LRL-ModelCloud in #1768
- Add Klear support by @LRL-ModelCloud in #1769
- Add FastLLM support by @LRL-ModelCloud in #1771
- Add Nemotron H support by @LRL-ModelCloud in #1773
- Add `fail_safe` option by @LRL-ModelCloud in #1775 (see the sketch after this list)
- Use threading lock to protect unsafe tensor moves in multi-gpu by @Qubitium in #1778
- Avoid building experimental extensions to reduce wheel size by @Qubitium in #1763
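The `fail_safe` option above is only named in the notes; the sketch below guesses it is a `QuantizeConfig` flag, purely for illustration:

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Assumption: fail_safe is exposed on QuantizeConfig; the notes name the
# option but do not pin down where it is passed or its exact semantics.
quant_config = QuantizeConfig(bits=4, group_size=128, fail_safe=True)
model = GPTQModel.load("Qwen/Qwen3-Next-80B-A3B-Instruct", quant_config)  # placeholder id
```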
What's Changed
- Fix LlavaQwen2GPTQ by @LRL-ModelCloud in #1772
- Fix Q.to on multi-gpu gptq when proceeding fast and has many experts and gpus by @avtc in #1774
- Bump actions/setup-python from 5 to 6 in the github-actions group by @dependabot[bot] in #1758
- [CI] fix release jobs were skipped by @CSY-ModelCloud in #1759
- ignore compile warns about var declared but not used by @Qubitium in #1760
- allow prebuilt wheel path to be customized via env by @Qubitium in #1761
- add build toggles for all cpp kernels by @Qubitium in #1764
- fix multi gpu inference by @LRL-ModelCloud in #1762
- [CI] reduce wheel download size by @CSY-ModelCloud in #1765
- start 4.2.0-dev cycle by @Qubitium in #1766
- fix klear by @LRL-ModelCloud in #1770
- FIX transformers >= 4.56.1 force-changed `torch.default_dtype` by @Qubitium in #1779
- fix multi gpu fail_safe by @LRL-ModelCloud in #1780
- fix device instance by @LRL-ModelCloud in #1783
- prepare for 4.2 release by @Qubitium in #1785
Full Changelog: v4.1.0...v4.2.0
GPT-QModel v4.1.0
Notable Changes:
- Add a config option: mock_quantization to simplify heavy computations… by @avtc in #1731 (sketched after this list)
- Add GLM-4.5-Air support by @avtc in #1730
- Add GPT-OSS support by @LRL2-ModelCloud in #1737
- Add LongCatFlashGPTQ by @LRL-ModelCloud in #1751
- Add Llama 4 Support by @Qubitium in #1508
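A sketch of the new `mock_quantization` option for dry-running a pipeline; treating it as a `QuantizeConfig` flag is an assumption:

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Assumption: mock_quantization is a QuantizeConfig flag (placement unverified).
# Per the PR title it simplifies heavy computations, which suggests it is
# useful for fast end-to-end pipeline dry runs.
quant_config = QuantizeConfig(bits=4, group_size=128, mock_quantization=True)
model = GPTQModel.load("zai-org/GLM-4.5-Air", quant_config)  # placeholder model id
model.quantize(["tiny calibration sample"])
```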
What's Changed
- Minor Cleanup by @Qubitium in #1718
- disable some compilation on torch 2.8 due to compat issues by @Qubitium in #1727
- add glm4 moe test by @LRL2-ModelCloud in #1734
- deprecate autoround by @Qubitium in #1735
- [FIX] test_kernel_output with XPU by @ZX-ModelCloud in #1741
- cleanup checks for GIL control, GIL=0, and python >= 3.13.3t by @Qubitium in #1743
- update torch/transformer depends by @Qubitium in #1749
- reduce pkg depend by @Qubitium in #1750
- fix triton compat check for 3.13.3t by @Qubitium in #1752
- Bump torch from 2.7.1 to 2.8.0 in /gptqmodel_ext/exllama_eora by @dependabot[bot] in #1755
- pkg update: tokenicer 0.0.5 by @Qubitium in #1756
Full Changelog: v4.0.0...v4.1.0
GPT-QModel v4.0.0
Notable Changes
- Support glm4 by @glide-the in #1559
- Add Xiaomi MiMo model by @Qubitium in #1571
- Free-threading (GIL-free) quantization with linear NxGPU scaling by @Qubitium in #1581
- feat: add Qwen-Omni support. by @tiger-of-shawn in #1613
- add Qwen 2.5 Omni support by @Qubitium in #1615
- [MODEL] ERNIE4.5 by @LRL-ModelCloud in #1645
- [MODEL]support pangu_alpha model by @ZX-ModelCloud in #1646
- new baidu ernie & huawei pangu model support by @Qubitium in #1647
- [MODEL] Add falcon h1 support by @LRL-ModelCloud in #1621
- feat(gemma3): also support larger gemma3 models and not only small te… by @joennlae in #1627
- Add Group Aware Reordering (GAR) for Efficient Activation Reordering by @tgafni in #1656
- Enable pytorch fused op on XPU by @jiqing-feng in #1660
- [MODEL] Add Seed-OSS support by @LRL2-ModelCloud in #1702
Other Changes
- [CI] add release source with github's vm by @CSY-ModelCloud in #1543
- Fix rotation for tied embedding models by @smpanaro in #1550
- Fix input processing for convolution by @Cecilwang in #1554
- [FIX] moe model quant division by zero issue by @LRL-ModelCloud in #1565
- [FIX] remove too short calib data by @LRL-ModelCloud in #1566
- [FIX] hook_module and qwen3_moe by @LRL-ModelCloud in #1569
- [FIX] hook linear and triton by @LRL-ModelCloud in #1570
- [MISC] simplify model definition by @LRL-ModelCloud in #1572
- [FIX] qwen2 moe loop module by @LRL-ModelCloud in #1574
- [CI] fix unit test was unable to run by @CSY-ModelCloud in #1580
- fix has_gil was not imported & device-smi api wrong by @CSY-ModelCloud in #1586
- fix older python didn't have EnumType by @CSY-ModelCloud in #1590
- [FIX] get_module_by_name_prefix by @LRL-ModelCloud in #1591
- [CI] update release CI, add torch 2.7.0 by @CSY-ModelCloud in #1592
- [FIX] Qwen2.5 vl quant by @LRL-ModelCloud in #1623
- Bump torch from 2.6.0 to 2.7.1 in /gptqmodel_ext/exllama_eora by @dependabot[bot] in #1628
- fix bug for device error by @kaixuanliu in #1631
- [FIX] config seq len by @LRL-ModelCloud in #1640
- register buffer for `wf_unsqueeze_zero` and `wf_unsqueeze_neg_one` to… by @kaixuanliu in #1642
- set_postfix is a tqdm function, no need anymore by @CSY-ModelCloud in #1643
- fix exception to avoid memory issue by @jiqing-feng in #1679
- lm_head hooked by @Chunfei-He in #1673
- Bump the github-actions group across 1 directory with 2 updates by @dependabot[bot] in #1677
- Model config.use_cache not correctly used during inference for some models by @LRL2-ModelCloud in #1686
- [FIX] transformers compat by @LRL2-ModelCloud in #1687
- Update module_looper.py by @LRL2-ModelCloud in #1690
- Update requirements.txt by @LRL2-ModelCloud in #1689
- add ACCEPT_USE_FLASH_ATTEN2_ARG by @LRL2-ModelCloud in #1693
- Fix kwarg vs pos arg hidden states by @LRL2-ModelCloud in #1694
- fix import Perplexity failed by @CSY-ModelCloud in #1695
- [CI] fix CI installed wrong libs' version by @CSY-ModelCloud in #1696
- [FIX] GIL Check by @ZX-ModelCloud in #1697
- [FIX] minicpm test by @LRL2-ModelCloud in #1698
- [FIX] use AutoModelForImageTextToText instead of AutoModelForVision2Seq by @ZX-ModelCloud in #1699
- [CI] change qwen2.5-omni model path by @ZX-ModelCloud in #1701
- [CI] install jieba for test_pangu_alpha by @CSY-ModelCloud in #1706
- disable torch.compile by @LRL2-ModelCloud in #1707
- FIX minicpm CI test by @LRL2-ModelCloud in #1708
- [CI] update torch for build by @CSY-ModelCloud in #1709
- [CI] update release matrix by @CSY-ModelCloud in #1710
- [CI] install torch compiled with cuda 126 by @CSY-ModelCloud in #1711
- use "attn_implementation" by @LRL2-ModelCloud in #1712
- [CI] add 5090 support & install latest intel_extension_for_pytorch by @CSY-ModelCloud in #1713
- [CI] don't compile 5090 for cuda < 12.8 by @CSY-ModelCloud in #1714
- [CI] Update unit test docker by @CSY-ModelCloud in #1715
- [CI] fix release ci by @CSY-ModelCloud in #1716
- fix model path is not public by @CSY-ModelCloud in #1720
- [CI] don't exit when package doesn't exist by @CSY-ModelCloud in #1719
- [CI] no need install logbar manually by @CSY-ModelCloud in #1721
- [CI] remove legacy tests & skip intel tests & disable flash_attn for some models by @CSY-ModelCloud in #1722
- [CI] no need install uv by @CSY-ModelCloud in #1723
- [CI] use new docker with uv binary to fix shim/uv didn't exist by @CSY-ModelCloud in #1724
New Contributors
- @Cecilwang made their first contribution in #1554
- @glide-the made their first contribution in #1559
- @tiger-of-shawn made their first contribution in https://github.com/ModelClo...
GPT-QModel v3.0.0
🎉 New ground-breaking GPTQ v2 quantization option for improved quantization accuracy, validated by GSM8K_PLATINUM benchmarks vs the original GPTQ (sketched below).
✨ New Phi4-MultiModal model support.
✨ New Nvidia Nemotron Ultra model support.
✨ New Dream model support.
✨ New experimental multi-gpu quantization support.
✨ Reduced vram usage and faster quantization.
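A sketch of opting into the new GPTQ v2 path; the `v2=True` flag name is an assumption made for illustration:

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Assumption: GPTQ v2 is selected via a config flag, shown here as v2=True.
quant_config = QuantizeConfig(bits=4, group_size=128, v2=True)
model = GPTQModel.load("microsoft/Phi-4-multimodal-instruct", quant_config)  # placeholder id
model.quantize(["calibration sample"])
model.save("Phi-4-GPTQv2-4bit")
```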
What's Changed
- Multi GPU Quantization by @Qubitium in #1502
- experimental multi-gpu quantization by @Qubitium in #1503
- reduce allocation by @Qubitium in #1504
- revert add_ by @Qubitium in #1506
- Switch to non-deprecated mlx.core.clear_cache() by @smpanaro in #1510
- Dream Model Support by @Qubitium in #1512
- fix disabling batch/mask for dream by @Qubitium in #1514
- reduce tensor device movement by @Qubitium in #1516
- fix deepseek v3 module order by @Qubitium in #1517
- Nemotron Ultra Support by @Qubitium in #1518
- faster process_batch by @Qubitium in #1519
- Fix missing arg due to recent `Processor` api changes by @Qubitium in #1523
- Fix gpt2 columns calculation by @Qubitium in #1524
- temp damper should not overwrite damp cfg by @Qubitium in #1526
- Replace module hooking with tree-defined targeting by @Qubitium in #1527
- Fix compat with XPU by @Qubitium in #1535
- Phi4 MultiModal by @Qubitium in #1511
- disable selection of ExllamaV2 kernel for group_size=16 for now by @Qubitium in #1537
- Add Gptqv2 by @yhhhli and @Qubitium in #1533
Full Changelog: v2.2.0...v3.0.0
GPTQModel v2.2.0
What's Changed
✨ New Qwen 2.5 VL model support. Preliminary Qwen 3 model support.
✨ New samples log column during quantization to track module activation in MoE models.
✨ Loss log column now color-coded to highlight modules that are friendly/resistant to quantization.
✨ Progress (per-step) stats during quantization now streamed to log file.
✨ Auto bfloat16 dtype loading for models based on model config.
✨ Fix kernel compile for PyTorch/ROCm.
✨ Slightly faster quantization, with auto-resolution of some low-level OOM issues on smaller-vram GPUs.
- Enable ipex tests for CPU/XPU by @jiqing-feng in #1460
- test kernel accuracies with more shapes on cuda by @Qubitium in #1461
- Fix rocm flags by @Qubitium in #1467
- use table like logging format by @Qubitium in #1471
- stream process log entries to persistent file by @Qubitium in #1472
- fix some models need trust-remote-code arg by @Qubitium in #1474
- Fix wq dtype by @Qubitium in #1475
- add colors to quant loss column by @Qubitium in #1477
- add prelim qwen3 support by @Qubitium in #1478
- Update eora.py for further optimization by @nbasyl in #1488
- faster cholesky inverse and avoid oom when possible by @Qubitium in #1494
- [MODEL] supports qwen2_5_vl by @ZX-ModelCloud in #1493
Full Changelog: v2.1.0...v2.2.0