[Feature Idea] Support for 1.58-bit Ternary Quantization (BitNet) #1674
ratchanonth60 asked this question in Q&A
Hi bitsandbytes team,
First of all, thank you for this incredible library. It has revolutionized how we access and train large models with limited resources.
I've recently been experimenting with the 1.58-bit BitNet architecture (from the paper "The Era of 1-bit LLMs") and have successfully created a Python-level implementation of a BitLinear layer for a computer vision model (MedMambaUNet for segmentation).
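To make that concrete, here is a minimal sketch of the kind of Python-level BitLinear I mean (a simplified illustration rather than my exact code: it uses the abs-mean ternarization described in the BitNet b1.58 paper with a straight-through estimator, and omits the paper's activation quantization and normalization):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def absmean_ternarize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Map weights to {-1, 0, +1} * scale using the abs-mean scheme
    from the BitNet b1.58 paper (simulated in full precision)."""
    scale = w.abs().mean().clamp(min=eps)
    return (w / scale).round().clamp(-1, 1) * scale


class BitLinear(nn.Linear):
    """Python-level BitLinear sketch: ternary weights in the forward pass,
    with a straight-through estimator so gradients reach the latent fp weights."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Forward uses the ternarized weights; backward treats the
        # quantizer as identity (straight-through estimator).
        w_q = w + (absmean_ternarize(w) - w).detach()
        return F.linear(x, w_q, self.bias)
```

Because the ternary values here are still stored and multiplied in full precision, this only simulates the accuracy/size trade-off; it cannot show any speed benefit.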
While my implementation proves the concept and shows potential for model size reduction, I understand from your work and documentation that true performance gains (especially in speed) come from custom, optimized CUDA kernels, such as the ones bitsandbytes provides for 4-bit and 8-bit quantization. My Python version suffers from significant overhead, as expected.
Are there any plans, or have there been any discussions, about supporting 1.58-bit ternary quantization (a {-1, 0, +1} weight scheme) in bitsandbytes?
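For context on the storage side: since log2(3) ≈ 1.585, five ternary weights fit in one byte (3^5 = 243 ≤ 256), i.e. about 1.6 bits per weight. A rough, purely illustrative packing sketch follows (the helper names and layout are hypothetical; a real kernel would of course pick whatever layout its GEMM routines need):

```python
import numpy as np


def pack_ternary(values: np.ndarray) -> np.ndarray:
    """Pack ternary values {-1, 0, +1} into bytes, five values per byte
    (base-3 encoding), giving roughly 1.6 bits per weight."""
    trits = (values + 1).astype(np.uint8)          # map {-1, 0, +1} -> {0, 1, 2}
    trits = np.pad(trits, (0, (-len(trits)) % 5))  # pad to a multiple of 5
    powers = 3 ** np.arange(5, dtype=np.uint16)    # [1, 3, 9, 27, 81]
    return (trits.reshape(-1, 5).astype(np.uint16) @ powers).astype(np.uint8)


def unpack_ternary(packed: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack_ternary; n is the original number of weights."""
    rest = packed.astype(np.uint16)
    trits = np.empty((len(packed), 5), dtype=np.uint8)
    for i in range(5):
        trits[:, i] = rest % 3
        rest //= 3
    return trits.reshape(-1)[:n].astype(np.int8) - 1


# Round-trip check on a small ternary weight vector.
w = np.random.choice(np.array([-1, 0, 1], dtype=np.int8), size=16)
assert np.array_equal(unpack_ternary(pack_ternary(w), len(w)), w)
```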
Having an optimized, kernel-based implementation for this "multiplication-free" paradigm within bitsandbytes would be incredibly valuable. It would allow the community to easily and fairly benchmark the trade-offs between information-rich quantization schemes like NF4 and the extreme efficiency of ternary models, all within the same trusted framework.
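For example, on the 4-bit side the baseline would presumably be the existing Linear4bit layer, roughly like this (sizes and dtypes are just placeholders):

```python
import torch
import bitsandbytes as bnb

# NF4 baseline using the existing bitsandbytes API; a ternary BitLinear
# kernel could be benchmarked against this layer on identical inputs.
layer = bnb.nn.Linear4bit(
    4096, 4096,
    bias=False,
    compute_dtype=torch.bfloat16,
    quant_type="nf4",
)
layer = layer.cuda()  # weights are quantized to NF4 when moved to the GPU

x = torch.randn(8, 4096, dtype=torch.bfloat16, device="cuda")
with torch.no_grad():
    y = layer(x)      # (8, 4096) output in the compute dtype
```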
Thank you for your time and for all your amazing work!