[Feature Idea] Support for 1.58-bit Ternary Quantization (BitNet) #1674
ratchanonth60 asked this question in Q&A
Hi bitsandbytes team,
First of all, thank you for this incredible library. It has revolutionized how we access and train large models with limited resources.
I've recently been experimenting with the 1.58-bit BitNet architecture (from the paper "The Era of 1-bit LLMs") and have successfully created a Python-level implementation of a BitLinear layer for a computer vision model (MedMambaUNet for segmentation).
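To make that concrete, here is a minimal sketch of the kind of Python-level BitLinear I mean (a simplified illustration rather than my exact code: it uses the abs-mean ternarization described in the BitNet b1.58 paper with a straight-through estimator, and omits the paper's activation quantization and normalization):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def absmean_ternarize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Map weights to {-1, 0, +1} * scale using the abs-mean scheme
    from the BitNet b1.58 paper (simulated in full precision)."""
    scale = w.abs().mean().clamp(min=eps)
    return (w / scale).round().clamp(-1, 1) * scale


class BitLinear(nn.Linear):
    """Python-level BitLinear sketch: ternary weights in the forward pass,
    with a straight-through estimator so gradients reach the latent fp weights."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Forward uses the ternarized weights; backward treats the
        # quantizer as identity (straight-through estimator).
        w_q = w + (absmean_ternarize(w) - w).detach()
        return F.linear(x, w_q, self.bias)
```

Because the ternary values here are still stored and multiplied in full precision, this only simulates the accuracy/size trade-off; it cannot show any speed benefit.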
While my implementation proves the concept and shows potential for model size reduction, I understand from your work and documentation that true performance gains (especially in speed) come from custom, optimized CUDA kernels, such as the ones bitsandbytes provides for 4-bit and 8-bit quantization. My Python version suffers from significant overhead, as expected.
Are there any plans, or have there been any discussions, about supporting 1.58-bit ternary quantization (a {-1, 0, +1} weight scheme) in bitsandbytes?
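For context on the storage side: since log2(3) ≈ 1.585, five ternary weights fit in one byte (3^5 = 243 ≤ 256), i.e. about 1.6 bits per weight. A rough, purely illustrative packing sketch follows (the helper names and layout are hypothetical; a real kernel would of course pick whatever layout its GEMM routines need):

```python
import numpy as np


def pack_ternary(values: np.ndarray) -> np.ndarray:
    """Pack ternary values {-1, 0, +1} into bytes, five values per byte
    (base-3 encoding), giving roughly 1.6 bits per weight."""
    trits = (values + 1).astype(np.uint8)          # map {-1, 0, +1} -> {0, 1, 2}
    trits = np.pad(trits, (0, (-len(trits)) % 5))  # pad to a multiple of 5
    powers = 3 ** np.arange(5, dtype=np.uint16)    # [1, 3, 9, 27, 81]
    return (trits.reshape(-1, 5).astype(np.uint16) @ powers).astype(np.uint8)


def unpack_ternary(packed: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack_ternary; n is the original number of weights."""
    rest = packed.astype(np.uint16)
    trits = np.empty((len(packed), 5), dtype=np.uint8)
    for i in range(5):
        trits[:, i] = rest % 3
        rest //= 3
    return trits.reshape(-1)[:n].astype(np.int8) - 1


# Round-trip check on a small ternary weight vector.
w = np.random.choice(np.array([-1, 0, 1], dtype=np.int8), size=16)
assert np.array_equal(unpack_ternary(pack_ternary(w), len(w)), w)
```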
Having an optimized, kernel-based implementation for this "multiplication-free" paradigm within bitsandbytes would be incredibly valuable. It would allow the community to easily and fairly benchmark the trade-offs between information-rich quantization schemes like NF4 and the extreme efficiency of ternary models, all within the same trusted framework.
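For example, on the 4-bit side the baseline would presumably be the existing Linear4bit layer, roughly like this (sizes and dtypes are just placeholders):

```python
import torch
import bitsandbytes as bnb

# NF4 baseline using the existing bitsandbytes API; a ternary BitLinear
# kernel could be benchmarked against this layer on identical inputs.
layer = bnb.nn.Linear4bit(
    4096, 4096,
    bias=False,
    compute_dtype=torch.bfloat16,
    quant_type="nf4",
)
layer = layer.cuda()  # weights are quantized to NF4 when moved to the GPU

x = torch.randn(8, 4096, dtype=torch.bfloat16, device="cuda")
with torch.no_grad():
    y = layer(x)      # (8, 4096) output in the compute dtype
```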
Thank you for your time and for all your amazing work!