Quantize a single tensor obtained from a float32 model #1364
Unanswered
Boltzmachine asked this question in Q&A
Replies: 0
I have a model consisting of two parts: the first is a float32 encoder, and the second is a quantized LLM.
If I load the LLM in bfloat16, I can call encoder(x).bfloat16() and feed the result into the LLM. But when the LLM is loaded in 8-bit, I cannot find a corresponding way to convert the encoder's output.
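One approach, sketched below under the assumption that the LLM is a Hugging Face transformers model quantized with bitsandbytes: with 8-bit (LLM.int8()) quantization, only the linear-layer weights are stored in int8, while activations are still computed in a floating-point dtype (float16 by default). So the encoder output should be cast to that float compute dtype, not to int8. The model id, encoder architecture, and tensor shapes here are placeholders.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the LLM with 8-bit weight quantization via bitsandbytes.
# The model id is a placeholder; substitute your own checkpoint.
llm = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Placeholder float32 encoder projecting into the LLM's hidden size.
encoder = nn.Linear(512, llm.config.hidden_size).cuda()

x = torch.randn(1, 16, 512, device="cuda")  # dummy encoder input
feats = encoder(x)                           # float32 activations

# 8-bit quantization applies to the Linear *weights* only; activations
# flow through in a float dtype. Cast the encoder output to the LLM's
# float compute dtype rather than to int8. The embedding table is left
# unquantized by bitsandbytes, so its dtype is a convenient proxy.
target_dtype = llm.get_input_embeddings().weight.dtype  # usually float16
out = llm(inputs_embeds=feats.to(target_dtype))
```

Reading the dtype from get_input_embeddings() instead of hard-coding torch.float16 keeps the same code working if the quantized model was loaded with a different compute dtype (e.g. bfloat16).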