-
llama.cpp also runs on Android and iPhone; from what I've seen, ggml is being moved into Android's standard ML libraries.
-
Currently not supported.
-
It would be great if the original authors could chip in to help speed up local deployments.
bnb 4-bit is certainly a good start (see the sketch after this comment),
but to make the model available to the masses we would need it in a faster quant format (llama.cpp/GPTQ/AWQ; any of them would do) and build the foundation there, so we could then port it to the other quant formats.
I've spoken with a few guys (casper from AutoAWQ, turboderp from exllama/v2), and they say it's a huge effort to implement, 50h+.
I'm not that deep into the architecture side of vision LLMs, so I can't really judge that,
but if we could get some hints or some help from the authors, that would certainly help a lot.
CogVLM is the best vision model we have so far; it's just very restrictive in current quant formats for the normal end user (no access to A100s/H100s).
ggml-org/llama.cpp#4387
There is some demand from the community, but we really need some help here.
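For anyone who wants to try the bnb 4-bit route today, here is a minimal sketch. It assumes the `THUDM/cogvlm-chat-hf` checkpoint on the Hugging Face Hub, a recent `transformers` with `bitsandbytes` installed, and a single CUDA GPU; the quantization settings (NF4, fp16 compute) are common defaults I picked, not something the authors have specified, and the Vicuna tokenizer pairing follows the model card:

```python
# Hedged sketch: load CogVLM with bitsandbytes 4-bit (NF4) quantization.
# Assumptions: transformers with bitsandbytes support, a CUDA GPU, and the
# THUDM/cogvlm-chat-hf checkpoint (swap in whichever checkpoint you use).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # bnb 4-bit weights
    bnb_4bit_quant_type="nf4",             # NF4 is the usual default
    bnb_4bit_compute_dtype=torch.float16,  # do compute in fp16
)

# CogVLM-chat reuses a Vicuna tokenizer per the model card.
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
    trust_remote_code=True,   # CogVLM ships custom modeling code
    device_map="auto",        # let accelerate place layers on the GPU
)
```

Note this stays entirely in the Hugging Face stack; it doesn't solve the GGUF/GPTQ/AWQ porting problem discussed above, it's just the quickest way to fit the model on a single consumer GPU in the meantime.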