> [!CAUTION]
> tinyllama is `Maykeye/TinyLlama-v0`

Currently ggml supports only float models for decode. Let's add 4-bit quantization (e.g. ggml q4).

- [x] Write a tool to quantize f-circle to q-circle #16291
- [ ] Add 4-bit kernel for Attention Op
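For context, a minimal sketch of what block-wise 4-bit quantization looks like, in the spirit of ggml's q4_0 scheme (32-element blocks, one fp scale per block, the largest-magnitude value mapped to -8). This is an illustrative numpy sketch, not the actual tool or kernel tracked above; the function names and the exact rounding/clamping details are assumptions.

```python
import numpy as np

BLOCK = 32  # q4_0-style block size (assumption for illustration)

def quantize_q4(x: np.ndarray):
    """Block-wise symmetric 4-bit quantization (illustrative sketch)."""
    x = x.reshape(-1, BLOCK)
    # Per block, take the signed value with the largest magnitude and
    # map it to -8, so quantized codes land in the 4-bit range [0, 15].
    rows = np.arange(x.shape[0])
    amax = x[rows, np.argmax(np.abs(x), axis=1)]
    d = amax / -8.0
    d[d == 0] = 1.0  # avoid division by zero on all-zero blocks
    q = np.clip(np.round(x / d[:, None]) + 8, 0, 15).astype(np.uint8)
    return q, d.astype(np.float32)

def dequantize_q4(q: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Reconstruct floats from 4-bit codes and per-block scales."""
    return ((q.astype(np.float32) - 8.0) * d[:, None]).reshape(-1)
```

A real q-circle tool would additionally pack two 4-bit codes per byte and fix a serialized block layout; the attention kernel would then dequantize (or compute directly on) these blocks at decode time.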