🚀 The feature, motivation and pitch
Good first issue: automatically use FP8_e5m2 when the FP8 quantized KV cache is enabled on Trillium.
v6e (Trillium) performs far better with FP8_e5m2 KV cache quantization than with plain --kv-cache-dtype=FP8.
Handle this under the hood so users get good out-of-the-box (OOTB) performance without extra configuration.
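
To make the request concrete, here is a minimal sketch of the kind of resolution step this could be. It is not existing vLLM code: `resolve_kv_cache_dtype` and its `tpu_generation` argument are hypothetical names, and how the TPU generation is actually detected is left out. The only assumption is that a generic `fp8` request on v6e gets rewritten to `fp8_e5m2`, while explicit choices are left untouched.

```python
from typing import Optional


def resolve_kv_cache_dtype(requested: str, tpu_generation: Optional[str]) -> str:
    """Hypothetical helper: pick the effective KV cache dtype.

    `requested` is the user-supplied value (e.g. "fp8", "fp8_e5m2", "auto").
    `tpu_generation` is an assumed string such as "v6e", or None off-TPU.
    """
    normalized = requested.strip().lower()
    on_trillium = tpu_generation is not None and tpu_generation.lower() == "v6e"

    # Only the generic "fp8" request is rewritten; an explicit format
    # chosen by the user (or "auto") is respected as-is.
    if normalized == "fp8" and on_trillium:
        return "fp8_e5m2"
    return normalized


if __name__ == "__main__":
    print(resolve_kv_cache_dtype("FP8", "v6e"))
    print(resolve_kv_cache_dtype("fp8", "v5e"))
```

Rewriting only the generic value keeps the change invisible to users who already pass an explicit format, which seems like the least surprising behavior.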
Alternatives
No response
Additional context
No response