[Feature]: Use FP8_e5m2 automatically when using quantized kv cache FP8 on trillium

### 🚀 The feature, motivation and pitch

Good first issue: Use FP8_e5m2 automatically when using quantized kv cache FP8 on trillium

On v6e (Trillium), KV cache quantization with FP8_e5m2 performs dramatically better than with `--kv-cache-dtype=fp8`.

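Make good out-of-the-box (OOTB) performance easier for users by handling this under the hood: when `--kv-cache-dtype=fp8` is requested on Trillium, use FP8_e5m2 automatically.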
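
The requested behavior could be a small dtype-resolution step that runs before the KV cache is allocated. The sketch below is hypothetical: `resolve_kv_cache_dtype`, `is_trillium`, and the `"TPU v6"` device-kind prefix are illustrative assumptions, not existing tpu-inference APIs. It upgrades only the generic `fp8` request and leaves explicit choices such as `fp8_e4m3` untouched.

```python
# Hypothetical sketch: none of these names exist in tpu-inference today.
# Assumption: Trillium (v6e) chips report a device kind starting with "TPU v6",
# e.g. "TPU v6 lite".
TRILLIUM_DEVICE_KIND_PREFIXES = ("TPU v6",)


def is_trillium(device_kind: str) -> bool:
    """Return True if the device kind string looks like a Trillium (v6e) TPU."""
    return device_kind.startswith(TRILLIUM_DEVICE_KIND_PREFIXES)


def resolve_kv_cache_dtype(kv_cache_dtype: str, device_kind: str) -> str:
    """Map the generic 'fp8' request to 'fp8_e5m2' on Trillium.

    Any other value (e.g. 'auto' or an explicit 'fp8_e4m3') is returned
    unchanged, so an explicit user choice is never overridden.
    """
    if kv_cache_dtype.lower() == "fp8" and is_trillium(device_kind):
        return "fp8_e5m2"
    return kv_cache_dtype
```

On a real host the device kind could come from `jax.devices()[0].device_kind`; passing it in as a string keeps the helper easy to unit-test without TPU hardware.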

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://github.com/vllm-project/tpu-inference/tree/main/docs), which can answer lots of frequently asked questions.
