README.md: 19 additions & 1 deletion
@@ -52,7 +52,25 @@ See `bench.py` for benchmark.
| vLLM | 133,966 | 98.37 | 1361.84 |
| Nano-vLLM | 133,966 | 93.41 | 1434.13 |

## Run Nano-vLLM on AMD GPUs
### CDNA3 datacenter GPUs
Access an AMD CDNA3 datacenter GPU through the [AMD Developer Cloud](https://www.amd.com/en/developer/resources/cloud-access/amd-developer-cloud.html). It is strongly advised to use the pre-built vLLM GPU instance, which already ships with Triton and the ROCm build of the flash-attention library. If you have other MI300 GPU resources, you can try the `rocm/vllm:latest` Docker image to run Nano-vLLM.
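
A minimal sketch of launching such a container on a standard ROCm host (the mount path, shared-memory size, and other flags are illustrative assumptions, not taken from this PR):

```bash
# Illustrative launch of the rocm/vllm:latest image on an MI300-class machine.
# /dev/kfd and /dev/dri expose the AMD GPU to the container; adjust the
# mounted directory and shared-memory size to your environment.
docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host --shm-size 16G \
  -v "$PWD":/workspace -w /workspace \
  rocm/vllm:latest bash
```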

### RDNA3 Desktop GPUs
#### Using a pre-built ROCm Navi vLLM Docker image
The [ROCm vLLM Navi images](https://hub.docker.com/r/rocm/vllm-dev/tags?name=navi) already include the flash-attention library; choose the latest version for your test.
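
For example, pulling and starting one of these images might look like the sketch below; `<navi-tag>` is a placeholder for an actual tag from the Docker Hub page above, and the device flags are the same standard ROCm ones used in the CDNA3 sketch:

```bash
# Placeholder tag: substitute a real "navi" tag from rocm/vllm-dev on Docker Hub.
docker pull rocm/vllm-dev:<navi-tag>
docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host \
  rocm/vllm-dev:<navi-tag> bash
```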
#### Using a pre-built ROCm PyTorch Docker image
1) use a pre-built ROCm Docker image, such as `rocm/pytorch:latest`, to set up ROCm
2) install the ROCm (Triton) build of the flash-attention library
```bash
git clone --recursive https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
# Build and install with the Triton AMD backend enabled (required on ROCm GPUs)
FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" python setup.py install
```
3) install Nano-vLLM
4) run the benchmark: `FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" python3 bench.py` (see the combined sketch below)
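
Putting steps 3) and 4) together, a rough end-to-end sketch (the `pip install .` form is an assumption; follow the project's own install instructions if they differ):

```bash
# Assumed install flow: clone Nano-vLLM and install it into the current environment.
git clone https://github.com/GeeeekExplorer/nano-vllm.git
cd nano-vllm
pip install .

# Run the benchmark with the Triton AMD flash-attention backend enabled.
FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" python3 bench.py
```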


## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=GeeeekExplorer/nano-vllm&type=Date)](https://www.star-history.com/#GeeeekExplorer/nano-vllm&Date)