From 98a305ccda765c8b3f709d89c029ad8a0a982800 Mon Sep 17 00:00:00 2001
From: Zhang Jason
Date: Fri, 25 Jul 2025 13:54:04 +0800
Subject: [PATCH 1/3] Update README.md to add AMD GPU instructions

---
 README.md | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index f56a7339..03209006 100644
--- a/README.md
+++ b/README.md
@@ -52,7 +52,19 @@ See `bench.py` for benchmark.
 | vLLM      | 133,966 | 98.37 | 1361.84 |
 | Nano-vLLM | 133,966 | 93.41 | 1434.13 |
 
-
+## Run Nano-vLLM on AMD GPUs
+### CDNA3 Datacenter GPUs
+Access AMD CDNA3 datacenter GPUs through the [AMD Developer Cloud](https://www.amd.com/en/developer/resources/cloud-access/amd-developer-cloud.html). It is strongly advised to use the pre-built vLLM GPU instance, which already has Triton and the ROCm version of the flash-attention library installed. If you have other MI300 GPU resources, you can try the rocm/vllm:latest Docker image to run Nano-vLLM.
+
+### RDNA3/4 Desktop GPUs
+1) Use a pre-built ROCm Docker image, such as rocm/pytorch:latest, to set up ROCm
+2) Install the Triton-based ROCm flash-attention library
+   a. git clone --recursive https://github.com/Dao-AILab/flash-attention.git
+   b. cd flash-attention
+   c. FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" python setup.py install
+3) Install Nano-vLLM
+4) Run the benchmark: FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" python3 bench.py
+
 ## Star History
 
-[![Star History Chart](https://api.star-history.com/svg?repos=GeeeekExplorer/nano-vllm&type=Date)](https://www.star-history.com/#GeeeekExplorer/nano-vllm&Date)
\ No newline at end of file
+[![Star History Chart](https://api.star-history.com/svg?repos=GeeeekExplorer/nano-vllm&type=Date)](https://www.star-history.com/#GeeeekExplorer/nano-vllm&Date)

From a44042ab2813758c17137dae31890fc876bd5b05 Mon Sep 17 00:00:00 2001
From: Zhang Jason
Date: Sat, 26 Jul 2025 09:51:19 +0800
Subject: [PATCH 2/3] Update README.md

Update the formatting of the markdown file.
---
 README.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 03209006..34e14a11 100644
--- a/README.md
+++ b/README.md
@@ -56,14 +56,16 @@ See `bench.py` for benchmark.
 ### CDNA3 Datacenter GPUs
 Access AMD CDNA3 datacenter GPUs through the [AMD Developer Cloud](https://www.amd.com/en/developer/resources/cloud-access/amd-developer-cloud.html). It is strongly advised to use the pre-built vLLM GPU instance, which already has Triton and the ROCm version of the flash-attention library installed. If you have other MI300 GPU resources, you can try the rocm/vllm:latest Docker image to run Nano-vLLM.
 
-### RDNA3/4 Desktop GPUs
+### RDNA3 Desktop GPUs
 1) Use a pre-built ROCm Docker image, such as rocm/pytorch:latest, to set up ROCm
 2) Install the Triton-based ROCm flash-attention library
-   a. git clone --recursive https://github.com/Dao-AILab/flash-attention.git
-   b. cd flash-attention
-   c. FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" python setup.py install
-3) Install Nano-vLLM
-4) Run the benchmark: FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" python3 bench.py
+   ```bash
+   git clone --recursive https://github.com/Dao-AILab/flash-attention.git
+   cd flash-attention
+   FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" python setup.py install
+   ```
+3) Install Nano-vLLM
+4) Run the benchmark: FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" python3 bench.py
 
 ## Star History

From 476d27cc4949f9bd0ff287200d74e0dc785086c8 Mon Sep 17 00:00:00 2001
From: Zhang Jason
Date: Sat, 26 Jul 2025 10:14:13 +0800
Subject: [PATCH 3/3] Update README.md

---
 README.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 34e14a11..8b6ea76f 100644
--- a/README.md
+++ b/README.md
@@ -57,15 +57,19 @@ See `bench.py` for benchmark.
 Access AMD CDNA3 datacenter GPUs through the [AMD Developer Cloud](https://www.amd.com/en/developer/resources/cloud-access/amd-developer-cloud.html). It is strongly advised to use the pre-built vLLM GPU instance, which already has Triton and the ROCm version of the flash-attention library installed. If you have other MI300 GPU resources, you can try the rocm/vllm:latest Docker image to run Nano-vLLM.
 
 ### RDNA3 Desktop GPUs
+#### Using a pre-built ROCm Navi vLLM Docker image
+In these [ROCm vLLM Navi images](https://hub.docker.com/r/rocm/vllm-dev/tags?name=navi), the flash-attention library is already installed. You can choose the latest version for your test.
+#### Using a pre-built ROCm PyTorch Docker image
 1) Use a pre-built ROCm Docker image, such as rocm/pytorch:latest, to set up ROCm
 2) Install the Triton-based ROCm flash-attention library
-   ```bash
+```bash
    git clone --recursive https://github.com/Dao-AILab/flash-attention.git
    cd flash-attention
    FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" python setup.py install
-   ```
+```
 3) Install Nano-vLLM
 4) Run the benchmark: FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" python3 bench.py
+
 ## Star History
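
Taken together, the RDNA3 desktop-GPU instructions in these patches amount to the end-to-end flow sketched below. This is an illustrative sketch, not part of the diff: the docker run device flags are the usual ROCm options, and the Nano-vLLM install step (cloning the repository and running pip install .) is an assumption, since the patches do not state how Nano-vLLM itself is installed.

```bash
# Illustrative end-to-end flow for an RDNA3 desktop GPU, following the steps in
# these patches. Container flags and the Nano-vLLM install command are assumptions.

# 1) Start a pre-built ROCm container (typical ROCm device flags; adjust as needed).
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video \
  rocm/pytorch:latest /bin/bash

# --- the remaining steps run inside the container ---

# 2) Build and install the Triton-based ROCm flash-attention library.
git clone --recursive https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" python setup.py install
cd ..

# 3) Install Nano-vLLM from a local clone (install command assumed, not stated in the diff).
git clone https://github.com/GeeeekExplorer/nano-vllm.git
cd nano-vllm
pip install .

# 4) Run the benchmark with the Triton flash-attention backend enabled,
#    mirroring the command added to the README.
FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE" python3 bench.py
```

Note that FLASH_ATTENTION_TRITON_AMD_ENABLE is set twice on purpose: once when building flash-attention and again when running bench.py, matching the two commands the patches add to the README.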