
Commit 8101bc3

billishyahao committed with AlexHe99 and haichen07

add rocm perf

Co-authored-by: AlexHe99 <[email protected]>
Co-authored-by: haichen07 <[email protected]>
1 parent 38baf0b commit 8101bc3

File tree

2 files changed, +94 -2 lines changed


README.md

Lines changed: 17 additions & 2 deletions
@@ -14,6 +14,12 @@ A lightweight vLLM implementation built from scratch.
 pip install git+https://github.com/GeeeekExplorer/nano-vllm.git
 ```
 
+### Installation on AMD GPUs
+```bash
+pip install --no-build-isolation git+https://github.com/GeeeekExplorer/nano-vllm.git
+```
+For more information, see [AMD's readme](./README_amd.md).
+
 ## Manual Download
 
 If you prefer to download the model weights manually, use the following command:
@@ -40,18 +46,27 @@ outputs[0]["text"]
 See `bench.py` for benchmark.
 
 **Test Configuration:**
-- Hardware: RTX 4070 Laptop (8GB)
+- Hardware:
+  - Setup 1: Nvidia RTX 4070 Laptop (8GB)
+  - Setup 2: AMD Radeon RX7900XT (20GB)
 - Model: Qwen3-0.6B
 - Total Requests: 256 sequences
 - Input Length: Randomly sampled between 100–1024 tokens
 - Output Length: Randomly sampled between 100–1024 tokens
 
-**Performance Results:**
+**Setup 1 Performance Results:**
 | Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) |
 |----------------|-------------|----------|-----------------------|
 | vLLM | 133,966 | 98.37 | 1361.84 |
 | Nano-vLLM | 133,966 | 93.41 | 1434.13 |
 
+**Setup 2 Performance Results:**
+| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) |
+|----------------|-------------|----------|-----------------------|
+| vLLM | 133,966 | 61.80 | 2167.84 |
+| Nano-vLLM | 133,966 | 65.91 | 2032.36 |
+
+
 
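The throughput column in the tables above follows directly from the other two columns (output tokens divided by elapsed seconds). A minimal sanity check of that arithmetic, with the figures copied from the tables; small drift against the reported throughputs is expected, since the times are themselves rounded:

```python
# Recompute throughput (tokens/s) from the benchmark tables above.
results = {
    "Setup 1": {"vLLM": (133_966, 98.37), "Nano-vLLM": (133_966, 93.41)},
    "Setup 2": {"vLLM": (133_966, 61.80), "Nano-vLLM": (133_966, 65.91)},
}
for setup, engines in results.items():
    for engine, (tokens, seconds) in engines.items():
        print(f"{setup} {engine}: {tokens / seconds:,.2f} tokens/s")
```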
 ## Star History
 
README_amd.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+# Installation Guide on AMD GPUs
+
+This guide shows how to install nano-vllm on AMD platforms and how it performs there.
+
+
+## Installation on AMD GPUs
+
+### Launch container environment
+
+```bash
+CONTAINER_NAME=<your container name>
+IMAGE_NAME=rocm/vllm-dev:rocm6.4.1_navi_ubuntu24.04_py3.12_pytorch_2.7_vllm_0.8.5
+# For AMD Instinct GPUs, users can select the latest pre-built docker image:
+# rocm/vllm:rocm6.4.1_vllm_0.9.1_20250715. See https://hub.docker.com/r/rocm/vllm/tags
+
+docker run -it \
+  --rm \
+  --device /dev/dri \
+  --device /dev/kfd \
+  --network host \
+  --ipc host \
+  --group-add video \
+  --cap-add SYS_PTRACE \
+  --security-opt seccomp=unconfined \
+  --privileged \
+  --shm-size 8G \
+  --name ${CONTAINER_NAME} \
+  ${IMAGE_NAME} /bin/bash
+```
+
+### Install through pip
+
+```bash
+pip install --no-build-isolation git+https://github.com/GeeeekExplorer/nano-vllm.git
+```
+
+
+## Benchmark on AMD GPUs
+
+See `bench.py` for benchmark.
+
+**Test Configuration:**
+- Hardware:
+  - Setup 1: Radeon RX7900XT (20GB)
+  - Setup 2: Instinct MI300X (192GB)
+  - Setup 3: Ryzen AI 395 (128GB unified memory)
+  - Setup 4: Radeon RX9070XT (16GB)
+- Model: Qwen3-0.6B
+- Total Requests: 256 sequences
+- Input Length: Randomly sampled between 100–1024 tokens
+- Output Length: Randomly sampled between 100–1024 tokens
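The workload described above (256 requests, input and output lengths each sampled from 100–1024 tokens) can be sketched as a small generator. The uniform distribution, the fixed seed, and the `make_workload` name are illustrative assumptions, not details taken from `bench.py`:

```python
import random

def make_workload(num_requests=256, lo=100, hi=1024, seed=0):
    """Sample (input_len, output_len) pairs as in the test configuration above."""
    rng = random.Random(seed)
    return [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(num_requests)]

workload = make_workload()
total_output = sum(out_len for _, out_len in workload)
print(f"{len(workload)} requests, ~{total_output} output tokens requested")
```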
+
+**Setup 1 Performance Results:**
+| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) |
+|----------------|-------------|----------|-----------------------|
+| vLLM | 133,966 | 61.80 | 2167.84 |
+| Nano-vLLM | 133,966 | 65.91 | 2032.36 |
+
+**Setup 2 Performance Results:**
+| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) |
+|----------------|-------------|----------|-----------------------|
+| vLLM | 133,966 | 8.93 | 14994.98 |
+| Nano-vLLM | 133,966 | 20.17 | 6640.22 |
+
+**Setup 3 Performance Results:**
+| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) |
+|----------------|-------------|----------|-----------------------|
+| vLLM | 133,966 | 114.72 | 1167.76 |
+| Nano-vLLM | 133,966 | 123.81 | 1082.05 |
+
+**Setup 4 Performance Results:**
+| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) |
+|----------------|-------------|----------|-----------------------|
+| vLLM | 133,966 | 47.12 | 2842.8 |
+| Nano-vLLM | 133,966 | * | * |
+
+*Known issue: nano-vllm currently hits a memory access fault on the RX9070XT; a fix is pending.
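Reading the AMD tables above side by side, the gap between the two engines varies widely by setup. A minimal sketch computing Nano-vLLM's throughput as a fraction of vLLM's from the reported numbers (Setup 4 is omitted since Nano-vLLM's figures are unavailable there):

```python
# Relative throughput of Nano-vLLM vs. vLLM, from the tables above.
throughput = {  # setup: (vLLM tokens/s, Nano-vLLM tokens/s)
    "RX7900XT": (2167.84, 2032.36),
    "MI300X": (14994.98, 6640.22),
    "Ryzen AI 395": (1167.76, 1082.05),
}
for setup, (vllm, nano) in throughput.items():
    print(f"{setup}: Nano-vLLM reaches {nano / vllm:.0%} of vLLM throughput")
```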

0 commit comments
