|
| 1 | +# Installation Guide on AMD GPUs |
| 2 | + |
| 3 | +This guide shows how to install nano-vllm on AMD GPUs and reports its benchmark performance on several AMD platforms. |
| 4 | + |
| 5 | + |
| 6 | +## Installation on AMD GPUs |
| 7 | + |
| 8 | +### Launch container environment |
| 9 | + |
| 10 | +```bash |
| 11 | +CONTAINER_NAME=<your container name> |
| 12 | +IMAGE_NAME=rocm/vllm-dev:rocm6.4.1_navi_ubuntu24.04_py3.12_pytorch_2.7_vllm_0.8.5 |
| 13 | +# For AMD Instinct GPUs, use a recent pre-built Docker image instead, e.g. |
| 14 | +# rocm/vllm:rocm6.4.1_vllm_0.9.1_20250715. See https://hub.docker.com/r/rocm/vllm/tags |
| 15 | + |
| 16 | +docker run -it \ |
| 17 | + --rm \ |
| 18 | + --device /dev/dri \ |
| 19 | + --device /dev/kfd \ |
| 20 | + --network host \ |
| 21 | + --ipc host \ |
| 22 | + --group-add video \ |
| 23 | + --cap-add SYS_PTRACE \ |
| 24 | + --security-opt seccomp=unconfined \ |
| 25 | + --privileged \ |
| 26 | + --shm-size 8G \ |
| 27 | + --name ${CONTAINER_NAME} \ |
| 28 | + ${IMAGE_NAME} /bin/bash |
| 29 | +``` |
| 30 | + |
| 31 | +### Install through pip |
| 32 | + |
| 33 | +```bash |
| 34 | +pip install --no-build-isolation git+https://github.com/GeeeekExplorer/nano-vllm.git |
| 35 | +``` |
| 36 | + |
| 37 | + |
| 38 | +## Benchmark on AMD GPUs |
| 39 | + |
| 40 | +See `bench.py` for the benchmark script. |
| 41 | + |
| 42 | +**Test Configuration:** |
| 43 | +- Hardware: |
| 44 | + - Setup 1: Radeon RX7900XT (20GB) |
| 45 | + - Setup 2: Instinct MI300X (192GB) |
| 46 | + - Setup 3: Ryzen AI 395 (128GB unified memory) |
| 47 | + - Setup 4: Radeon RX9070XT (16GB) |
| 48 | +- Model: Qwen3-0.6B |
| 49 | +- Total Requests: 256 sequences |
| 50 | +- Input Length: Randomly sampled between 100–1024 tokens |
| 51 | +- Output Length: Randomly sampled between 100–1024 tokens |
| 52 | + |
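The request mix described above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of `bench.py`: the function name `sample_request_lengths`, the fixed seed, and the use of the standard-library `random` module are all assumptions made for the example, so the total output token count it produces will not necessarily match the 133,966 reported in the tables.

```python
import random


def sample_request_lengths(num_seqs=256, min_len=100, max_len=1024, seed=0):
    """Draw one input length and one output length per request.

    Hypothetical helper: lengths are sampled uniformly from
    [min_len, max_len] (inclusive), mirroring the test configuration.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    input_lens = [rng.randint(min_len, max_len) for _ in range(num_seqs)]
    output_lens = [rng.randint(min_len, max_len) for _ in range(num_seqs)]
    return input_lens, output_lens


input_lens, output_lens = sample_request_lengths()
total_output_tokens = sum(output_lens)
print(f"requests: {len(input_lens)}")
print(f"input length range: {min(input_lens)}-{max(input_lens)}")
print(f"total output tokens: {total_output_tokens}")
```

Using the same seed for every engine keeps the workload identical across runs, which is why all tables can report the same output token count.
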
| 53 | +**Setup 1 Performance Results:** |
| 54 | +| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) | |
| 55 | +|----------------|-------------|----------|-----------------------| |
| 56 | +| vLLM | 133,966 | 61.80 | 2167.84 | |
| 57 | +| Nano-vLLM | 133,966 | 65.91 | 2032.36 | |
| 58 | + |
| 59 | +**Setup 2 Performance Results:** |
| 60 | +| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) | |
| 61 | +|----------------|-------------|----------|-----------------------| |
| 62 | +| vLLM | 133,966 | 8.93 | 14994.98 | |
| 63 | +| Nano-vLLM | 133,966 | 20.17 | 6640.22 | |
| 64 | + |
| 65 | +**Setup 3 Performance Results:** |
| 66 | +| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) | |
| 67 | +|----------------|-------------|----------|-----------------------| |
| 68 | +| vLLM | 133,966 | 114.72 | 1167.76 | |
| 69 | +| Nano-vLLM | 133,966 | 123.81 | 1082.05 | |
| 70 | + |
| 71 | +**Setup 4 Performance Results:** |
| 72 | +| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) | |
| 73 | +|----------------|-------------|----------|-----------------------| |
| 74 | +| vLLM | 133,966 | 47.12 | 2842.8 | |
| 75 | +| Nano-vLLM | 133,966 | * | * | |
| 76 | + |
| 77 | +*Known issue: nano-vllm currently hits a memory access fault on the RX9070XT; a fix is pending. |
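
The throughput column in the tables above is output tokens divided by wall-clock time. The sketch below recomputes it from the reported token counts and times and derives Nano-vLLM's throughput relative to vLLM for each setup. The names `REPORTED`, `throughput`, and `relative_throughput` are hypothetical helpers for this example; Setup 4 is left out because it has no Nano-vLLM result. Recomputed values can differ slightly from the tables in the last digits because the reported times are rounded.

```python
# (output tokens, time in seconds) copied from the result tables above.
REPORTED = {
    "Setup 1 (Radeon RX7900XT)": {
        "vLLM": (133_966, 61.80),
        "Nano-vLLM": (133_966, 65.91),
    },
    "Setup 2 (Instinct MI300X)": {
        "vLLM": (133_966, 8.93),
        "Nano-vLLM": (133_966, 20.17),
    },
    "Setup 3 (Ryzen AI 395)": {
        "vLLM": (133_966, 114.72),
        "Nano-vLLM": (133_966, 123.81),
    },
}


def throughput(output_tokens, seconds):
    """Return output tokens per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return output_tokens / seconds


def relative_throughput(engines):
    """Return Nano-vLLM throughput as a fraction of vLLM throughput."""
    return throughput(*engines["Nano-vLLM"]) / throughput(*engines["vLLM"])


for setup, engines in REPORTED.items():
    for engine, (tokens, seconds) in engines.items():
        print(f"{setup} | {engine}: {throughput(tokens, seconds):.2f} tokens/s")
    print(f"{setup} | Nano-vLLM / vLLM: {relative_throughput(engines):.1%}")
```
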