Commit d7509a6

[Intel HPU] Support intel hpu platform

1 parent 2745f37 · commit d7509a6

36 files changed: +2985 −23 lines

README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -60,6 +60,7 @@ FastDeploy supports inference deployment on **NVIDIA GPUs**, **Kunlunxin XPUs**,
 - [Enflame GCU](./docs/get_started/installation/Enflame_gcu.md)
 - [Hygon DCU](./docs/get_started/installation/hygon_dcu.md)
 - [MetaX GPU](./docs/get_started/installation/metax_gpu.md)
+- [Intel HPU](./docs/get_started/installation/intel_hpu.md)

 **Note:** We are actively working on expanding hardware support. Additional hardware platforms including Ascend NPU are currently under development and testing. Stay tuned for updates!
```

README_CN.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -58,6 +58,7 @@ FastDeploy supports deployment on **NVIDIA GPU**, **Kunlunxin XPU**
 - [Enflame S60](./docs/zh/get_started/installation/Enflame_gcu.md)
 - [Hygon DCU](./docs/zh/get_started/installation/hygon_dcu.md)
 - [MetaX GPU](./docs/zh/get_started/installation/metax_gpu.md)
+- [Intel HPU](./docs/zh/get_started/installation/intel_hpu.md)

 **Note:** We are actively expanding hardware support. Other hardware platforms, including the Ascend NPU, are currently under development and testing. Stay tuned for updates!
```

build.sh

Lines changed: 9 additions & 1 deletion

```diff
@@ -128,6 +128,12 @@ function copy_ops(){
         echo -e "MACA ops have been copy to fastdeploy"
         return
     fi
+    is_intel_hpu=`$python -c "import paddle; print(paddle.is_compiled_with_custom_device('intel_hpu'))"`
+    if [ "$is_intel_hpu" = "True" ]; then
+        DEVICE_TYPE="intel-hpu"
+        echo -e "intel_hpu ops have been copy to fastdeploy"
+        return
+    fi

     DEVICE_TYPE="cpu"
     cd ../../../../
@@ -159,7 +165,9 @@ function build_and_install_ops() {
     else
         FD_BUILDING_ARCS=${FD_BUILDING_ARCS} ${python} setup_ops.py install --install-lib ${OPS_TMP_DIR}
     fi
-    find ${OPS_TMP_DIR} -type f -name "*.o" -exec rm -f {} \;
+    if [ -d "${OPS_TMP_DIR}" ]; then
+        find ${OPS_TMP_DIR} -type f -name "*.o" -exec rm -f {} \;
+    fi
 else
     echo "Error: Invalid parameter '$FD_CPU_USE_BF16'. Please use true or false."
     exit 1
```
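The `copy_ops()` change above probes whether the installed Paddle was compiled with the `intel_hpu` custom device and, on a hit, tags the ops directory `intel-hpu` before falling through to the `cpu` default. The same probe can be sketched in Python; this is an illustrative model of the shell logic, not FastDeploy code, and it degrades gracefully when `paddle` is not installed:

```python
def probe_intel_hpu() -> bool:
    """The same check build.sh runs via `$python -c ...`: ask paddle
    whether it was compiled with the intel_hpu custom device."""
    try:
        import paddle  # may be absent outside a FastDeploy build environment
        return bool(paddle.is_compiled_with_custom_device("intel_hpu"))
    except ImportError:
        return False

# Mirrors the shell cascade: tag the ops "intel-hpu" on a hit, otherwise
# fall through to the existing "cpu" default.
DEVICE_TYPE = "intel-hpu" if probe_intel_hpu() else "cpu"
print(DEVICE_TYPE)
```

As in the script, the check must run before the `cpu` fallback, since each platform branch returns early on the first match.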

custom_ops/setup_ops.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -621,6 +621,8 @@ def find_end_files(directory, end_str):
         ],
     ),
 )
+elif paddle.is_compiled_with_custom_device('intel_hpu'):
+    pass
 else:
     use_bf16 = envs.FD_CPU_USE_BF16 == "True"
```

docs/get_started/installation/README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -7,3 +7,4 @@ FastDeploy currently supports installation on the following hardware platforms:
 - [Enflame S60 GCU Installation](Enflame_gcu.md)
 - [Iluvatar GPU Installation](iluvatar_gpu.md)
 - [Hygon DCU Installation](hygon_dcu.md)
+- [Intel HPU Installation](intel_hpu.md)
```
docs/get_started/installation/intel_hpu.md

Lines changed: 75 additions & 0 deletions

# Intel HPU Installation for Running ERNIE 4.5 Series Models

The following installation steps apply when your environment meets these requirements:

- Python 3.10
- Intel Gaudi 2
- Intel Gaudi software version 1.22.0
- Linux x86_64

### 1. Run the Docker Container

Use the following commands to run a Docker container. Make sure the versions you use match those listed in the [Support Matrix](https://docs.habana.ai/en/latest/Support_Matrix/Support_Matrix.html):

```{.console}
$ docker pull vault.habana.ai/gaudi-docker/1.22.0/ubuntu22.04/habanalabs/pytorch-installer-2.7.1:latest
$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.22.0/ubuntu22.04/habanalabs/pytorch-installer-2.7.1:latest
```

### 2. Install PaddlePaddle

```bash
python -m pip install paddlepaddle==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
```

### 3. Install PaddleCustomDevice

```shell
git clone https://github.com/PaddlePaddle/PaddleCustomDevice
cd PaddleCustomDevice/backends/intel_hpu/
mkdir -p build
cd build
cmake ..
make -j
pip install --force-reinstall dist/paddle_intel_hpu*.whl
cd ../custom_ops
python setup.py install
```

### 4. Install FastDeploy

```shell
git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy
bash build.sh
```

## Prepare the inference demo

### 1. Start the inference service

```shell
export GC_KERNEL_PATH=/usr/lib/habanalabs/libtpc_kernels.so
export GC_KERNEL_PATH=/usr/local/lib/python3.10/dist-packages/paddle_custom_device/intel_hpu/libcustom_tpc_perf_lib.so:$GC_KERNEL_PATH
export INTEL_HPU_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PADDLE_DISTRI_BACKEND=xccl
export PADDLE_XCCL_BACKEND=intel_hpu
export HABANA_PROFILE=0
export HPU_VISIBLE_DEVICES=0

HPU_WARMUP_BUCKET=1 HPU_WARMUP_MODEL_LEN=4096 FD_ATTENTION_BACKEND=HPU_ATTN python -m fastdeploy.entrypoints.openai.api_server --model ERNIE-4.5-21B-A3B-Paddle --tensor-parallel-size 1 --max-model-len 32768 --max-num-seqs 128
```

### 2. Send a request

```bash
curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is AI?"}
    ], "max_tokens": 24
  }'
```

### 3. Example of a successful response

```json
{"id":"chatcmpl-3bd98ae2-fafe-46ae-a552-d653a8526503","object":"chat.completion","created":1757653575,"model":"ERNIE-4.5-21B-A3B-Paddle","choices":[{"index":0,"message":{"role":"assistant","content":"**AI (Artificial Intelligence)** refers to the development of computer systems that can perform tasks typically requiring human intelligence.","multimodal_content":null,"reasoning_content":null,"tool_calls":null,"prompt_token_ids":null,"completion_token_ids":null,"text_after_process":null,"raw_prediction":null,"prompt_tokens":null,"completion_tokens":null},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":11,"total_tokens":35,"completion_tokens":24,"prompt_tokens_details":{"cached_tokens":0}}}
```
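The request above can also be issued from Python. The following is a minimal client sketch using only the standard library; the host and port come from the `curl` example, the server from step 1 is assumed to be running, and the function names (`build_chat_payload`, `chat`) are illustrative, not part of FastDeploy:

```python
import json
from urllib import request

def build_chat_payload(prompt: str, max_tokens: int = 24) -> dict:
    # Same body as the curl example: one user message plus a token cap.
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str, base_url: str = "http://0.0.0.0:8188") -> str:
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    # The answer lives under choices[0].message.content, as in the
    # example response shown above.
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("What is AI?"))
```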

docs/zh/get_started/installation/README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -7,3 +7,4 @@ FastDeploy supports the following hardware platforms:
 - [Enflame S60 GCU Installation](Enflame_gcu.md)
 - [Iluvatar GPU Installation](iluvatar_gpu.md)
 - [Hygon DCU Installation](hygon_dcu.md)
+- [Intel HPU Installation](intel_hpu.md)
```
docs/zh/get_started/installation/intel_hpu.md

Lines changed: 75 additions & 0 deletions

# Running ERNIE 4.5 Series Models on Intel HPU

The following steps apply when your environment meets these requirements:

- Python 3.10
- Intel Gaudi 2
- Intel Gaudi software version 1.22.0
- Linux x86_64

### 1. Run the Docker Container

Use the following commands to run a Docker container. Make sure the versions you use match those listed in the [Support Matrix](https://docs.habana.ai/en/latest/Support_Matrix/Support_Matrix.html):

```{.console}
$ docker pull vault.habana.ai/gaudi-docker/1.22.0/ubuntu22.04/habanalabs/pytorch-installer-2.7.1:latest
$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.22.0/ubuntu22.04/habanalabs/pytorch-installer-2.7.1:latest
```

### 2. Install PaddlePaddle

```bash
python -m pip install paddlepaddle==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
```

### 3. Install PaddleCustomDevice

```shell
git clone https://github.com/PaddlePaddle/PaddleCustomDevice
cd PaddleCustomDevice/backends/intel_hpu/
mkdir -p build
cd build
cmake ..
make -j
pip install --force-reinstall dist/paddle_intel_hpu*.whl
cd ../custom_ops
python setup.py install
```

### 4. Install FastDeploy

```shell
git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy
bash build.sh
```

## Prepare the inference demo

### 1. Start the inference service

```shell
export GC_KERNEL_PATH=/usr/lib/habanalabs/libtpc_kernels.so
export GC_KERNEL_PATH=/usr/local/lib/python3.10/dist-packages/paddle_custom_device/intel_hpu/libcustom_tpc_perf_lib.so:$GC_KERNEL_PATH
export INTEL_HPU_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PADDLE_DISTRI_BACKEND=xccl
export PADDLE_XCCL_BACKEND=intel_hpu
export HABANA_PROFILE=0
export HPU_VISIBLE_DEVICES=0

HPU_WARMUP_BUCKET=1 HPU_WARMUP_MODEL_LEN=4096 FD_ATTENTION_BACKEND=HPU_ATTN python -m fastdeploy.entrypoints.openai.api_server --model ERNIE-4.5-21B-A3B-Paddle --tensor-parallel-size 1 --max-model-len 32768 --max-num-seqs 128
```

### 2. Send a request

```bash
curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is AI?"}
    ], "max_tokens": 24
  }'
```

### 3. Example of a successful response

```json
{"id":"chatcmpl-3bd98ae2-fafe-46ae-a552-d653a8526503","object":"chat.completion","created":1757653575,"model":"ERNIE-4.5-21B-A3B-Paddle","choices":[{"index":0,"message":{"role":"assistant","content":"**AI (Artificial Intelligence)** refers to the development of computer systems that can perform tasks typically requiring human intelligence.","multimodal_content":null,"reasoning_content":null,"tool_calls":null,"prompt_token_ids":null,"completion_token_ids":null,"text_after_process":null,"raw_prediction":null,"prompt_tokens":null,"completion_tokens":null},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":11,"total_tokens":35,"completion_tokens":24,"prompt_tokens_details":{"cached_tokens":0}}}
```

fastdeploy/config.py

Lines changed: 3 additions & 0 deletions

```diff
@@ -267,6 +267,7 @@ def __init__(
         self.expert_parallel_size = 1  # EP degree
         self.data_parallel_size = 1  # DP degree
         self.enable_expert_parallel = False
+        self.enable_tensor_or_expert_parallel = False
         self.local_data_parallel_id = 0
         # The embedding weight distributed on your gpu cards is divided by row or column.
         # Defaults to False means divide by row. When vocab_size can not be divided by world_size
@@ -1219,6 +1220,8 @@ def __init__(
         self.device_ids = os.getenv("CUDA_VISIBLE_DEVICES", self.device_ids)
         if current_platform.is_xpu():
             self.device_ids = os.getenv("XPU_VISIBLE_DEVICES", self.device_ids)
+        if current_platform.is_intel_hpu():
+            self.device_ids = os.getenv("HPU_VISIBLE_DEVICES", self.device_ids)

         self.read_from_config()
         self.postprocess()
```
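The `device_ids` change above follows a per-platform environment-variable override pattern: each backend exposes its visible-device list under its own variable, and the previously computed default survives when that variable is unset. A standalone sketch of the pattern; the variable names come from the diff, while `resolve_device_ids` and the platform-string keys are illustrative, not FastDeploy API:

```python
import os

# Each platform publishes its visible devices under a different variable.
_DEVICE_ENV_VARS = {
    "cuda": "CUDA_VISIBLE_DEVICES",
    "xpu": "XPU_VISIBLE_DEVICES",
    "intel_hpu": "HPU_VISIBLE_DEVICES",
}

def resolve_device_ids(platform: str, default_ids: str) -> str:
    """Return the platform's visible-device list, falling back to the
    previously computed default when the variable is unset."""
    env_var = _DEVICE_ENV_VARS.get(platform)
    if env_var is None:
        return default_ids
    return os.getenv(env_var, default_ids)

os.environ["HPU_VISIBLE_DEVICES"] = "0,1"
print(resolve_device_ids("intel_hpu", "0"))  # 0,1
```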

fastdeploy/distributed/communication.py

Lines changed: 23 additions & 0 deletions

```diff
@@ -66,3 +66,26 @@ def tensor_model_parallel_all_reduce(
 except:
     tensor_model_parallel_all_reduce = None
+
+from paddle.distributed.communication import stream
+from paddle.distributed.communication.reduce import ReduceOp
+
+def all_reduce(
+    tensor,
+    op,
+    group,
+    sync_op: bool = True,
+):
+    return stream.all_reduce(
+        tensor, op=op, group=group, sync_op=sync_op, use_calc_stream=True
+    )
+
+@paddle.jit.marker.unified
+def tensor_model_parallel_all_reduce_custom(input_: paddle.Tensor) -> paddle.Tensor:
+    """All-reduce the input tensor across model parallel group on calc stream."""
+    if paddle.in_dynamic_mode():
+        hcg = dist.fleet.get_hybrid_communicate_group()
+        mp_group = hcg.get_model_parallel_group()
+        all_reduce(input_, op=ReduceOp.SUM, group=mp_group)
+    else:
+        dist.all_reduce(input_)
```