# Intel HPU Installation for running ERNIE 4.5 Series Models

The following installation steps apply when your environment meets these requirements:

- Python 3.10
- Intel Gaudi 2
- Intel Gaudi software version 1.22.0
- Linux x86_64

### 1. Run Docker Container

Use the following commands to run a Docker container. Make sure to update the versions below as listed in the [Support Matrix](https://docs.habana.ai/en/latest/Support_Matrix/Support_Matrix.html):

```{.console}
$ docker pull vault.habana.ai/gaudi-docker/1.22.0/ubuntu22.04/habanalabs/pytorch-installer-2.7.1:latest
$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.22.0/ubuntu22.04/habanalabs/pytorch-installer-2.7.1:latest
```
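
Inside the container, you can optionally confirm that the Gaudi cards are visible before going further; `hl-smi` ships with the Gaudi software stack:

```shell
# Lists the available Gaudi accelerators with driver and firmware versions
hl-smi
```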

### 2. Install PaddlePaddle

```bash
python -m pip install paddlepaddle==3.1.1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
```
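
A quick import check confirms the wheel installed correctly:

```shell
python -c "import paddle; print(paddle.__version__)"  # expected: 3.1.1
```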

### 3. Install PaddleCustomDevice

```shell
git clone https://github.com/PaddlePaddle/PaddleCustomDevice
cd PaddleCustomDevice/backends/intel_hpu/
mkdir -p build
cd build
cmake ..
make -j
pip install --force-reinstall dist/paddle_intel_hpu*.whl
cd ../custom_ops
python setup.py install
```
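
You can then check that the `intel_hpu` backend is registered with Paddle; the printed list should contain `intel_hpu`:

```shell
python -c "import paddle; print(paddle.device.get_all_custom_device_type())"
```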

### 4. Install FastDeploy

```shell
git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy
bash build.sh
```
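
Assuming `build.sh` installs the resulting wheel into the active environment, an import check confirms the build:

```shell
python -c "import fastdeploy; print(fastdeploy.__version__)"
```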

## Prepare the inference demo

### 1. Start the inference service

```shell
# Load the stock Gaudi TPC kernels plus the custom kernels shipped with PaddleCustomDevice
export GC_KERNEL_PATH=/usr/lib/habanalabs/libtpc_kernels.so
export GC_KERNEL_PATH=/usr/local/lib/python3.10/dist-packages/paddle_custom_device/intel_hpu/libcustom_tpc_perf_lib.so:$GC_KERNEL_PATH
# Route Paddle's distributed backend to the intel_hpu XCCL plugin
export INTEL_HPU_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export PADDLE_DISTRI_BACKEND=xccl
export PADDLE_XCCL_BACKEND=intel_hpu
# Disable profiling and expose a single card, matching --tensor-parallel-size 1
export HABANA_PROFILE=0
export HPU_VISIBLE_DEVICES=0

HPU_WARMUP_BUCKET=1 HPU_WARMUP_MODEL_LEN=4096 FD_ATTENTION_BACKEND=HPU_ATTN python -m fastdeploy.entrypoints.openai.api_server --model ERNIE-4.5-21B-A3B-Paddle --tensor-parallel-size 1 --max-model-len 32768 --max-num-seqs 128
```

### 2. Send a request

The request below assumes the server is listening on port 8188; if it started on a different port, pass `--port 8188` when launching it above.

```bash
curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {"role": "user", "content": "What is AI?"}
  ],
  "max_tokens": 24
}'
```
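
Because the service exposes an OpenAI-compatible API, the same request can also be sent with the `openai` Python client (`pip install openai`). This is a minimal sketch; the `api_key` value is a placeholder, since the local server does not validate it:

```shell
# Send the same chat request through the OpenAI Python client
python - <<'EOF'
from openai import OpenAI

# Point the client at the local FastDeploy server; the key is unused locally
client = OpenAI(base_url="http://0.0.0.0:8188/v1", api_key="null")

response = client.chat.completions.create(
    model="ERNIE-4.5-21B-A3B-Paddle",
    messages=[{"role": "user", "content": "What is AI?"}],
    max_tokens=24,
)
print(response.choices[0].message.content)
EOF
```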

### 3. Verify the result

A successful request returns a response like the following:

```json
{"id":"chatcmpl-3bd98ae2-fafe-46ae-a552-d653a8526503","object":"chat.completion","created":1757653575,"model":"ERNIE-4.5-21B-A3B-Paddle","choices":[{"index":0,"message":{"role":"assistant","content":"**AI (Artificial Intelligence)** refers to the development of computer systems that can perform tasks typically requiring human intelligence.","multimodal_content":null,"reasoning_content":null,"tool_calls":null,"prompt_token_ids":null,"completion_token_ids":null,"text_after_process":null,"raw_prediction":null,"prompt_tokens":null,"completion_tokens":null},"logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":11,"total_tokens":35,"completion_tokens":24,"prompt_tokens_details":{"cached_tokens":0}}}
```