Conversation

fmiao2372

FastDeploy now supports the ERNIE 4.5 model on Intel HPU.

Dependencies:
Gaudi software: 1.22.0
PaddlePaddle: 3.1.1
PaddleCustomDevice: latest develop branch

Support for more models and further performance optimizations will follow.
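For reference, a minimal offline-inference sketch, assuming the LLM API documented for FastDeploy 2.x and a PaddleCustomDevice build with intel_hpu support; the model name and parameters here are illustrative, not from this PR:

from fastdeploy import LLM, SamplingParams

# Assumes PaddlePaddle 3.1.1 plus an intel_hpu PaddleCustomDevice build,
# per the dependency list above.
llm = LLM(model="baidu/ERNIE-4.5-0.3B-Paddle", tensor_parallel_size=1)
sampling = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Introduce the ERNIE model family."], sampling)
print(outputs[0].outputs.text)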


paddle-bot bot commented Sep 17, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor (External developers) label on Sep 17, 2025
@fmiao2372 fmiao2372 force-pushed the integration_upstreaming branch from 7e59562 to d7509a6 on September 17, 2025 12:49
try:
# assert len(paddle.static.cuda_places()) > 0
return True
except Exception as e:
Collaborator

This check doesn't seem to work.

Author

fixed
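For context, one way such an availability check can avoid CUDA-specific APIs on a custom device (a hedged sketch using Paddle's custom-device registry; the actual fix in this PR may differ):

import paddle

def is_intel_hpu_available() -> bool:
    # cuda_places() is meaningless on HPU; query the custom-device
    # registry that PaddleCustomDevice populates instead.
    try:
        return "intel_hpu" in (paddle.device.get_all_custom_device_type() or [])
    except Exception:
        return False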

# PACKAGE = "fastdeploy.model_executor.ops.intel_hpu"
PACKAGE = "paddlenlp_ops"

import_custom_ops(PACKAGE, "paddlenlp_ops", globals())
Collaborator

Should this be fastdeploy.model_executor.ops.intel_hpu instead of paddlenlp_ops?

Is this because of the naming convention of the ops implementation in custom device?

Author

Yes, the real custom ops come from PaddleCustomDevice; we just re-export them under this name in FastDeploy.
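For readers unfamiliar with the helper, a hedged, illustrative reimplementation of what import_custom_ops amounts to (the signature is inferred from the call site above; FastDeploy's actual helper may differ):

import importlib

def import_custom_ops(package: str, module_name: str, target_globals: dict):
    # Re-export every public symbol from the op library built by
    # PaddleCustomDevice (here named paddlenlp_ops) into the FastDeploy
    # namespace, so call sites stay device-agnostic.
    try:
        module = importlib.import_module(module_name)
        for name in dir(module):
            if not name.startswith("_"):
                target_globals[name] = getattr(module, name)
    except ImportError:
        pass  # the op library is optional when the device build is absent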

@@ -0,0 +1,21 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
Collaborator

2024->2025

Author

fixed

raise NotImplementedError


class AttentionBackend_HPU(AttentionBackend):
Collaborator

Would it be better to move this class to fastdeploy/model_executor/layers/attention/hpu_attn_backend.py?

Author

moved

"--enable-tensor-or-expert-parallel",
action='store_true',
default=EngineArgs.enable_tensor_or_expert_parallel,
help="Enable tensor parallelism for non-MoE and expert parallelism for MoE.")
Collaborator

Could we enable TP + EP by setting --enable-expert-parallel and --tensor-parallel-size without adding a new argument?
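A hedged sketch of the combination being suggested (argument names follow the snippet above; the derivation is an assumption, not existing FastDeploy logic):

# TP for dense layers plus EP for MoE layers is already implied by the
# existing knobs, so no new flag is needed:
tensor_or_expert_parallel = (
    args.tensor_parallel_size > 1 and args.enable_expert_parallel
)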

Author

parallel_config.engine_worker_queue_port = parallel_config.engine_worker_queue_port[
parallel_config.local_data_parallel_id
]
Collaborator

All CI fails at this line: TypeError: 'int' object is not subscriptable. We need to solve this first and then see if there are any other problems.

Author

fixed
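One plausible shape for the fix (a hedged sketch; the actual patch may differ): index only when the field is a list, since single-instance configs carry a plain int.

port = parallel_config.engine_worker_queue_port
if isinstance(port, (list, tuple)):
    # Multi-instance setups carry one port per local data-parallel rank.
    port = port[parallel_config.local_data_parallel_id]
parallel_config.engine_worker_queue_port = port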

@@ -0,0 +1,314 @@
"""
Collaborator

There is a backends folder under the layers directory that holds the device-specific layer implementations; please move the attention and moe implementations into that folder.

Comment on lines +121 to +122
elif current_platform.is_intel_hpu():
self.forward = self.forward_intel_hpu
Collaborator

The name forward_cuda may no longer be a great fit at this point, but you should be able to reuse forward_cuda here; the logic is the same.

Comment on lines +212 to +213
elif current_platform.is_intel_hpu():
self.forward = self.forward_intel_hpu
Collaborator

How is this different from the other hardware platforms? Why does it need its own logic? Can't the device-specific parts be abstracted into a few ops so that forward_cuda can be reused?
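A hedged sketch of the abstraction being suggested (the class and op names are illustrative, not FastDeploy's actual layer code; the op module paths follow the commented-out PACKAGE above): keep one shared forward and resolve the device difference at the op layer.

from fastdeploy.platforms import current_platform

class ExampleLayer:
    def __init__(self):
        # One shared forward for every backend; no per-device fork.
        self.forward = self.forward_common

    def _ops(self):
        # The device difference lives here, not in forward().
        if current_platform.is_intel_hpu():
            from fastdeploy.model_executor.ops import intel_hpu as ops
        else:
            from fastdeploy.model_executor.ops import gpu as ops
        return ops

    def forward_common(self, x):
        return self._ops().fused_forward(x)  # hypothetical op name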

from fastdeploy.platforms import current_platform


def reload_ep_checkpoint(model_path: str, fd_config: FDConfig, state_dict: dict, return_numpy: bool = False):
Collaborator

Why was the model-loading code changed here? Is it because the model being used is not the official one?

self.expert_parallel_size = 1 # EP degree
self.data_parallel_size = 1 # DP degree
self.enable_expert_parallel = False
self.enable_tensor_or_expert_parallel = False
Collaborator

Can't this be inferred from a combination of existing fields such as enable_expert_parallel, expert_parallel_size, and tensor_parallel_size? Does the user-facing interface really need a new field?
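A hedged sketch of that inference (field names come from the snippet above; the property itself is an assumption): derive the value instead of storing a fourth flag.

@property
def enable_tensor_or_expert_parallel(self) -> bool:
    # Implied by the existing fields: TP stays active for dense layers
    # and EP for MoE layers, so no separate user-facing flag is stored.
    return self.enable_expert_parallel and self.tensor_parallel_size > 1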

cache_cfg = CacheConfig(all_dict)
load_cfg = LoadConfig(all_dict)
parallel_cfg = ParallelConfig(all_dict)
cache_cfg.enc_dec_block_num = self.static_decode_blocks
Collaborator

zoooo0820 commented Sep 19, 2025

It would be better to set this value as in https://github.com/PaddlePaddle/FastDeploy/blob/release/2.2/fastdeploy/config.py#L899 to avoid impacting other hardware.
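A hedged sketch of that suggestion (attribute names come from the snippet above; the platform guard mirrors checks used elsewhere in this PR): gate the override so other devices keep the default from config.py.

from fastdeploy.platforms import current_platform

cache_cfg = CacheConfig(all_dict)
if current_platform.is_intel_hpu():
    # Other hardware keeps the enc_dec_block_num default defined in
    # fastdeploy/config.py; only HPU overrides it here.
    cache_cfg.enc_dec_block_num = self.static_decode_blocks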
