[Intel HPU] Support intel hpu platform #4161
base: develop
Conversation
Thanks for your contribution!
Force-pushed from 7e59562 to d7509a6
try:
    # assert len(paddle.static.cuda_places()) > 0
    return True
except Exception as e:
This check doesn't seem to work.
fixed
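For reference, a minimal sketch of a check that actually probes for the device instead of always returning True. The custom-device name "intel_hpu" and the use of Paddle's custom-device helpers are assumptions here, not the PR's final code:

```python
import paddle

def is_intel_hpu_available() -> bool:
    """Best-effort probe for an Intel HPU (device name "intel_hpu" assumed)."""
    try:
        return paddle.device.is_compiled_with_custom_device("intel_hpu") and any(
            dev.startswith("intel_hpu")
            for dev in paddle.device.get_available_custom_device() or []
        )
    except Exception:
        return False
```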
# PACKAGE = "fastdeploy.model_executor.ops.intel_hpu"
PACKAGE = "paddlenlp_ops"

import_custom_ops(PACKAGE, "paddlenlp_ops", globals())
Should this be fastdeploy.model_executor.ops.intel_hpu instead of paddlenlp_ops?
Is this because of the naming convention of the ops implementation in custom device?
Yes, the real custom ops come from PaddleCustomDevice; we just rename them in FastDeploy.
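A hedged sketch of what that rename amounts to: the compiled kernels live in the paddlenlp_ops package built by PaddleCustomDevice, and FastDeploy re-exports them under its own namespace. This helper is illustrative, not the PR's actual import_custom_ops:

```python
import importlib

def import_custom_ops(package: str, fallback: str, dest: dict) -> None:
    """Import `package` (falling back to `fallback`) and re-export its
    public symbols into `dest` (typically a module's globals())."""
    for name in (package, fallback):
        try:
            mod = importlib.import_module(name)
        except ImportError:
            # the ops package only exists on builds with HPU support
            continue
        for attr in dir(mod):
            if not attr.startswith("_"):
                dest[attr] = getattr(mod, attr)
        return
```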
@@ -0,0 +1,21 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
2024->2025
fixed
raise NotImplementedError


class AttentionBackend_HPU(AttentionBackend):
Would it be better to move this class to fastdeploy/model_executor/layers/attention/hpu_attn_backend.py?
moved
fastdeploy/engine/args_utils.py
Outdated
"--enable-tensor-or-expert-parallel",
action='store_true',
default=EngineArgs.enable_tensor_or_expert_parallel,
help="Enable tensor parallelism for non-MoE and expert parallelism for MoE.")
Could we enable TP + EP by setting --enable-expert-parallel and --tensor-parallel-size, without adding a new argument?
Currently EP is combined with DP, so we can't enable TP + EP with the existing parameters:
https://github.com/PaddlePaddle/FastDeploy/blob/develop/fastdeploy/config.py#L316-L318
https://github.com/PaddlePaddle/FastDeploy/blob/develop/fastdeploy/model_executor/layers/moe/moe.py#L132-L134
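A simplified sketch (not the actual FastDeploy code at those links) of why EP currently implies DP: the expert-parallel degree is tied to the data-parallel size, so no existing combination yields TP for dense layers plus EP for MoE.

```python
class ParallelConfigSketch:
    """Illustrative only; field names mirror the discussion, not config.py."""

    def __init__(self, tensor_parallel_size: int, data_parallel_size: int,
                 enable_expert_parallel: bool):
        self.tensor_parallel_size = tensor_parallel_size
        self.data_parallel_size = data_parallel_size
        self.enable_expert_parallel = enable_expert_parallel
        # EP degree follows the DP size whenever expert parallelism is on,
        # which is why a separate "TP + EP" switch was added in this PR.
        self.expert_parallel_size = data_parallel_size if enable_expert_parallel else 1
```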
fastdeploy/worker/worker_process.py
Outdated
parallel_config.engine_worker_queue_port = parallel_config.engine_worker_queue_port[
    parallel_config.local_data_parallel_id
]
All CI runs fail at this line with TypeError: 'int' object is not subscriptable. We need to solve this first and then see if there are any other problems.
fixed
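One plausible shape for the fix, as a hedged sketch: it assumes engine_worker_queue_port can arrive either as a single int or as a per-DP-rank list; the actual fix in the PR may differ.

```python
def resolve_worker_queue_port(port, local_dp_rank: int) -> int:
    """Accept either a single port or a per-data-parallel-rank list of ports."""
    if isinstance(port, (list, tuple)):
        return port[local_dp_rank]
    return port  # plain int: nothing to index, which was the CI crash
```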
@@ -0,0 +1,314 @@
"""
There is a backends folder under the layers directory that holds the device-specific layer implementations for each platform; please move the attention and moe implementations into that folder.
elif current_platform.is_intel_hpu():
    self.forward = self.forward_intel_hpu
The name forward_cuda may no longer be a good fit at this point, but forward_cuda itself should be reusable here since the logic is the same.
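For context, a minimal sketch of the per-platform binding pattern quoted above (the class and method bodies are illustrative, not the PR's layer code):

```python
class PlatformDispatchSketch:
    """Binds self.forward once at construction based on the detected platform."""

    def __init__(self, current_platform):
        if current_platform.is_cuda():
            self.forward = self.forward_cuda
        elif current_platform.is_intel_hpu():
            # the reviewer's point: if the logic is identical, binding
            # forward_cuda here avoids maintaining a parallel code path
            self.forward = self.forward_intel_hpu

    def forward_cuda(self, x):
        return x  # placeholder for the shared implementation

    def forward_intel_hpu(self, x):
        return self.forward_cuda(x)  # illustrative: reuse the shared logic
```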
elif current_platform.is_intel_hpu():
    self.forward = self.forward_intel_hpu
How does this differ from the other hardware platforms? Why does it need its own logic here; can't it be abstracted into a few ops that then call forward_cuda?
from fastdeploy.platforms import current_platform


def reload_ep_checkpoint(model_path: str, fd_config: FDConfig, state_dict: dict, return_numpy: bool = False):
Why does the model-loading code need to change here? Is it because the model being used is not the official one?
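A hedged sketch of what an EP-aware reload typically does: keep only the expert weights owned by the local rank. The key layout with an ".experts.&lt;idx&gt;." marker is an assumption, not the PR's actual reload_ep_checkpoint logic.

```python
def filter_expert_weights(state_dict: dict, ep_rank: int,
                          num_experts: int, ep_size: int) -> dict:
    """Drop expert weights that belong to other expert-parallel ranks."""
    per_rank = num_experts // ep_size
    lo, hi = ep_rank * per_rank, (ep_rank + 1) * per_rank
    kept = {}
    for key, value in state_dict.items():
        if ".experts." in key:  # assumed naming convention
            idx = int(key.split(".experts.")[1].split(".")[0])
            if not (lo <= idx < hi):
                continue
        kept[key] = value
    return kept
```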
self.expert_parallel_size = 1  # EP degree
self.data_parallel_size = 1  # DP degree
self.enable_expert_parallel = False
self.enable_tensor_or_expert_parallel = False
Can't this be determined by combining existing fields such as enable_expert_parallel, expert_parallel_size, and tensor_parallel_size? Does the user-facing interface really need a new field?
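A sketch of the combination the reviewer has in mind (field names follow the snippet above; whether this inference is safe given the EP/DP coupling noted earlier is exactly the open question):

```python
def wants_tensor_and_expert_parallel(cfg) -> bool:
    """Infer "TP for dense + EP for MoE" from existing fields, with no new flag."""
    return (
        getattr(cfg, "enable_expert_parallel", False)
        and getattr(cfg, "tensor_parallel_size", 1) > 1
    )
```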
cache_cfg = CacheConfig(all_dict)
load_cfg = LoadConfig(all_dict)
parallel_cfg = ParallelConfig(all_dict)
cache_cfg.enc_dec_block_num = self.static_decode_blocks
It would be better to set this value as in https://github.com/PaddlePaddle/FastDeploy/blob/release/2.2/fastdeploy/config.py#L899 to avoid impacting other hardware.
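A minimal sketch of that suggestion, guarding the override so other platforms keep their existing default (the platform check and helper shape are assumptions, not the linked code):

```python
def apply_enc_dec_block_default(cache_cfg, static_decode_blocks: int,
                                is_intel_hpu: bool) -> None:
    """Only override enc_dec_block_num on Intel HPU; other hardware keeps its default."""
    if is_intel_hpu:
        cache_cfg.enc_dec_block_num = static_decode_blocks
```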
FastDeploy now supports the ERNIE 4.5 model on Intel HPU.
Dependencies:
Gaudi software: 1.22.0
PaddlePaddle: 3.1.1
PaddleCustomDevice: latest develop branch
Support for more models and further performance optimizations will follow in future updates.