[TransferEngine] Enable Huawei Ascend Transport for TransferEngine #502
Merged
@@ -0,0 +1,145 @@
# Ascend Transport

The source code path for Ascend Transport is `Mooncake/mooncake-transfer-engine/src/transport/ascend_transport`, which also includes automated build scripts and the README file.

## Overview

Ascend Transport is a high-performance, zero-copy NPU data transfer library with one-sided semantics, directly compatible with Mooncake Transfer Engine. To compile and use the Ascend Transport library, set the `USE_ASCEND` flag to `"ON"` in the `mooncake-common/common.cmake` file.

Ascend Transport supports inter-NPU data transfer using one-sided semantics (currently Device-to-Device only; other modes are under development). Users only need to specify the node and memory information at both ends through the Mooncake Transfer Engine interface to achieve high-performance point-to-point data transfer. Ascend Transport hides the internal complexity and automatically handles operations such as establishing connections, registering and exchanging memory, and checking transfer status.

### New Dependencies

In addition to the dependencies already required by Mooncake, Ascend Transport needs some HCCL-related dependencies:

**MPI**
```bash
yum install -y mpich mpich-devel
```

**Ascend Compute Architecture for Neural Networks (CANN)**
CANN version 8.1.RC1

### One-Step Build Script

Ascend Transport provides a one-step build script located at `scripts/ascend/dependencies_ascend.sh`. Copy this script to the desired installation directory and run it. You can also pass an installation path as an argument; if not provided, it defaults to the current directory:

```bash
./dependencies_ascend.sh /path/to/install_directory
```

This script also supports environments where users cannot perform `git clone` directly. Users can place the source code for dependencies and Mooncake in the target directory, and the script will handle the compilation accordingly.

### One-Step Installation Script (Without Building Mooncake)

To avoid potential conflicts with other processes running in the environment where Mooncake is compiled, Ascend Transport offers a way to separate the build and runtime environments.

After completing the Mooncake build with `dependencies_ascend.sh`, you can run `scripts/ascend/dependencies_ascend_installation.sh` to install only the required dependencies. Place the Mooncake `.whl` package and `libascend_transport_mem.so` generated by the build script into the installation directory.

Copy the script to the installation directory and run:
```bash
./dependencies_ascend_installation.sh /path/to/install_directory
```

Before use, ensure that `libascend_transport_mem.so` has been copied to `/usr/local/Ascend/ascend-toolkit/latest/python/site-packages`, then execute:
```bash
export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages:$LD_LIBRARY_PATH
```

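A small guard can make the `LD_LIBRARY_PATH` export idempotent and flag a missing library early. The following is a minimal sketch, not part of the shipped scripts: the helper name `prepend_lib_dir` is hypothetical, and only the directory and the library file name come from the instructions above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a search path with DIR prepended,
# unless DIR is already one of its colon-separated entries.
prepend_lib_dir() {
    local dir="$1" current="$2"
    case ":${current}:" in
        *":${dir}:"*) printf '%s\n' "${current}" ;;          # already present
        *) printf '%s\n' "${dir}${current:+:${current}}" ;;  # prepend
    esac
}

ASCEND_SITE=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages

# Warn (but do not abort) if the library has not been copied yet.
if [ ! -f "${ASCEND_SITE}/libascend_transport_mem.so" ]; then
    echo "warning: libascend_transport_mem.so not found in ${ASCEND_SITE}" >&2
fi

export LD_LIBRARY_PATH="$(prepend_lib_dir "${ASCEND_SITE}" "${LD_LIBRARY_PATH:-}")"
```

Sourcing this more than once leaves `LD_LIBRARY_PATH` unchanged after the first run, which avoids the path growing in long-lived shells.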
### Build Instructions

Once all dependencies are installed successfully, you can build Mooncake as usual. If the build fails, try removing the entries under `/usr/local/Ascend` from `CPLUS_INCLUDE_PATH`:
```bash
export CPLUS_INCLUDE_PATH=$(echo $CPLUS_INCLUDE_PATH | tr ':' '\n' | grep -v "/usr/local/Ascend" | paste -sd: -)
```

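To see what the `CPLUS_INCLUDE_PATH` one-liner does, the same `tr | grep -v | paste` pipeline can be run on a sample value. The sample paths and the function name `strip_ascend_includes` are made up for illustration.

```bash
#!/usr/bin/env bash
# Drop every colon-separated entry that contains /usr/local/Ascend,
# using the same pipeline as the export command above.
strip_ascend_includes() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -v "/usr/local/Ascend" | paste -sd: -
}

sample="/usr/include:/usr/local/Ascend/ascend-toolkit/latest/include:/opt/include"
strip_ascend_includes "${sample}"   # prints /usr/include:/opt/include
```

The entries are split onto separate lines, filtered, and joined back with colons, so the order of the remaining directories is preserved.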
### Endpoint Management

Each Huawei NPU card has a dedicated parameter-plane NIC and should be managed by a single `TransferEngine` instance responsible for all its data transfers.

### Ranktable Management

Ascend Transport does not rely on global Ranktable information. It only needs the local Ranktable information of the current NPU card, which it obtains automatically by parsing the `/etc/hccn.conf` file during initialization.

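Ascend Transport reads `/etc/hccn.conf` on its own, but it can be useful to inspect the file by hand when debugging. The sketch below rests on an assumption that is not stated in this document: that the file holds `key=value` lines such as `address_0=192.168.100.1`, where the suffix is the NPU device ID. The helper name `npu_device_ip` is hypothetical.

```bash
#!/usr/bin/env bash
# Assumption: hccn.conf contains lines like "address_<device_id>=<ip>".
# Verify this against the file on your own machine before relying on it.
npu_device_ip() {
    local device_id="$1" conf="${2:-/etc/hccn.conf}"
    # Print the value of the first address_<id>= line, if any.
    sed -n "s/^address_${device_id}=//p" "${conf}" | head -n 1
}
```

For example, `npu_device_ip 0` would print the device IP recorded for NPU 0 if the file follows the assumed layout, and nothing otherwise.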
### Initialization

When using Ascend Transport, the `TransferEngine` must still call the `init` function after construction:

```cpp
// Construct the engine first, then call init().
TransferEngine();

int TransferEngine::init(const std::string &metadata_conn_string,
                         const std::string &local_server_name,
                         const std::string &ip_or_host_name,
                         uint64_t rpc_port)
```

The only difference is that the `local_server_name` parameter must now include the physical NPU card ID. The format changes from `ip:port` to `ip:port:npu_x`, e.g., `"0.0.0.0:12345:npu_2"`.

> **Note**: This extension of `local_server_name` is used only inside Ascend Transport and does not modify Mooncake's external API. The `segment_desc_name` in metadata keeps the original format (`ip:port`). Therefore, each NPU card must use a unique port that is not already occupied.

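The `ip:port:npu_x` format can be illustrated by splitting such a string into its parts. This is only a sketch of the naming scheme, not how Ascend Transport parses the name internally; the helper `parse_local_server_name` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "ip:port:npu_x" into "<ip> <port> <npu_id>".
# Returns 1 if the last field does not look like npu_<number>.
parse_local_server_name() {
    local name="$1"
    local npu="${name##*:}"     # last field, e.g. "npu_2"
    local rest="${name%:*}"     # everything before it, e.g. "0.0.0.0:12345"
    local port="${rest##*:}"    # e.g. "12345"
    local ip="${rest%:*}"       # e.g. "0.0.0.0"
    case "${npu}" in
        npu_[0-9]*) ;;
        *) return 1 ;;
    esac
    printf '%s %s %s\n' "${ip}" "${port}" "${npu#npu_}"
}

parse_local_server_name "0.0.0.0:12345:npu_2"   # prints 0.0.0.0 12345 2
```

The first two fields together (`0.0.0.0:12345`) correspond to the `ip:port` form that the note says is kept as `segment_desc_name` in metadata.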
### Metadata Service

Ascend Transport is compatible with all metadata services currently supported by Mooncake, including `etcd`, `redis`, `http`, and `p2phandshake`. Upon initialization, Ascend Transport registers key NPU card information such as `device_id`, `device_ip`, `rank_id`, and `server_ip`.

### Data Transfer

Ascend Transport supports write/read semantics and automatically determines whether cross-HCCS communication is needed, selecting either HCCS or RoCE as the underlying transport protocol. Users can use the standard Mooncake `getTransferStatus` API to monitor the progress of each transfer request.

### Fault Handling

Building on HCCL's built-in fault handling, Ascend Transport adds or reuses retry logic for failures during initialization, connection setup, and data transfer. When retries still fail, it returns the error codes used by HCCL collective communication operations. For detailed logs, refer to the plog files under `/root/Ascend/log`.

### Test Cases

Ascend Transport provides two test files:
- Multi-scenario test: `mooncake-transfer-engine/example/transfer_engine_ascend_one_sided.cpp`
- Performance test: `mooncake-transfer-engine/example/transfer_engine_ascend_perf.cpp`

You can configure various scenarios (e.g., 1-to-1, 1-to-2, 2-to-1) and performance tests by passing valid parameters to these programs.

#### Example Commands for Scenario Testing

**Start Initiator Node:**
```bash
./transfer_engine_ascend_one_sided --metadata_server=P2PHANDSHAKE --local_server_name=10.0.0.0:12345 --protocol=hccl --operation=write --segment_id=10.0.0.0:12346 --device_id=0 --mode=initiator --block_size=8388608
```

**Start Target Node:**
```bash
./transfer_engine_ascend_one_sided --metadata_server=P2PHANDSHAKE --local_server_name=10.0.0.0:12346 --protocol=hccl --operation=write --device_id=1 --mode=target --block_size=8388608
```

#### Example Commands for Performance Testing

**Start Initiator Node:**
```bash
./transfer_engine_ascend_perf --metadata_server=P2PHANDSHAKE --local_server_name=10.0.0.0:12345 --protocol=hccl --operation=write --segment_id=10.0.0.0:12346 --device_id=0 --mode=initiator --block_size=8388608
```

**Start Target Node:**
```bash
./transfer_engine_ascend_perf --metadata_server=P2PHANDSHAKE --local_server_name=10.0.0.0:12346 --protocol=hccl --operation=write --device_id=1 --mode=target
```

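In the example commands, the initiator's `--segment_id` matches the target's `--local_server_name`, which appears to be how the two ends are paired. The sketch below builds both command lines for a 1-to-1 run from that observation. The helper `ascend_test_cmd` is hypothetical, it uses only the flags shown in the examples, and it always appends `--block_size=8388608` as in the scenario test.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the command line for one role of a 1-to-1 run.
# Args: binary, mode, local server name, device ID, and (initiator only)
# the target's server name, which is passed as --segment_id.
ascend_test_cmd() {
    local bin="$1" mode="$2" local_name="$3" device_id="$4" target_name="${5:-}"
    local cmd="./${bin} --metadata_server=P2PHANDSHAKE --local_server_name=${local_name}"
    cmd="${cmd} --protocol=hccl --operation=write"
    if [ "${mode}" = "initiator" ]; then
        cmd="${cmd} --segment_id=${target_name}"
    fi
    printf '%s\n' "${cmd} --device_id=${device_id} --mode=${mode} --block_size=8388608"
}

ascend_test_cmd transfer_engine_ascend_one_sided target 10.0.0.0:12346 1
ascend_test_cmd transfer_engine_ascend_one_sided initiator 10.0.0.0:12345 0 10.0.0.0:12346
```

The two printed lines reproduce the scenario-test commands shown above; the helper only prints them and does not launch anything.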
### Log Output

To see whether each transfer request crosses HCCS and how long it takes, enable the related logs by setting the following environment variable:

```bash
export ASCEND_TRANSPORT_PRINT=1
```

### Notes

`ascend_transport` establishes a TCP connection on the host side, using port `10000 + deviceId`. Keep this port free of other services to prevent conflicts.

`ascend_transport` reconnects automatically: if the remote end goes offline and restarts after a transfer has completed, there is no need to manually restart the local service.

Note: if the target end goes offline and restarts, the initiating end attempts to re-establish the connection when it sends its next request. The target must complete its restart and become ready within 5 seconds after the initiator sends that request. Otherwise the connection fails and an error is returned.

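The host-side port rule above (10000 plus the device ID) is simple arithmetic. A minimal sketch, assuming a host with 8 NPU cards (device IDs 0 to 7) and a hypothetical helper name `ascend_host_port`:

```bash
#!/usr/bin/env bash
# Host-side TCP port used by ascend_transport for a given device ID.
ascend_host_port() {
    echo $((10000 + $1))
}

# List the ports to keep free, assuming 8 NPU cards (IDs 0-7).
for device_id in 0 1 2 3 4 5 6 7; do
    echo "device ${device_id}: port $(ascend_host_port "${device_id}")"
done
```

Under that assumption the ports to avoid are 10000 through 10007, including when choosing the `ip:port` values passed as `local_server_name`.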
### Timeout Configuration

Ascend Transport uses TCP-based out-of-band communication on the host side, with a receive timeout of 120 seconds.

In `hccl_socket`, the connection timeout is controlled by the environment variable `HCCL_CONNECT_TIMEOUT`, and the execution timeout by `HCCL_EXEC_TIMEOUT`. If no communication occurs for longer than `HCCL_EXEC_TIMEOUT`, the `hccl_socket` connection is terminated.

In `transport_mem`, point-to-point communication between endpoints involves a connection handshake with a timeout of 120 seconds.
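
The two HCCL environment variables can be set before launching a process that uses Ascend Transport. The values below are illustrative only and are assumed to be in seconds; this document does not state units, defaults, or valid ranges, so check the HCCL documentation for your CANN version.

```bash
#!/usr/bin/env bash
# Illustrative values, assumed to be in seconds (not taken from this document).
export HCCL_CONNECT_TIMEOUT=300   # connection setup timeout
export HCCL_EXEC_TIMEOUT=1800     # idle limit before the hccl_socket connection is closed

echo "HCCL_CONNECT_TIMEOUT=${HCCL_CONNECT_TIMEOUT} HCCL_EXEC_TIMEOUT=${HCCL_EXEC_TIMEOUT}"
```

The 120-second TCP receive and handshake timeouts mentioned above are described as fixed values and are not affected by these variables.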
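@@ -0,0 +1,94 @@
# Ascend Transport
The source code path for Ascend Transport is `Mooncake/mooncake-transfer-engine/src/transport/ascend_transport`, which also contains the automated build scripts and the README file.
## Overview
Ascend Transport is a high-performance, zero-copy NPU data transfer library with one-sided semantics, directly compatible with Mooncake Transfer Engine. To compile and use the Ascend Transport library, set the `USE_ASCEND` switch to `"ON"` in the `mooncake-common/common.cmake` file.

Ascend Transport supports data transfer between NPUs using one-sided semantics (the current version supports only DEVICE TO DEVICE; other modes are in progress). Users only need to specify the nodes and memory information at both ends through the Mooncake Transfer Engine interface to complete high-performance point-to-point transfers. Ascend Transport hides the tedious internal implementation and automatically performs operations such as connection setup, memory registration and exchange, and transfer status checks.
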
### New Dependencies
In addition to Mooncake's own dependencies, Ascend Transport adds some HCCL dependencies:

**MPI**

    yum install -y mpich mpich-devel

**Ascend Compute Architecture for Neural Networks**
Ascend Compute Architecture for Neural Networks, version 8.1.RC1

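### One-Step Build Script
Ascend Transport provides a one-step build script at `scripts/ascend/dependencies_ascend.sh`. Copy the script to the directory where you want to install the dependencies and Mooncake, then run it. You can also pass an argument to specify the installation path; if omitted, it defaults to the directory containing the script. The command is:

    ./dependencies_ascend.sh /path/to/install_directory

The one-step build script also covers the case where users cannot run `git clone` directly in their environment: place the source code of the dependencies and Mooncake in the installation directory and specify it, and the script builds the dependencies and Mooncake automatically.

### One-Step Installation Script (Without Building Mooncake)
To avoid possible conflicts with other processes running in the environment where Mooncake is compiled, Ascend Transport offers a scheme that separates building Mooncake from running it.
After `dependencies_ascend.sh` has finished building Mooncake, run `scripts/ascend/dependencies_ascend_installation.sh` to install only the dependencies. Place the Mooncake whl package and `libascend_transport_mem.so` generated by the one-step build script in the installation directory.

Copy the script to the installation directory and run it:

    ./dependencies_ascend_installation.sh /path/to/install_directory

Before use, make sure the generated `libascend_transport_mem.so` file has been copied to `/usr/local/Ascend/ascend-toolkit/latest/python/site-packages`, then run:

    export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages:$LD_LIBRARY_PATH

### Build Instructions
Once all dependencies are installed, build Mooncake as usual. If errors occur, try setting this environment variable:

    export CPLUS_INCLUDE_PATH=$(echo $CPLUS_INCLUDE_PATH | tr ':' '\n' | grep -v "/usr/local/Ascend" | paste -sd: -)
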
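### Endpoint Management
Each Huawei NPU card has one parameter-plane NIC, and one TransferEngine is used to manage all transfer operations on that NPU card.

### Ranktable Management
Ascend Transport does not depend on global Ranktable information. It only needs the local Ranktable information of the current NPU card, which it obtains automatically by parsing the `/etc/hccn.conf` file during initialization.

### Initialization
When using Ascend Transport, the TransferEngine must still call the `init` function after construction:
```cpp
TransferEngine();

int TransferEngine::init(const std::string &metadata_conn_string,
                         const std::string &local_server_name,
                         const std::string &ip_or_host_name,
                         uint64_t rpc_port)
```
The only difference is that, when calling TransferEngine `init`, the `local_server_name` parameter must include the physical NPU card number where the TransferEngine resides: `local_server_name` changes from `ip:port` to `ip:port:npu_x`, for example `"0.0.0.0:12345:npu_2"`.

**Note**: Ascend Transport passes in the physical NPU card number without modifying Mooncake's external interface, and the number is used only inside the Transport. The `segment_desc_name` in metadata keeps its original form, `ip:port`. Therefore, each NPU card must use a distinct port that is not already occupied.
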
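### Metadata Service
Ascend Transport is compatible with all metadata services currently supported by Mooncake, including etcd, redis, http, and p2phandshake. During initialization, Ascend Transport registers a set of information about the current NPU card, including `device_id`, `device_ip`, `rank_id`, and `server_ip`.

### Data Transfer
Ascend Transport supports write/read semantics and automatically determines whether communication crosses HCCS, choosing HCCS or RoCE as the underlying protocol. Users only need Mooncake's `getTransferStatus` interface to obtain the status of each request.

### Fault Handling
On top of HCCL's own fault handling, Ascend Transport provides a thorough fault handling mechanism. For failures that may occur during initialization, connection setup, data transfer, and other stages, it adds or reuses retry-on-failure mechanisms. If a retry still fails, it reuses the error codes of HCCL collective communication operations to report precise error information. For more detailed error information, check the plog logs under the `/root/Ascend/log` directory.
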
### Test Cases
Ascend Transport provides two test files: the multi-scenario test `mooncake-transfer-engine/example/transfer_engine_ascend_one_sided.cpp` and the performance test `mooncake-transfer-engine/example/transfer_engine_ascend_perf.cpp`. By passing valid values for the parameters defined at the top of each test file, you can run one-to-one, one-to-two, and two-to-one scenarios as well as performance tests.

Example commands for the multi-scenario test:

**Start the initiator node:**

    ./transfer_engine_ascend_one_sided --metadata_server=P2PHANDSHAKE --local_server_name=10.0.0.0:12345 --protocol=hccl --operation=write --segment_id=10.0.0.0:12346 --device_id=0 --mode=initiator --block_size=8388608

**Start the target node:**

    ./transfer_engine_ascend_one_sided --metadata_server=P2PHANDSHAKE --local_server_name=10.0.0.0:12346 --protocol=hccl --operation=write --device_id=1 --mode=target --block_size=8388608

Example commands for the performance test:

**Start the initiator node:**

    ./transfer_engine_ascend_perf --metadata_server=P2PHANDSHAKE --local_server_name=10.0.0.0:12345 --protocol=hccl --operation=write --segment_id=10.0.0.0:12346 --device_id=0 --mode=initiator --block_size=8388608

**Start the target node:**

    ./transfer_engine_ascend_perf --metadata_server=P2PHANDSHAKE --local_server_name=10.0.0.0:12346 --protocol=hccl --operation=write --device_id=1 --mode=target

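### Log Output
To see whether each transfer request crosses HCCS and how long it takes, enable the related output by setting an environment variable:

    export ASCEND_TRANSPORT_PRINT=1

### Notes
1. `ascend_transport` establishes a host-side TCP connection that occupies port 10000+deviceId. Avoid this port and do not use it for anything else.
2. If the remote end goes offline and restarts after a transfer has finished, `ascend_transport` retries connection setup automatically; there is no need to manually restart the local service.

Note: if the target goes offline and restarts, the initiator tries to re-establish the connection the next time it sends a request. The target must finish restarting and become ready within 5 seconds after the initiator sends the request. If it has not recovered within this window, the connection fails and an error is returned.

### Timeout Configuration
Ascend Transport uses TCP-based out-of-band communication, with the host-side receive timeout set to 120 seconds.

In `hccl_socket`, the connection timeout is configured by the environment variable `HCCL_CONNECT_TIMEOUT` and the execution timeout by the environment variable `HCCL_EXEC_TIMEOUT`. If no communication takes place for longer than `HCCL_EXEC_TIMEOUT`, the `hccl_socket` connection is closed.

In `transport_mem`, point-to-point communication between endpoints involves a connection handshake whose timeout is 120 seconds.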
If more hardware is supported in the future, these lines should be revisited. @alogfans
maybe consider making this feature optional?