
BUG: NCCL error: #1622

@ye7love7

Description


Deployed with the v0.12.0 Docker image; the launch command is as follows:
sudo docker run -d -v /home/tskj/MOD/:/home/MOD/ -e XINFERENCE_HOME=/home/MOD -p 9997:9997 --gpus all xprobe/xinference:v0.12.0 xinference-local -H 0.0.0.0 --log-level debug
The machine has 8 GPUs. After selecting 8, the model fails with the following error:
2024-06-12 05:08:11,767 xinference.core.worker 95 DEBUG Enter launch_builtin_model, args: (<xinference.core.worker.WorkerActor object at 0x7f013c4aa700>_uid': 'gpt-3.5-turbo-1-0', 'model_name': 'Qwen1.5-110B-Chat', 'model_size_in_billions': 110, 'model_format': 'pytorch', 'quantization': 'none', 'model_engl_type': 'LLM', 'n_gpu': 8, 'request_limits': None, 'peft_model_config': None, 'gpu_idx': None, 'gpu_memory_utilization': 0.9, 'max_model_len': 32768}
2024-06-12 05:08:11,767 xinference.core.worker 95 DEBUG GPU selected: [0, 1, 2, 3, 4, 5, 6, 7] for model gpt-3.5-turbo-1-0
2024-06-12 05:08:15,436 xinference.model.llm.core 95 DEBUG Launching gpt-3.5-turbo-1-0 with VLLMChatModel
2024-06-12 05:08:15,437 xinference.model.llm.llm_family 95 INFO Caching from URI: /home/MOD/Qwen/Qwen1.5-110B-Chat
2024-06-12 05:08:15,437 xinference.model.llm.llm_family 95 INFO Cache /home/MOD/Qwen/Qwen1.5-110B-Chat exists
2024-06-12 05:08:15,461 xinference.model.llm.vllm.core 210 INFO Loading gpt-3.5-turbo with following model config: {'gpu_memory_utilization': 0.9, 'max 'tokenizer_mode': 'auto', 'trust_remote_code': True, 'tensor_parallel_size': 8, 'block_size': 16, 'swap_space': 4, 'max_num_seqs': 256, 'quantization': Nose. Lora count: 0.
2024-06-12 05:08:17,542 WARNING services.py:2009 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67067904 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2024-06-12 05:08:18,666 INFO worker.py:1753 -- Started a local Ray instance.
INFO 06-12 05:08:20 llm_engine.py:161] Initializing an LLM engine (v0.4.3) with config: model='/home/MOD/Qwen/Qwen1.5-110B-Chat', speculative_config=None, D/Qwen/Qwen1.5-110B-Chat', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, tokenizer_revision=None, trust_remote_code=Truat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, disable_custom_all_reduce=False, quantization=None, enforcache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_modeln/Qwen1.5-110B-Chat)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 06-12 05:08:45 utils.py:618] Found nccl from library libnccl.so.2
INFO 06-12 05:08:45 pynccl.py:65] vLLM is using nccl==2.20.5
(RayWorkerWrapper pid=6318) INFO 06-12 05:08:45 utils.py:618] Found nccl from library libnccl.so.2
(RayWorkerWrapper pid=6318) INFO 06-12 05:08:45 pynccl.py:65] vLLM is using nccl==2.20.5
ERROR 06-12 05:08:46 worker_base.py:148] Error executing method init_device. This might cause deadlock in distributed execution.
ERROR 06-12 05:08:46 worker_base.py:148] Traceback (most recent call last):
ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 140, in execute_method
ERROR 06-12 05:08:46 worker_base.py:148] return executor(*args, **kwargs)
ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 114, in init_device
ERROR 06-12 05:08:46 worker_base.py:148] init_worker_distributed_environment(self.parallel_config, self.rank,
ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 349, in init_worker_distributed_environment
ERROR 06-12 05:08:46 worker_base.py:148] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 239, in ensure_model_parallel_initialized
ERROR 06-12 05:08:46 worker_base.py:148] initialize_model_parallel(tensor_model_parallel_size,
ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 191, in initialize_model_parallel
ERROR 06-12 05:08:46 worker_base.py:148] _TP_PYNCCL_COMMUNICATOR = PyNcclCommunicator(
ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/distributed/device_communicators/pynccl.py", line 94, in __init__
ERROR 06-12 05:08:46 worker_base.py:148] self.comm: ncclComm_t = self.nccl.ncclCommInitRank(
ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 244, in ncclCommInitRank
ERROR 06-12 05:08:46 worker_base.py:148] self.NCCL_CHECK(self._funcs["ncclCommInitRank"](ctypes.byref(comm),
ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 223, in NCCL_CHECK
ERROR 06-12 05:08:46 worker_base.py:148] raise RuntimeError(f"NCCL error: {error_str}")
ERROR 06-12 05:08:46 worker_base.py:148] RuntimeError: NCCL error: unhandled system error (run with NCCL_DEBUG=INFO for details)
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] Traceback (most recent call last):
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 140, in execute_method
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] return executor(*args, **kwargs)
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 114, in init_device
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] init_worker_distributed_environment(self.parallel_config, self.rank,
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 349, in init_worker_distributed_environment
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 239, in ensure_model_parallel_initialized
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] initialize_model_parallel(tensor_model_parallel_size,
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 191, in initialize_model_parallel
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] _TP_PYNCCL_COMMUNICATOR = PyNcclCommunicator(
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/distributed/device_communicators/pynccl.py", line 94, in __init__
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] self.comm: ncclComm_t = self.nccl.ncclCommInitRank(
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 244, in ncclCommInitRank
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] self.NCCL_CHECK(self._funcs["ncclCommInitRank"](ctypes.byref(comm),
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] File "/opt/conda/lib/python3.10/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 223, in NCCL_CHECK
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] raise RuntimeError(f"NCCL error: {error_str}")
(RayWorkerWrapper pid=6318) ERROR 06-12 05:08:46 worker_base.py:148] RuntimeError: NCCL error: unhandled system error (run with NCCL_DEBUG=INFO for details)
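Per the Ray warning earlier in the log, /dev/shm inside the container is only about 64 MB (67067904 bytes), and NCCL's shared-memory transport commonly fails with "unhandled system error" when shared memory is that small. One possible fix, following the warning's own suggestion (the 10.24gb value is just what Ray recommends here and should be sized to the host's RAM), is to relaunch the container with --shm-size added to the same command:

sudo docker run -d --shm-size=10.24gb -v /home/tskj/MOD/:/home/MOD/ -e XINFERENCE_HOME=/home/MOD -p 9997:9997 --gpus all xprobe/xinference:v0.12.0 xinference-local -H 0.0.0.0 --log-level debug

Alternatively, --ipc=host would let the container share the host's shared memory, if that is acceptable in this environment.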

The NCCL packages on the server are as follows:
(chatchat) yqga@dnb:~$ dpkg -l|grep nccl

ii libnccl-dev 2.21.5-1+cuda12.5 amd64 NVIDIA Collective Communication Library (NCCL) Development Files
ii libnccl2 2.21.5-1+cuda12.5 amd64 NVIDIA Collective Communication Library (NCCL) Runtime
ii nccl-local-repo-ubuntu2204-2.21.5-cuda12.5 1.0-1 amd64 nccl-local repository configuration files
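The error message itself suggests rerunning with NCCL_DEBUG=INFO to see which transport (SHM, P2P, NET) is failing; note that the image's vLLM bundles nccl==2.20.5 regardless of the host's libnccl2 2.21.5 listed above. A rough sketch of the diagnostics, where CONTAINER is a hypothetical placeholder for the container ID or name:

# Check how much shared memory the container actually has:
sudo docker exec CONTAINER df -h /dev/shm

# Relaunch with an extra "-e NCCL_DEBUG=INFO" added to the docker run command above,
# then follow the NCCL lines in the container log:
sudo docker logs -f CONTAINER 2>&1 | grep -i nccl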
