Model launch fails on multiple GPUs; single-GPU launch works (#1668)

### Describe the bug
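The problem below appeared after rebooting the machine. Before the reboot, multi-GPU launch worked normally.
Problem: launching a model on multiple GPUs fails with an error, while launching on a single GPU works.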

### To Reproduce
To help us to reproduce this bug, please provide information below:

1. Your Python version. 
3.10
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
2


3. The version of xinference you use.
0.12.1
![image](https://github.com/xorbitsai/inference/assets/48248936/2c8bf121-73e7-4c75-9cb2-4b3e92f1631f)


![image](https://github.com/xorbitsai/inference/assets/48248936/05123fe0-fb99-4f0f-8bd4-f335991f25e1)
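The `torch.cuda` check above only confirms that both devices are visible. Since the failure shows up only with two GPUs, it may also be worth ruling out a GPU-to-GPU access problem after the reboot. The following is a minimal sketch, not something taken from xinference: `gpu_report` is a hypothetical helper name, and it assumes that peer access between the two cards is relevant to this setup, which the report does not confirm.

```python
import importlib.util


def gpu_report():
    """Return device names and pairwise peer-access flags.

    Returns an empty report when torch or CUDA is unavailable.
    """
    report = {"devices": [], "peer_access": {}}
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    if not torch.cuda.is_available():
        return report

    n = torch.cuda.device_count()
    for i in range(n):
        report["devices"].append(torch.cuda.get_device_name(i))

    # Check every ordered pair (i, j); a False here means device i
    # cannot directly access memory on device j.
    for i in range(n):
        for j in range(n):
            if i != j:
                report["peer_access"][(i, j)] = torch.cuda.can_device_access_peer(i, j)
    return report


if __name__ == "__main__":
    print(gpu_report())
```

If any pair reports `False`, that is only a hint for further investigation (driver state, topology), not a confirmed cause of this error.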


```bash
2024-06-18 19:00:58,283 xinference.core.supervisor 6285 INFO     Xinference supervisor 0.0.0.0:45235 started
2024-06-18 19:00:59,844 xinference.core.worker 6285 INFO     Starting metrics export server at 0.0.0.0:None
2024-06-18 19:00:59,846 xinference.core.worker 6285 INFO     Checking metrics export server...
2024-06-18 19:01:00,950 xinference.core.worker 6285 INFO     Metrics server is started at: http://0.0.0.0:46841
2024-06-18 19:01:00,951 xinference.core.supervisor 6285 DEBUG    Enter add_worker, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f150fba02c0>, '0.0.0.0:45235'), kwargs: {}
2024-06-18 19:01:00,951 xinference.core.supervisor 6285 DEBUG    Worker 0.0.0.0:45235 has been added successfully
2024-06-18 19:01:00,951 xinference.core.supervisor 6285 DEBUG    Leave add_worker, elapsed time: 0 s
2024-06-18 19:01:00,951 xinference.core.worker 6285 INFO     Xinference worker 0.0.0.0:45235 started
2024-06-18 19:01:00,952 xinference.core.worker 6285 INFO     Purge cache directory: /home/gx01/.xinference/cache
2024-06-18 19:01:00,961 xinference.core.supervisor 6285 DEBUG    Worker 0.0.0.0:45235 resources: {'cpu': ResourceStatus(usage=0.0, total=32, memory_used=3421847552, memory_available=130487431168, memory_total=135059939328), 'gpu-0': GPUStatus(mem_total=51527024640, mem_free=51032358912, mem_used=494665728), 'gpu-1': GPUStatus(mem_total=51527024640, mem_free=51032358912, mem_used=494665728)}
2024-06-18 19:01:03,284 xinference.core.supervisor 6285 DEBUG    Enter get_status, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f150fba02c0>,), kwargs: {}
2024-06-18 19:01:03,284 xinference.core.supervisor 6285 DEBUG    Leave get_status, elapsed time: 0 s
2024-06-18 19:01:04,223 xinference.api.restful_api 6217 INFO     Starting Xinference at endpoint: http://0.0.0.0:9997
2024-06-18 19:01:06,744 xinference.core.supervisor 6285 DEBUG    Enter list_models, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f150fba02c0>,), kwargs: {}
2024-06-18 19:01:06,744 xinference.core.worker 6285 DEBUG    Enter list_models, args: (<xinference.core.worker.WorkerActor object at 0x7f150fc05e40>,), kwargs: {}
2024-06-18 19:01:06,744 xinference.core.worker 6285 DEBUG    Leave list_models, elapsed time: 0 s
2024-06-18 19:01:06,744 xinference.core.supervisor 6285 DEBUG    Leave list_models, elapsed time: 0 s
2024-06-18 19:01:08,018 xinference.core.supervisor 6285 DEBUG    Enter list_models, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f150fba02c0>,), kwargs: {}
2024-06-18 19:01:08,019 xinference.core.worker 6285 DEBUG    Enter list_models, args: (<xinference.core.worker.WorkerActor object at 0x7f150fc05e40>,), kwargs: {}
2024-06-18 19:01:08,019 xinference.core.worker 6285 DEBUG    Leave list_models, elapsed time: 0 s
2024-06-18 19:01:08,019 xinference.core.supervisor 6285 DEBUG    Leave list_models, elapsed time: 0 s
2024-06-18 19:01:11,846 xinference.core.supervisor 6285 DEBUG    Enter list_model_registrations, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f150fba02c0>, 'LLM'), kwargs: {'detailed': True}
2024-06-18 19:01:11,915 xinference.core.supervisor 6285 DEBUG    Leave list_model_registrations, elapsed time: 0 s
2024-06-18 19:01:12,584 xinference.core.supervisor 6285 DEBUG    Enter list_model_registrations, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f150fba02c0>, 'LLM'), kwargs: {'detailed': False}
2024-06-18 19:01:12,585 xinference.core.supervisor 6285 DEBUG    Leave list_model_registrations, elapsed time: 0 s
2024-06-18 19:01:12,595 xinference.core.supervisor 6285 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f150fba02c0>, 'LLM', 'llama3:70b'), kwargs: {}
2024-06-18 19:01:12,595 xinference.core.supervisor 6285 DEBUG    Leave get_model_registration, elapsed time: 0 s
2024-06-18 19:01:12,596 xinference.core.supervisor 6285 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f150fba02c0>, 'LLM', 'qwen:110b'), kwargs: {}
2024-06-18 19:01:12,596 xinference.core.supervisor 6285 DEBUG    Leave get_model_registration, elapsed time: 0 s
2024-06-18 19:01:12,597 xinference.core.supervisor 6285 DEBUG    Enter get_model_registration, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f150fba02c0>, 'LLM', 'qwen:72b'), kwargs: {}
2024-06-18 19:01:12,597 xinference.core.supervisor 6285 DEBUG    Leave get_model_registration, elapsed time: 0 s
2024-06-18 19:01:13,904 xinference.core.supervisor 6285 DEBUG    Enter query_engines_by_model_name, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f150fba02c0>, 'qwen:72b'), kwargs: {}
2024-06-18 19:01:13,904 xinference.core.supervisor 6285 DEBUG    Leave query_engines_by_model_name, elapsed time: 0 s
2024-06-18 19:01:19,763 xinference.core.supervisor 6285 DEBUG    Enter launch_builtin_model, model_uid: qwen:72b, model_name: qwen:72b, model_size: 72, model_format: gptq, quantization: Int4, replica: 1
2024-06-18 19:01:19,764 xinference.core.worker 6285 DEBUG    Enter get_model_count, args: (<xinference.core.worker.WorkerActor object at 0x7f150fc05e40>,), kwargs: {}
2024-06-18 19:01:19,764 xinference.core.worker 6285 DEBUG    Leave get_model_count, elapsed time: 0 s
2024-06-18 19:01:19,764 xinference.core.worker 6285 DEBUG    Enter launch_builtin_model, args: (<xinference.core.worker.WorkerActor object at 0x7f150fc05e40>,), kwargs: {'model_uid': 'qwen:72b-1-0', 'model_name': 'qwen:72b', 'model_size_in_billions': 72, 'model_format': 'gptq', 'quantization': 'Int4', 'model_engine': 'vLLM', 'model_type': 'LLM', 'n_gpu': 2, 'request_limits': None, 'peft_model_config': None, 'gpu_idx': None}
2024-06-18 19:01:19,764 xinference.core.worker 6285 DEBUG    GPU selected: [0, 1] for model qwen:72b-1-0
2024-06-18 19:01:22,954 xinference.model.llm.core 6285 DEBUG    Launching qwen:72b-1-0 with VLLMChatModel
2024-06-18 19:01:22,954 xinference.model.llm.llm_family 6285 INFO     Caching from URI: /home/gx01/models/Qwen2-72B-Instruct-GPTQ-Int4
2024-06-18 19:01:22,954 xinference.model.llm.llm_family 6285 INFO     Cache /home/gx01/models/Qwen2-72B-Instruct-GPTQ-Int4 exists
2024-06-18 19:01:23,029 xinference.model.llm.vllm.core 6305 INFO     Loading qwen:72b with following model config: {'tokenizer_mode': 'auto', 'trust_remote_code': True, 'tensor_parallel_size': 2, 'block_size': 16, 'swap_space': 4, 'gpu_memory_utilization': 0.9, 'max_num_seqs': 256, 'quantization': None, 'max_model_len': 4096}Enable lora: False. Lora count: 0.
2024-06-18 19:01:42,850 xinference.core.worker 6285 ERROR    Failed to load model qwen:72b-1-0
Traceback (most recent call last):
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xinference/core/worker.py", line 665, in launch_builtin_model
    await model_ref.load()
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/context.py", line 226, in send
    result = await self._wait(future, actor_ref.address, send_message)  # type: ignore
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/context.py", line 115, in _wait
    return await future
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/core.py", line 84, in _listen
    raise ServerClosed(
xoscar.errors.ServerClosed: Remote server unixsocket:///823787520 closed
2024-06-18 19:01:42,933 xinference.core.supervisor 6285 DEBUG    Enter terminate_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f150fba02c0>, 'qwen:72b'), kwargs: {'suppress_exception': True}
2024-06-18 19:01:42,934 xinference.core.supervisor 6285 DEBUG    Leave terminate_model, elapsed time: 0 s
2024-06-18 19:01:42,936 xinference.api.restful_api 6217 ERROR    [address=0.0.0.0:45235, pid=6285] Remote server unixsocket:///823787520 closed
Traceback (most recent call last):
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xinference/api/restful_api.py", line 770, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xinference/core/supervisor.py", line 837, in launch_builtin_model
    await _launch_model()
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xinference/core/supervisor.py", line 801, in _launch_model
    await _launch_one_model(rep_model_uid)
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xinference/core/supervisor.py", line 782, in _launch_one_model
    await worker_ref.launch_builtin_model(
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
    result = await result
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xinference/core/worker.py", line 665, in launch_builtin_model
    await model_ref.load()
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/context.py", line 226, in send
    result = await self._wait(future, actor_ref.address, send_message)  # type: ignore
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/context.py", line 115, in _wait
    return await future
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/core.py", line 84, in _listen
    raise ServerClosed(
xoscar.errors.ServerClosed: [address=0.0.0.0:45235, pid=6285] Remote server unixsocket:///823787520 closed
2024-06-18 19:19:51,805 xinference.core.supervisor 6285 DEBUG    Enter get_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f150fba02c0>, 'qwen:72b'), kwargs: {}
2024-06-18 19:19:51,807 xinference.api.restful_api 6217 ERROR    [address=0.0.0.0:45235, pid=6285] Model not found in the model list, uid: qwen:72b
Traceback (most recent call last):
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1400, in create_chat_completion
    model = await (await self._get_supervisor_ref()).get_model(model_uid)
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/gx01/miniconda3/envs/inference0.12.1/lib/python3.10/site-packages/xinference/core/supervisor.py", line 934, in get_model
    raise ValueError(f"Model not found in the model list, uid: {model_uid}")
ValueError: [address=0.0.0.0:45235, pid=6285] Model not found in the model list, uid: qwen:72b
```
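In the log above, the worker (pid 6285) reports only `ServerClosed`, while the model is loaded by a separate process (pid 6305) whose last message is the "Loading qwen:72b" line. The underlying error from that process does not appear in this log. As a triage aid, here is a small sketch that lists which pids logged and which records are errors. It is not part of xinference: the names `parse_records` and `summarize` are hypothetical, and the regular expression is inferred from the log lines in this report.

```python
import re
from datetime import datetime

# Inferred record layout: "<timestamp> <logger> <pid> <LEVEL>    <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<logger>\S+) (?P<pid>\d+) (?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def parse_records(text):
    """Yield one dict per log record; traceback lines without a timestamp are skipped."""
    for line in text.splitlines():
        m = LOG_LINE.match(line)
        if m:
            rec = m.groupdict()
            rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S,%f")
            rec["pid"] = int(rec["pid"])
            yield rec


def summarize(text):
    """Collect the distinct pids and the ERROR records found in a log."""
    records = list(parse_records(text))
    errors = [(r["ts"], r["pid"], r["msg"]) for r in records if r["level"] == "ERROR"]
    pids = sorted({r["pid"] for r in records})
    return {"pids": pids, "errors": errors}


# Lines copied from the log in this report.
SAMPLE = """\
2024-06-18 19:01:22,954 xinference.model.llm.core 6285 DEBUG    Launching qwen:72b-1-0 with VLLMChatModel
2024-06-18 19:01:23,029 xinference.model.llm.vllm.core 6305 INFO     Loading qwen:72b with following model config: {'tokenizer_mode': 'auto', 'trust_remote_code': True, 'tensor_parallel_size': 2, 'block_size': 16, 'swap_space': 4, 'gpu_memory_utilization': 0.9, 'max_num_seqs': 256, 'quantization': None, 'max_model_len': 4096}Enable lora: False. Lora count: 0.
2024-06-18 19:01:42,850 xinference.core.worker 6285 ERROR    Failed to load model qwen:72b-1-0
Traceback (most recent call last):
xoscar.errors.ServerClosed: Remote server unixsocket:///823787520 closed
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running this over the full log makes it easier to see that every ERROR comes from the worker or the REST API, so the next step is probably to capture the stdout/stderr of the model process itself.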

