Priority
Undecided
OS type
- OS: Ubuntu 22.04
- Kernel: 5.15.0
Hardware type
- HW: Gaudi2
- driver_ver: 1.16.2-f195ec4
Installation method
- Pull docker images from hub.docker.com
Deploy method
- Helm
Running nodes
Single Node
What's the version?
Description
The vllm-gaudi:latest container does not find devices and is in a crash loop. But if I change the latest tag to 1.1, it works fine, i.e. this is a regression.
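A possible temporary workaround (a sketch only; the value path vllm.image.tag is an assumption about the chatqna chart layout and should be checked against gaudi-vllm-values.yaml) is to pin the vLLM image back to the known-good 1.1 tag at install time:
# Assumed value path for the vLLM image tag; verify the actual key in the chart's values
$ helm install chatqna chatqna/ --skip-tests --values chatqna/gaudi-vllm-values.yaml --set vllm.image.tag=1.1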
Reproduce steps
Run ChatQnA from GenAIInfra with vLLM:
$ helm install chatqna chatqna/ --skip-tests --values chatqna/gaudi-vllm-values.yaml ...
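To narrow down whether the Gaudi devices are exposed to the pod at all (versus a problem inside the latest image), checks like the following can help, assuming the container stays up long enough to exec into. The resource name habana.ai/gaudi is what the Habana device plugin normally registers; the device node paths are assumptions and may vary with driver version:
# Check that the scheduler actually allocated Gaudi devices to the pod
$ kubectl describe pod chatqna-vllm-75dfb59d66-wp4vs | grep -i habana.ai/gaudi
# Look for accelerator device nodes inside the container (paths may differ by driver version)
$ kubectl exec -it chatqna-vllm-75dfb59d66-wp4vs -- ls -l /dev/accel /dev/hl0
# If hl-smi is present in the image, it should list the Gaudi2 cards
$ kubectl exec -it chatqna-vllm-75dfb59d66-wp4vs -- hl-smi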
Raw log
$ kubectl logs chatqna-vllm-75dfb59d66-wp4vs
...
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py", line 132, in current_device
init()
File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py", line 71, in init
_hpu_C.init()
RuntimeError: synStatus=8 [Device not found] Device acquire failed.
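The failing init path from the traceback can be exercised directly, independent of vLLM, to confirm whether the regression is in the image's Habana runtime rather than in vLLM itself (a sketch; assumes the container can be kept running long enough to exec into):
# Calls the same habana_frameworks HPU init path that vLLM hits in the traceback above
$ kubectl exec -it chatqna-vllm-75dfb59d66-wp4vs -- python3 -c "import habana_frameworks.torch.hpu as hthpu; print(hthpu.current_device())"
# A failure here with synStatus=8 [Device not found] reproduces the issue without vLLM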