Description
The doc issue
Today I deployed the Qwen3 embedding model (version 0.9.1) on a V100 GPU. The model starts up without errors, but every request fails with the following error:
asyncio.exceptions.CancelledError
INFO: 10.1.1.38:60813 - "POST /v1/embeddings HTTP/1.1" 500 Internal Server Error
(research) dev2@v100-02:~/zh$ /home/dev2/anaconda3/envs/research/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 7 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/home/dev2/anaconda3/envs/research/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
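I noticed that the latest Triton version requires NVIDIA GPUs with Compute Capability 8.0 or higher. The V100 has Compute Capability 7.0, so this may be why requests fail even though the server starts.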
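For anyone who wants to check their own card, here is a minimal sketch that reads the GPU's Compute Capability through PyTorch and compares it against a threshold. The 8.0 threshold is only my assumption taken from the observation above, not a value confirmed by the vLLM or Triton documentation, and the helper names (`meets_minimum`, `detect_capability`) are hypothetical.

```python
from typing import Optional, Tuple

# Assumption: threshold taken from this report (Compute Capability 8.0),
# not from official Triton or vLLM documentation.
ASSUMED_MIN_CAPABILITY: Tuple[int, int] = (8, 0)


def meets_minimum(
    capability: Tuple[int, int],
    minimum: Tuple[int, int] = ASSUMED_MIN_CAPABILITY,
) -> bool:
    """Return True if (major, minor) is at least the minimum (tuple comparison)."""
    return tuple(capability) >= tuple(minimum)


def detect_capability(device_index: int = 0) -> Optional[Tuple[int, int]]:
    """Return (major, minor) for a CUDA device, or None if it cannot be determined."""
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device detected (or PyTorch is not installed).")
    else:
        status = "meets" if meets_minimum(cap) else "is below"
        print(
            f"Compute Capability {cap[0]}.{cap[1]} {status} the assumed minimum "
            f"{ASSUMED_MIN_CAPABILITY[0]}.{ASSUMED_MIN_CAPABILITY[1]}"
        )
```

Running this in the same environment as the server shows whether the card clears the assumed threshold. It does not prove that Triton is the cause of the 500 error.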
Suggest a potential alternative/fix
The download page in the documentation should list vLLM's hardware requirements, including the minimum supported GPU Compute Capability.