
how to set tensor_parallel_size for vllm backend #8055


Closed
Jasonsey opened this issue Apr 11, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@Jasonsey

What happened?

My code is here:

import dspy


lm = dspy.LM("vllm//home/stone/max/base_model/hf_model/Qwen/Qwen2.5-VL-72B-Instruct")
dspy.configure(lm=lm)


qa = dspy.Predict("question: str -> answer: str", tensor_parallel_size=8)
res = qa(question="who are you?")
print(res)

My question is: how do I set tensor_parallel_size for the vLLM backend? This code does not work for that param.

Steps to reproduce

import dspy


lm = dspy.LM("vllm//home/stone/max/base_model/hf_model/Qwen/Qwen2.5-VL-72B-Instruct")
dspy.configure(lm=lm)


qa = dspy.Predict("question: str -> answer: str", tensor_parallel_size=8)
res = qa(question="who are you?")
print(res)

DSPy version

2.6.17

@Jasonsey Jasonsey added the bug Something isn't working label Apr 11, 2025
@arnavsinghvi11
Collaborator

Hi @Jasonsey , I believe you need to add the hosted_vllm prefix to your model name or pass in vllm as a provider arg. Feel free to reference the LiteLLM vLLM guide!
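For reference, a minimal sketch of that suggestion; the api_base URL and api_key below are placeholders and assume an OpenAI-compatible vLLM server is already running:

import dspy

# Sketch only: the hosted_vllm/ prefix makes LiteLLM route requests to an
# OpenAI-compatible vLLM server at the given api_base.
lm = dspy.LM(
    "hosted_vllm/Qwen/Qwen2.5-VL-72B-Instruct",
    api_base="http://localhost:8000/v1",  # placeholder; point at your server
    api_key="local",                      # vLLM servers typically ignore this
)
dspy.configure(lm=lm)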

@Jasonsey
Author

Hi @Jasonsey , I believe you need to add the hosted_vllm prefix to your model name or pass in vllm as a provider arg. Feel free to reference the LiteLLM vLLM guide!

You are right, but I can't find where to set tensor_parallel_size in that doc. Do you have any idea?

@arnavsinghvi11
Collaborator

If this is an LM arg, you can set it in dspy.LM(); otherwise you can configure it on the model you launch with vLLM before querying.
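For illustration, request-level args can go straight into dspy.LM(); a sketch, with temperature and max_tokens as example kwargs forwarded to the backend (model name and URL are placeholders):

import dspy

# Sketch: generation kwargs set on dspy.LM() are forwarded with each request.
lm = dspy.LM(
    "hosted_vllm/Qwen/Qwen2.5-VL-72B-Instruct",
    api_base="http://localhost:8000/v1",  # placeholder server address
    temperature=0.7,
    max_tokens=512,
)
dspy.configure(lm=lm)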

@okhat
Collaborator

okhat commented Apr 15, 2025

@Jasonsey Is that a flag to pass when launching vLLM? Or sent from the client in each request?

@Jasonsey
Author

@Jasonsey Is that a flag to pass when launching vLLM? Or sent from the client in each request?

Here is how we use vLLM's tensor_parallel_size param:

from vllm import LLM

# tensor_parallel_size shards the model across 4 GPUs for offline inference
llm = LLM("facebook/opt-13b", tensor_parallel_size=4)
output = llm.generate("San Francisco is a")

@okhat
Collaborator

okhat commented Apr 16, 2025

Yes. Please pass this when launching the vLLM server. Not related to DSPy.
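For reference, a launch command along these lines should do it (assuming a recent vLLM with the vllm serve entrypoint; the model path is taken from the report above):

vllm serve /home/stone/max/base_model/hf_model/Qwen/Qwen2.5-VL-72B-Instruct --tensor-parallel-size 8

Once the server is up, the DSPy client does not need to know about tensor parallelism at all.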

@okhat okhat closed this as completed Apr 16, 2025