Extracting DSPy-optimized prompts correctly
(okhat changed the title from "[Bug] Performance Drop When Using Extracted DSPy Instructions in Markdown Prompt" on Apr 2, 2025.)
What happened?
I have been using MIPROv2 to optimize a basic instruction for an agent. The optimization uses GPT-4o-mini as the teacher model and a quantized model (served via vLLM) as the student.
The optimization improves the exact_match metric, which compares the model's predicted speaker role to the correct next speaker role.
Here are the results before and after optimization, measured with dspy.Evaluate:
Before optimization: 0.190
After MIPROv2 optimization: 0.238
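For context, the scores above come from a standard evaluation loop; a minimal sketch, assuming a devset of dspy.Example objects and the exact_match_router metric referenced below (devset and program are placeholders):

```python
from dspy.evaluate import Evaluate

# Hypothetical evaluation harness mirroring the reported scores.
evaluate = Evaluate(devset=devset, metric=exact_match_router,
                    num_threads=4, display_progress=True)
score = evaluate(program)  # average metric over the devset
```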
However, I noticed a significant performance drop when I extract the instructions and examples into a structured Markdown prompt and call the model directly without DSPy. The results for 20 test questions are:
Instruction (non-optimized): 47%
Instruction (optimized) + examples: 28.5%
System prompt generated via dspy.inspect_history(): 5% (does not follow the expected format)
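For reference, the inspect_history prompt in the last row was captured roughly like this (a sketch; optimized_program and the example inputs are placeholders):

```python
import dspy

# Run one example through the optimized program, then print the exact
# messages DSPy sent to the LM so they can be copied into a manual prompt.
optimized_program(roles=..., roles_list=..., conversation=...)
dspy.inspect_history(n=1)
```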
Steps to reproduce
1. Define the signature (I call it with dspy.Predict):

```python
from typing import Literal

import dspy


class RouterSignature(dspy.Signature):
    """Read the conversation and select the next role from roles_list to play. Only return the role."""

    roles = dspy.InputField(desc="available roles")
    roles_list = dspy.InputField()
    conversation = dspy.InputField()
    selected_role: Literal["..."] = dspy.OutputField()
```
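The signature is then wrapped in a plain predictor (sketch; the input values are placeholders):

```python
router = dspy.Predict(RouterSignature)
prediction = router(roles=..., roles_list=..., conversation=...)
print(prediction.selected_role)
```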
2. Set up MIPROv2 with these params:

```python
optimization_model_kwargs = dict(
    prompt_model=openai_lm,
    task_model=vllm,
    teacher_settings=dict(lm=openai_lm),
)
optimizer = MIPROv2Optimizer(
    metric=exact_match_router,
    max_bootstrapped_demos=2,
    max_labeled_demos=5,
    optimization_model_kwargs=optimization_model_kwargs,
)
```
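Compilation then follows; a sketch, assuming the MIPROv2Optimizer wrapper exposes DSPy's usual compile() interface (router and trainset are placeholders from above):

```python
# Hypothetical compile step; `router` is the dspy.Predict module above.
optimized_program = optimizer.compile(router, trainset=trainset)
optimized_program.save("optimized_router.json")  # persist instructions + demos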
3. Extract the optimized instruction and examples into a Markdown prompt.
4. Call the model manually using:
```python
import requests
import json


def query_vllm(roles, available_roles, conversation):
    api_url = "http://localhost:port/v1/chat/completions"

    # Build the user message from the inputs (reconstructed here; the
    # original snippet referenced `user_prompt` without defining it).
    user_prompt = f"roles: {roles}\nroles_list: {available_roles}\nconversation: {conversation}"

    payload = {
        "model": "kaitchup/Llama-3.2-3B-Instruct-gptqmodel-4bit",
        "messages": [
            {"role": "system", "content": OPTIMIZED_SYS_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.0,
        "max_tokens": 500,
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer fake-key",  # vLLM doesn't validate the key
    }
    response = requests.post(api_url, headers=headers, data=json.dumps(payload))

    if response.status_code == 200:
        result = response.json()
        role_selection = result["choices"][0]["message"]["content"]
        return role_selection.strip()
    else:
        print(f"Error: {response.status_code}")
        return None
```

5. Compare performance.
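A minimal sketch of that comparison step, assuming testset is a list of records whose field names mirror the signature (all names here are placeholders):

```python
# Score the manual pipeline with the same exact-match criterion.
correct = 0
for ex in testset:
    pred = query_vllm(ex.roles, ex.roles_list, ex.conversation)
    correct += int(pred is not None and pred == ex.selected_role)
print(f"exact_match: {correct / len(testset):.3f}")
```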
Expected Behavior:
I expected the optimized instruction (when manually prompted) to perform at least as well as the non-optimized instruction.
Observed Behavior:
Instead, performance drops significantly. This suggests that DSPy’s optimization process is doing something beyond just modifying the instruction text.
Questions:
Could there be additional factors in DSPy that contribute to its improved performance?
Is there a recommended way to extract and reuse DSPy-optimized prompts while maintaining their effectiveness outside DSPy?
DSPy version
2.6.10
Hey @Francisca266! DSPy does a lot of heavy lifting indeed, and it's very common that people try to extract the optimized prompt but end up hurting quality in the process. Is it easy to just use the DSPy program as-is after optimization?
We do not usually advise extracting anything, for this reason: the optimized prompt assumes a lot of DSPy behavior, like the way the inference calls are made.
That said, if you really want to get a prompt, you can apply this process. Note that it gives you a list of messages.
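A sketch of that extraction path, assuming the DSPy 2.6 adapter API (the predictor index and the input placeholders are illustrative, not part of the original answer):

```python
import dspy

# Rebuild the exact messages DSPy would send: the optimized predictor
# carries the tuned instruction (signature) and bootstrapped demos.
predictor = optimized_program.predictors()[0]
adapter = dspy.ChatAdapter()
messages = adapter.format(
    signature=predictor.signature,
    demos=predictor.demos,
    inputs={"roles": ..., "roles_list": ..., "conversation": ...},
)
# `messages` is a list of {"role": ..., "content": ...} dicts that can be
# sent verbatim to the vLLM chat/completions endpoint.
```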