
Extracting DSPy-optimized prompts correctly #8042


Closed
Francisca266 opened this issue Apr 2, 2025 · 1 comment
Labels
bug Something isn't working

Comments

Francisca266 commented Apr 2, 2025

What happened?

I have been using MIPROv2 to optimize a basic instruction of an Agent. The optimization process uses GPT-4o-mini as the teacher model and a quantized model (via vLLM) as the student.

The optimization improves the metric exact_match, which compares the model's predicted speaker role to the correct next speaker role.
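The metric itself isn't shown in the issue; a minimal sketch of such an exact-match check, assuming the gold label is stored in `example.selected_role`, would be:

```python
def exact_match_router(example, prediction, trace=None):
    # Exact match between the predicted next-speaker role and the gold role.
    return prediction.selected_role.strip().lower() == example.selected_role.strip().lower()
```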

Here are the results before and after optimization, measured with dspy.Evaluate:

Before optimization: 0.190
After MIPROv2 optimization: 0.238
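A sketch of how such numbers can be produced with dspy.Evaluate (`devset`, `program`, and `optimized_program` are placeholders for the setup above):

```python
import dspy

evaluator = dspy.Evaluate(devset=devset, metric=exact_match_router, display_progress=True)
evaluator(program)            # reported: 0.190 before optimization
evaluator(optimized_program)  # reported: 0.238 after MIPROv2
```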

However, I noticed a significant performance drop when I extract the instructions and examples into a structured Markdown prompt and call the model directly without DSPy. The results for 20 test questions are:

Instruction (non-optimized): 47%

Instruction (optimized) + examples: 28.5%

System prompt generated via dspy.inspect_history(): 5% (does not follow the expected format)
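The last prompt was obtained via dspy.inspect_history(); a minimal sketch of that step (the exact call site isn't shown above):

```python
# Run the compiled program once, then print the last LM call DSPy made;
# the system prompt shown there is what was copied into the manual setup.
optimized_program(roles="...", roles_list="...", conversation="...")
dspy.inspect_history(n=1)
```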

Steps to reproduce

The signature I am using (called with dspy.Predict):
```python
from typing import Literal

import dspy

class RouterSignature(dspy.Signature):
    """Read the conversation and select the next role from roles_list to play. Only return the role."""

    roles = dspy.InputField(desc="available roles")
    roles_list = dspy.InputField()
    conversation = dspy.InputField()
    selected_role: Literal["..."] = dspy.OutputField()
```
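For completeness, a minimal sketch of how this signature is invoked (the exact call site isn't shown in the issue):

```python
router = dspy.Predict(RouterSignature)
prediction = router(roles="...", roles_list="...", conversation="...")
print(prediction.selected_role)
```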

MIPROv2 params:
```python
optimization_model_kwargs = dict(
    prompt_model=openai_lm,
    task_model=vllm,
    teacher_settings=dict(lm=openai_lm),
)
optimizer = MIPROv2Optimizer(
    metric=exact_match_router,
    max_bootstrapped_demos=2,
    max_labeled_demos=5,
    optimization_model_kwargs=optimization_model_kwargs,
)
```
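The compile step itself would look roughly like this (a sketch: `trainset` and the `dspy.Predict(RouterSignature)` student are assumptions, and the optimizer object is assumed to expose MIPROv2's usual compile interface):

```python
optimized_program = optimizer.compile(dspy.Predict(RouterSignature), trainset=trainset)
```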

  1. Extract the optimized instruction and examples into a Markdown prompt.

  2. Call the model manually using:
```python
import requests
import json

def query_vllm(roles, available_roles, conversation):
    api_url = "http://localhost:port/v1/chat/completions"

    # user_prompt is assembled elsewhere from roles, available_roles, and conversation
    # (the construction is not shown in the issue).
    payload = {
        "model": "kaitchup/Llama-3.2-3B-Instruct-gptqmodel-4bit",
        "messages": [
            {"role": "system", "content": OPTIMIZED_SYS_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.0,
        "max_tokens": 500,
    }

    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer fake-key",  # vLLM doesn't validate the key
    }

    response = requests.post(api_url, headers=headers, data=json.dumps(payload))

    if response.status_code == 200:
        result = response.json()
        role_selection = result["choices"][0]["message"]["content"]
        return role_selection.strip()
    else:
        print(f"Error: {response.status_code}")
        return None
```

  3. Compare performance (a sketch of this comparison follows below).
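A minimal sketch of that comparison, assuming `test_questions` holds the 20 test cases as (roles, available_roles, conversation, gold_role) tuples:

```python
correct = 0
for roles, available_roles, conversation, gold_role in test_questions:
    predicted = query_vllm(roles, available_roles, conversation)
    correct += int(predicted is not None and predicted.strip().lower() == gold_role.strip().lower())
print(f"accuracy: {correct / len(test_questions):.1%}")
```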

Expected Behavior:

I expected the optimized instruction (when manually prompted) to perform at least as well as the non-optimized instruction.

Observed Behavior:

Instead, performance drops significantly. This suggests that DSPy’s optimization process is doing something beyond just modifying the instruction text.

Questions:

  • Could there be additional factors in DSPy that contribute to its improved performance?
  • Is there a recommended way to extract and reuse DSPy-optimized prompts while maintaining their effectiveness outside DSPy?

DSPy version

2.6.10

@Francisca266 Francisca266 added the bug Something isn't working label Apr 2, 2025
@okhat okhat changed the title [Bug] Performance Drop When Using Extracted DSPy Instructions in Markdown Prompt Extracting DSPy-optimized prompts correctly Apr 2, 2025
okhat (Collaborator) commented Apr 2, 2025

Hey @Francisca266 ! DSPy does a lot of heavy lifting indeed, and it's very common that people try to extract the optimized prompt but end up hurting quality in the process. Is it easy to just use the DSPy program as-is after optimization?

We do not usually advise that you extract anything for this reason, since the optimized prompt assumes a lot of DSPy behavior like the way the inference calls are made.

That said, if you really want to get a prompt, you can apply this process. Note that it gives you a list of messages.

```python
{
    name: adapter.format(
        p.signature,
        demos=p.demos,
        inputs={k: f"{{{k}}}" for k in p.signature.input_fields},
    )
    for name, p in program.named_predictors()
}
```
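For context, a runnable sketch of that extraction, assuming the default dspy.ChatAdapter; each value is a list of chat messages with the input fields left as `{placeholder}` strings:

```python
import dspy

adapter = dspy.ChatAdapter()  # assumption: the default chat adapter

prompts = {
    name: adapter.format(
        p.signature,
        demos=p.demos,
        inputs={k: f"{{{k}}}" for k in p.signature.input_fields},
    )
    for name, p in program.named_predictors()
}

# Inspect what would be sent to the LM for each predictor.
for name, messages in prompts.items():
    for message in messages:
        print(name, message["role"], message["content"][:100])
```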
