Description
What happened?
I have been using MIPROv2 to optimize the basic instruction of an agent. The optimization uses GPT-4o-mini as the teacher model and a quantized model (served via vLLM) as the student.
The optimization improves the `exact_match` metric, which compares the model's predicted speaker role to the correct next speaker role.
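For reference, the metric (`exact_match_router` in the optimizer call below) is a simple exact match on the role. A minimal sketch; the real implementation may normalize case or whitespace differently:

```python
def exact_match_router(example, prediction, trace=None):
    # Exact match between the predicted next-speaker role and the gold role.
    # Minimal sketch; normalization details may differ from my actual metric.
    return prediction.selected_role.strip() == example.selected_role.strip()
```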
Here are the results before and after optimization, measured with `dspy.Evaluate`:
Before optimization: 0.190
After MIPROv2 optimization: 0.238
However, I noticed a significant performance drop when I extracted the instructions and examples into a structured Markdown prompt and called the model directly without DSPy. The results for 20 test questions are:
Instruction (non-optimized): 47%
Instruction (optimized) + examples: 28.5%
System prompt generated via dspy.inspect_history(): 5% (does not follow the expected format)
Steps to reproduce
The signature I am using (called with `dspy.Predict`):
```python
import dspy
from typing import Literal

class RouterSignature(dspy.Signature):
    """Read the conversation and select the next role from roles_list to play. Only return the role."""

    roles = dspy.InputField(desc="available roles")
    roles_list = dspy.InputField()
    conversation = dspy.InputField()
    selected_role: Literal["..."] = dspy.OutputField()
```
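For reference, the predictor is invoked roughly like this (a sketch continuing from the snippet above; the LM setup and field values are placeholders for my actual configuration):

```python
# Sketch of the call path; model name and port are placeholders.
vllm_lm = dspy.LM(
    "openai/kaitchup/Llama-3.2-3B-Instruct-gptqmodel-4bit",
    api_base="http://localhost:port/v1",
    api_key="fake-key",
)
dspy.configure(lm=vllm_lm)

router = dspy.Predict(RouterSignature)
prediction = router(
    roles="...",         # description of the available roles
    roles_list="...",    # candidate roles for the next turn
    conversation="...",  # conversation history so far
)
print(prediction.selected_role)
```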
MIPROv2 params:
```python
optimization_model_kwargs = dict(
    prompt_model=openai_lm,
    task_model=vllm,
    teacher_settings=dict(lm=openai_lm),
)
optimizer = MIPROv2Optimizer(
    metric=exact_match_router,
    max_bootstrapped_demos=2,
    max_labeled_demos=5,
    optimization_model_kwargs=optimization_model_kwargs,
)
```
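For context, the underlying DSPy call is roughly equivalent to the following (a sketch using `dspy.MIPROv2` directly; argument names follow DSPy's API and may not match my wrapper exactly, and the trainset construction is omitted):

```python
from dspy.teleprompt import MIPROv2

# Roughly what the wrapper does under the hood; trainset construction omitted.
teleprompter = MIPROv2(
    metric=exact_match_router,
    prompt_model=openai_lm,
    task_model=vllm,
    teacher_settings=dict(lm=openai_lm),
    max_bootstrapped_demos=2,
    max_labeled_demos=5,
)
optimized_router = teleprompter.compile(
    dspy.Predict(RouterSignature),
    trainset=trainset,  # list of dspy.Example objects
)
```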
- Extract the optimized instruction and examples into a structured Markdown prompt.
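In code, the extraction looks roughly like this (a sketch: `optimized_router` is the compiled program from the previous step, and the Markdown assembly is my own formatting, not something DSPy emits):

```python
# Pull the optimized instruction and the selected demos off the compiled
# predictor (attribute names per dspy.Predict), then assemble Markdown by hand.
optimized_instruction = optimized_router.signature.instructions
demos = optimized_router.demos  # dspy.Example objects chosen by MIPROv2

example_lines = [
    f"- conversation: {demo.conversation}\n  selected_role: {demo.selected_role}"
    for demo in demos
]
OPTIMIZED_SYS_PROMPT = optimized_instruction + "\n\n## Examples\n" + "\n".join(example_lines)
```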
- Call the model manually using:
```python
import requests
import json

def query_vllm(roles, available_roles, conversation):
    api_url = "http://localhost:port/v1/chat/completions"

    # Build the user message from the same fields the DSPy signature uses.
    user_prompt = (
        f"roles: {roles}\n"
        f"roles_list: {available_roles}\n"
        f"conversation: {conversation}"
    )

    payload = {
        "model": "kaitchup/Llama-3.2-3B-Instruct-gptqmodel-4bit",
        "messages": [
            {"role": "system", "content": OPTIMIZED_SYS_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.0,
        "max_tokens": 500,
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer fake-key",  # vLLM doesn't validate the key
    }

    response = requests.post(api_url, headers=headers, data=json.dumps(payload))
    if response.status_code == 200:
        result = response.json()
        role_selection = result["choices"][0]["message"]["content"]
        return role_selection.strip()
    else:
        print(f"Error: {response.status_code}")
        return None
```
- Compare performance.
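The comparison itself is a plain exact-match accuracy over the 20 test questions, along these lines (illustrative; `testset` is a list of dicts holding the signature fields plus the gold role):

```python
# Illustrative accuracy loop; `testset` is a list of dicts with the
# signature fields plus the gold next-speaker role.
correct = 0
for example in testset:
    predicted = query_vllm(
        example["roles"], example["roles_list"], example["conversation"]
    )
    if predicted is not None and predicted == example["selected_role"]:
        correct += 1

print(f"Accuracy: {correct / len(testset):.1%}")
```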
Expected Behavior:
I expected the optimized instruction (when manually prompted) to perform at least as well as the non-optimized instruction.
Observed Behavior:
Instead, performance drops significantly. This suggests that DSPy’s optimization process is doing something beyond just modifying the instruction text.
Questions:
- Could there be additional factors in DSPy that contribute to its improved performance?
- Is there a recommended way to extract and reuse DSPy-optimized prompts while maintaining their effectiveness outside DSPy?
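To make the second question concrete, the kind of reuse I have in mind looks like this (a sketch; it assumes `dspy.LM` records the exact chat messages of each call under `lm.history[-1]["messages"]`, which I have not verified is the recommended approach):

```python
import requests

# Replay the exact messages DSPy sent, verbatim, against the same vLLM
# endpoint, to separate DSPy's prompt formatting from the instruction and
# examples themselves. Variable names are illustrative.
last_call = vllm_lm.history[-1]
payload = {
    "model": "kaitchup/Llama-3.2-3B-Instruct-gptqmodel-4bit",
    "messages": last_call["messages"],
    "temperature": 0.0,
    "max_tokens": 500,
}
response = requests.post(
    "http://localhost:port/v1/chat/completions",
    headers={"Content-Type": "application/json", "Authorization": "Bearer fake-key"},
    json=payload,
)
print(response.json()["choices"][0]["message"]["content"])
```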
DSPy version
2.6.10