
[Bug] when return_outputs is True and return_all_scores is True, COPRO compile will crash #8027


Open
eightHundreds opened this issue Mar 29, 2025 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@eightHundreds

What happened?

teleprompter = COPRO(
    metric=accuracy,
    breadth=3, 
    depth=5,    
    verbose=False,
    track_stats=False,
    prompt_model=promptLM
)

kwargs = dict(num_threads=8, 
              display_progress=True, 
              display_table=False,
              return_outputs=True,
              return_all_scores=True
        )

(compiled_analyzer) = teleprompter.compile(
    my_model, 
    trainset=trainset_examples, 
    eval_kwargs=kwargs,
)

The error stack is below. It appears these two configuration items cause candidate["score"] to be a Prediction rather than a plain number, so the candidates cannot be sorted.

--> 134 (compiled_analyzer) = teleprompter.compile(
    135     quality_inspection_analyzer, 
    136     trainset=trainset_examples, 
    137     eval_kwargs=kwargs,
    138 )

File ~/Code/ai/dspy/.venv/lib/python3.13/site-packages/dspy/teleprompt/copro_optimizer.py:261, in COPRO.compile(self, student, trainset, eval_kwargs)
    257     results_latest[id(p_old)]["std"].append(np.std(latest_scores))
    259 # Now that we've evaluated the candidates, set this predictor to the best performing version
    260 # to ensure the next round of scores reflect the best possible version
--> 261 best_candidate = max(evaluated_candidates[id(p_old)].values(), key=lambda candidate: candidate["score"])
    262 *_, last_key = self._get_signature(p_old).fields.keys()
    263 updated_signature = (
    264     self._get_signature(p_new)
    265     .with_instructions(best_candidate["instruction"])
    266     .with_updated_fields(last_key, prefix=best_candidate["prefix"])
    267 )

TypeError: '>' not supported between instances of 'Prediction' and 'Prediction'
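The failure itself is plain Python: max() compares whatever its key function returns, and two objects with no ordering defined raise TypeError on >. A minimal stand-alone reproduction (the Prediction class here is a stand-in for illustration, not the real dspy class, which likewise defines no ordering):

```python
class Prediction:
    # Stand-in for dspy.Prediction: no __lt__/__gt__ defined.
    pass

candidates = [{"score": Prediction()}, {"score": Prediction()}]
try:
    # Mirrors the failing line in copro_optimizer.py:261.
    max(candidates, key=lambda c: c["score"])
    raised = False
except TypeError:
    raised = True
print(raised)  # True: comparison between Prediction instances is unsupported
```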

Steps to reproduce

Pass an eval_kwargs dict containing return_outputs=True and return_all_scores=True to COPRO's compile method.
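Conceptually, the crash is a missing score-normalization step: when these flags are set, the evaluation result is no longer a bare float. A hedged sketch of how a float could be extracted either way (as_float_score is a hypothetical helper, not part of DSPy; it assumes a tuple result carries the overall score first, which may vary by version):

```python
def as_float_score(score):
    # With return_outputs/return_all_scores set, the evaluation result can be
    # a tuple like (overall_score, per_example_results) instead of a float.
    if isinstance(score, tuple):
        score = score[0]
    return float(score)

candidates = [
    {"instruction": "a", "score": 0.7},
    {"instruction": "b", "score": (0.9, ["per-example results"])},
]
# With the normalizing key, max() no longer compares incomparable objects.
best = max(candidates, key=lambda c: as_float_score(c["score"]))
```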

DSPy version

2.6.13

@eightHundreds eightHundreds added the bug Something isn't working label Mar 29, 2025
@chenmoneygithub
Collaborator

@TomeHirata, do you mind taking a look?

@TomeHirata
Collaborator

Hi, @eightHundreds. Thank you for the report. Can I know what your motivation was for passing return_outputs=True and return_all_scores=True as eval_kwargs? The evaluation result is treated as a single float within COPRO, but do you want to get the entire evaluation result for all candidate programs?

@eightHundreds
Author

eightHundreds commented Apr 1, 2025

> Can I know what your motivation was for passing return_outputs=True and return_all_scores=True as eval_kwargs?

I want to see the per-example evaluation results so that I can find cases that may have problems.

For example, I focus on the cases where the prediction disagrees with the example, to determine whether the problem is in my use case or in the model.
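For this use case, one hedged stopgap is to keep the two flags out of the kwargs handed to compile and run the detailed per-example evaluation separately after compilation (a sketch, not an officially supported pattern):

```python
# eval_kwargs as passed in the report above
kwargs = dict(num_threads=8,
              display_progress=True,
              display_table=False,
              return_outputs=True,
              return_all_scores=True)

# COPRO expects a plain float score, so strip the two flags that change the
# evaluation's return shape before passing kwargs to compile(); the detailed
# evaluation can then be run on the compiled program with the full kwargs.
compile_kwargs = {k: v for k, v in kwargs.items()
                  if k not in ("return_outputs", "return_all_scores")}
```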

@TomeHirata
Collaborator

TomeHirata commented Apr 9, 2025

Got it. Let me change the issue label, then, since this is not intended behavior of COPRO. For your use case, we are changing the behavior of dspy.Evaluate soon (#8003), and the outputs will become available regardless of the arguments.

@TomeHirata TomeHirata added enhancement New feature or request and removed bug Something isn't working labels Apr 9, 2025