[Bug] New to DSPy. Using Llama 3.2 with ReAct: 2 + 2 = 5 #7997


Open
andrewfr opened this issue Mar 22, 2025 · 4 comments
Labels
bug Something isn't working

Comments


andrewfr commented Mar 22, 2025

What happened?

I am new to DSPy and have written a small DSPy program. The program calls evaluate_math five times and ends up with the result 5. I don't understand why the tool is called multiple times and its result re-interpreted.

$ python llama_tool.py
2 + 2
2 + 3
2 + 3
2 + 3
2 + 3
5.0

Cheers,
Andrew

Steps to reproduce

import dspy

def evaluate_math(expression: str) -> float:
    # Print the expression the LM asked for, then evaluate it
    print(expression)
    return eval(expression)

lm = dspy.LM('ollama_chat/llama3.2:3b', api_base='http://localhost:11434')
dspy.configure(lm=lm)

math_tool = dspy.Tool(
    name="evaluate_math",
    desc="Evaluates a mathematical expression.",
    func=evaluate_math,
    args={
        "expression": {
            "type": "string",
            "description": "Mathematical expression to evaluate"
        }
    }
)

react_module = dspy.ReAct(
    "question -> answer: float",
    tools=[math_tool],
    #max_iters=1
)

# Execute the module with a question that requires the tool
response = react_module(question="What is 2 + 2?")
print(response.answer)  # Expected output: 4.0

DSPy version

2.6.13

andrewfr added the bug label on Mar 22, 2025
chenmoneygithub (Collaborator) commented

@andrewfr Thanks for reporting the issue! You can quickly check whether this is a DSPy issue, a prompt issue, or an LM issue by inspecting the history:

dspy.inspect_history(n=5)

Running the command above will print the prompts and responses from the LM.
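For reference, a minimal way to use this with the reproduction script above (dspy.inspect_history simply prints the recent prompts and raw responses to stdout):

# Run the module once, then dump the last few LM calls
response = react_module(question="What is 2 + 2?")
dspy.inspect_history(n=5)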


andrewfr commented Mar 28, 2025

Thanks for the advice. I think the immediate source of the problem is the inclusion of the question mark "?". The example works when it is omitted. I would consider this a bug. Perhaps the "?" is being interpreted somehow?
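For reference, a minimal sketch of that check against the reproduction script above (the only change is dropping the trailing "?" from the question):

# Same module and question as above, but without the trailing "?"
response = react_module(question="What is 2 + 2")
print(response.answer)  # reportedly 4.0 once the "?" is omitted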

Here is the result of `dspy.inspect_history` for the original run (with the "?"):

2 + 2
2 + 3
2 + 3
2 + 3
2 + 3
5.0




[2025-03-28T12:00:56.357706]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `next_thought` (str)
2. `next_tool_name` (Literal['evaluate_math', 'finish'])
3. `next_tool_args` (dict[str, Any])

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## next_thought ## ]]
{next_thought}

[[ ## next_tool_name ## ]]
{next_tool_name}        # note: the value you produce must exactly match (no extra characters) one of: evaluate_math; finish

[[ ## next_tool_args ## ]]
{next_tool_args}        # note: the value you produce must adhere to the JSON schema: {"type": "object"}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.
        
        You will be given `question` and your goal is to finish with `answer`.
        
        To do this, you will interleave Thought, Tool Name, and Tool Args, and receive a resulting Observation.
        
        Thought can reason about the current situation, and Tool Name can be the following types:
        
        (1) evaluate_math, whose description is <desc>Evaluates a mathematical expression.</desc>. It takes arguments {'expression': {'type': 'string', 'description': 'Mathematical expression to evaluate'}} in JSON format.
        (2) finish, whose description is <desc>Signals that the final outputs, i.e. `answer`, are now available and marks the task as complete.</desc>. It takes arguments {'kwargs': 'Any'} in JSON format.


User message:

[[ ## question ## ]]
What is 2 + 2?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
What is 2 + 2?

[[ ## tool_name_0 ## ]]
evaluate_math

[[ ## tool_args_0 ## ]]
{"expression": "2 + 2"}

[[ ## observation_0 ## ]]
4

Respond with the corresponding output fields, starting with the field `[[ ## next_thought ## ]]`, then `[[ ## next_tool_name ## ]]` (must be formatted as a valid Python Literal['evaluate_math', 'finish']), then `[[ ## next_tool_args ## ]]` (must be formatted as a valid Python dict[str, Any]), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## next_thought ## ]]
What is 4?

[[ ## next_tool_name ## ]]
evaluate_math

[[ ## next_tool_args ## ]]
{"expression": "2 + 3"}

[[ ## completed ## ]]





[2025-03-28T12:00:56.364382]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `next_thought` (str)
2. `next_tool_name` (Literal['evaluate_math', 'finish'])
3. `next_tool_args` (dict[str, Any])

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## next_thought ## ]]
{next_thought}

[[ ## next_tool_name ## ]]
{next_tool_name}        # note: the value you produce must exactly match (no extra characters) one of: evaluate_math; finish

[[ ## next_tool_args ## ]]
{next_tool_args}        # note: the value you produce must adhere to the JSON schema: {"type": "object"}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.
        
        You will be given `question` and your goal is to finish with `answer`.
        
        To do this, you will interleave Thought, Tool Name, and Tool Args, and receive a resulting Observation.
        
        Thought can reason about the current situation, and Tool Name can be the following types:
        
        (1) evaluate_math, whose description is <desc>Evaluates a mathematical expression.</desc>. It takes arguments {'expression': {'type': 'string', 'description': 'Mathematical expression to evaluate'}} in JSON format.
        (2) finish, whose description is <desc>Signals that the final outputs, i.e. `answer`, are now available and marks the task as complete.</desc>. It takes arguments {'kwargs': 'Any'} in JSON format.


User message:

[[ ## question ## ]]
What is 2 + 2?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
What is 2 + 2?

[[ ## tool_name_0 ## ]]
evaluate_math

[[ ## tool_args_0 ## ]]
{"expression": "2 + 2"}

[[ ## observation_0 ## ]]
4

[[ ## thought_1 ## ]]
What is 4?

[[ ## tool_name_1 ## ]]
evaluate_math

[[ ## tool_args_1 ## ]]
{"expression": "2 + 3"}

[[ ## observation_1 ## ]]
5

Respond with the corresponding output fields, starting with the field `[[ ## next_thought ## ]]`, then `[[ ## next_tool_name ## ]]` (must be formatted as a valid Python Literal['evaluate_math', 'finish']), then `[[ ## next_tool_args ## ]]` (must be formatted as a valid Python dict[str, Any]), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## next_thought ## ]]
What is 4?

[[ ## next_tool_name ## ]]
evaluate_math

[[ ## next_tool_args ## ]]
{"expression": "2 + 3"}

[[ ## completed ## ]]





[2025-03-28T12:00:56.372805]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `next_thought` (str)
2. `next_tool_name` (Literal['evaluate_math', 'finish'])
3. `next_tool_args` (dict[str, Any])

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## next_thought ## ]]
{next_thought}

[[ ## next_tool_name ## ]]
{next_tool_name}        # note: the value you produce must exactly match (no extra characters) one of: evaluate_math; finish

[[ ## next_tool_args ## ]]
{next_tool_args}        # note: the value you produce must adhere to the JSON schema: {"type": "object"}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.
        
        You will be given `question` and your goal is to finish with `answer`.
        
        To do this, you will interleave Thought, Tool Name, and Tool Args, and receive a resulting Observation.
        
        Thought can reason about the current situation, and Tool Name can be the following types:
        
        (1) evaluate_math, whose description is <desc>Evaluates a mathematical expression.</desc>. It takes arguments {'expression': {'type': 'string', 'description': 'Mathematical expression to evaluate'}} in JSON format.
        (2) finish, whose description is <desc>Signals that the final outputs, i.e. `answer`, are now available and marks the task as complete.</desc>. It takes arguments {'kwargs': 'Any'} in JSON format.


User message:

[[ ## question ## ]]
What is 2 + 2?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
What is 2 + 2?

[[ ## tool_name_0 ## ]]
evaluate_math

[[ ## tool_args_0 ## ]]
{"expression": "2 + 2"}

[[ ## observation_0 ## ]]
4

[[ ## thought_1 ## ]]
What is 4?

[[ ## tool_name_1 ## ]]
evaluate_math

[[ ## tool_args_1 ## ]]
{"expression": "2 + 3"}

[[ ## observation_1 ## ]]
5

[[ ## thought_2 ## ]]
What is 4?

[[ ## tool_name_2 ## ]]
evaluate_math

[[ ## tool_args_2 ## ]]
{"expression": "2 + 3"}

[[ ## observation_2 ## ]]
5

Respond with the corresponding output fields, starting with the field `[[ ## next_thought ## ]]`, then `[[ ## next_tool_name ## ]]` (must be formatted as a valid Python Literal['evaluate_math', 'finish']), then `[[ ## next_tool_args ## ]]` (must be formatted as a valid Python dict[str, Any]), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## next_thought ## ]]
What is 4?

[[ ## next_tool_name ## ]]
evaluate_math

[[ ## next_tool_args ## ]]
{"expression": "2 + 3"}

[[ ## completed ## ]]





[2025-03-28T12:00:56.383920]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `next_thought` (str)
2. `next_tool_name` (Literal['evaluate_math', 'finish'])
3. `next_tool_args` (dict[str, Any])

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## next_thought ## ]]
{next_thought}

[[ ## next_tool_name ## ]]
{next_tool_name}        # note: the value you produce must exactly match (no extra characters) one of: evaluate_math; finish

[[ ## next_tool_args ## ]]
{next_tool_args}        # note: the value you produce must adhere to the JSON schema: {"type": "object"}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.
        
        You will be given `question` and your goal is to finish with `answer`.
        
        To do this, you will interleave Thought, Tool Name, and Tool Args, and receive a resulting Observation.
        
        Thought can reason about the current situation, and Tool Name can be the following types:
        
        (1) evaluate_math, whose description is <desc>Evaluates a mathematical expression.</desc>. It takes arguments {'expression': {'type': 'string', 'description': 'Mathematical expression to evaluate'}} in JSON format.
        (2) finish, whose description is <desc>Signals that the final outputs, i.e. `answer`, are now available and marks the task as complete.</desc>. It takes arguments {'kwargs': 'Any'} in JSON format.


User message:

[[ ## question ## ]]
What is 2 + 2?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
What is 2 + 2?

[[ ## tool_name_0 ## ]]
evaluate_math

[[ ## tool_args_0 ## ]]
{"expression": "2 + 2"}

[[ ## observation_0 ## ]]
4

[[ ## thought_1 ## ]]
What is 4?

[[ ## tool_name_1 ## ]]
evaluate_math

[[ ## tool_args_1 ## ]]
{"expression": "2 + 3"}

[[ ## observation_1 ## ]]
5

[[ ## thought_2 ## ]]
What is 4?

[[ ## tool_name_2 ## ]]
evaluate_math

[[ ## tool_args_2 ## ]]
{"expression": "2 + 3"}

[[ ## observation_2 ## ]]
5

[[ ## thought_3 ## ]]
What is 4?

[[ ## tool_name_3 ## ]]
evaluate_math

[[ ## tool_args_3 ## ]]
{"expression": "2 + 3"}

[[ ## observation_3 ## ]]
5

Respond with the corresponding output fields, starting with the field `[[ ## next_thought ## ]]`, then `[[ ## next_tool_name ## ]]` (must be formatted as a valid Python Literal['evaluate_math', 'finish']), then `[[ ## next_tool_args ## ]]` (must be formatted as a valid Python dict[str, Any]), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## next_thought ## ]]
What is 4?

[[ ## next_tool_name ## ]]
evaluate_math

[[ ## next_tool_args ## ]]
{"expression": "2 + 3"}

[[ ## completed ## ]]





[2025-03-28T12:00:56.396033]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `reasoning` (str)
2. `answer` (float)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}        # note: the value you produce must be a single float value

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.


User message:

[[ ## question ## ]]
What is 2 + 2?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
What is 2 + 2?

[[ ## tool_name_0 ## ]]
evaluate_math

[[ ## tool_args_0 ## ]]
{"expression": "2 + 2"}

[[ ## observation_0 ## ]]
4

[[ ## thought_1 ## ]]
What is 4?

[[ ## tool_name_1 ## ]]
evaluate_math

[[ ## tool_args_1 ## ]]
{"expression": "2 + 3"}

[[ ## observation_1 ## ]]
5

[[ ## thought_2 ## ]]
What is 4?

[[ ## tool_name_2 ## ]]
evaluate_math

[[ ## tool_args_2 ## ]]
{"expression": "2 + 3"}

[[ ## observation_2 ## ]]
5

[[ ## thought_3 ## ]]
What is 4?

[[ ## tool_name_3 ## ]]
evaluate_math

[[ ## tool_args_3 ## ]]
{"expression": "2 + 3"}

[[ ## observation_3 ## ]]
5

[[ ## thought_4 ## ]]
What is 4?

[[ ## tool_name_4 ## ]]
evaluate_math

[[ ## tool_args_4 ## ]]
{"expression": "2 + 3"}

[[ ## observation_4 ## ]]
5

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]` (must be formatted as a valid Python float), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## reasoning ## ]]
This is an example of the transitive property of equality, where if we know that 2 + 3 = 5, we can conclude that 4 = 5.

[[ ## answer ## ]]
5.0

[[ ## completed ## ]]


hxy9243 commented Apr 1, 2025

Looks like a common problem with the base model. Maybe try again with an instruct model?
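A minimal sketch of that suggestion, assuming an instruct-tagged build of the model exists in the local Ollama registry (the tag below is illustrative, not verified):

# Hypothetical: point dspy.LM at an instruct-tagged Ollama model instead
lm = dspy.LM('ollama_chat/llama3.2:3b-instruct-q4_K_M', api_base='http://localhost:11434')
dspy.configure(lm=lm)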


andrewfr commented Apr 4, 2025

> Looks like a common problem with the base model. Maybe try again with an instruct model?

I tried the query in the Ollama REPL with llama3.2:3b and it worked. When I have time, I can try different models and tool calling with different prompting techniques (ReAct, CoT). If that doesn't work, I'll dive into the code.
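As one example of those follow-ups, a minimal sketch using a plain ChainOfThought module with the same typed signature and the already-configured LM (no tool calling involved):

# Hypothetical follow-up: same signature, chain-of-thought only, no tools
cot = dspy.ChainOfThought("question -> answer: float")
print(cot(question="What is 2 + 2?").answer)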

Cheers,
Andrew
