[Bug] New to DSPy. Using Llama 3.2 with ReAct: 2 + 2 = 5 #7997


Open
andrewfr opened this issue Mar 22, 2025 · 4 comments
Labels
bug Something isn't working

Comments


andrewfr commented Mar 22, 2025

What happened?

I am new to DSPy and have written a small DSPy program. The program calls evaluate_math five times and ends up with the result 5. I don't understand why the tool is called multiple times and its result re-interpreted.

$ python llama_tool.py
2 + 2
2 + 3
2 + 3
2 + 3
2 + 3
5.0

Cheers,
Andrew

Steps to reproduce

import dspy

def evaluate_math(expression: str) -> float:
    # Print the expression the LM asked for, then evaluate it
    print(expression)
    return eval(expression)

lm = dspy.LM('ollama_chat/llama3.2:3b', api_base='http://localhost:11434')
dspy.configure(lm=lm)

math_tool = dspy.Tool(
    name="evaluate_math",
    desc="Evaluates a mathematical expression.",
    func=evaluate_math,
    args={
        "expression": {
            "type": "string",
            "description": "Mathematical expression to evaluate"
        }
    }
)

react_module = dspy.ReAct(
    "question -> answer: float",
    tools=[math_tool],
    #max_iters=1
)

# Execute the module with a question that requires the tool
response = react_module(question="What is 2 + 2?")
print(response.answer)  # Expected output: 4.0

DSPy version

2.6.13

andrewfr added the bug label on Mar 22, 2025
chenmoneygithub (Collaborator) commented

@andrewfr Thanks for reporting the issue! You can quickly check whether this is a DSPy issue, a prompt issue, or an LM issue by inspecting the history:

dspy.inspect_history(n=5)

Running the command above will print the prompts and responses from the LM.
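For reference, a minimal way to use this with the reproduction script above (dspy.inspect_history simply prints the recent prompts and raw responses to stdout):

# Run the module once, then dump the last few LM calls
response = react_module(question="What is 2 + 2?")
dspy.inspect_history(n=5)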


andrewfr commented Mar 28, 2025

Thanks for the advice. I think the immediate source of the problem is the inclusion of the question mark "?". The example works when it is omitted. I would consider this a bug. Perhaps the "?" is being interpreted somehow?
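For reference, a minimal sketch of that check against the reproduction script above (the only change is dropping the trailing "?" from the question):

# Same module and question as above, but without the trailing "?"
response = react_module(question="What is 2 + 2")
print(response.answer)  # reportedly 4.0 once the "?" is omitted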

Here is the result of `dspy.inspect_history` for the original run (with the "?"):

2 + 2
2 + 3
2 + 3
2 + 3
2 + 3
5.0




[2025-03-28T12:00:56.357706]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `next_thought` (str)
2. `next_tool_name` (Literal['evaluate_math', 'finish'])
3. `next_tool_args` (dict[str, Any])

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## next_thought ## ]]
{next_thought}

[[ ## next_tool_name ## ]]
{next_tool_name}        # note: the value you produce must exactly match (no extra characters) one of: evaluate_math; finish

[[ ## next_tool_args ## ]]
{next_tool_args}        # note: the value you produce must adhere to the JSON schema: {"type": "object"}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.
        
        You will be given `question` and your goal is to finish with `answer`.
        
        To do this, you will interleave Thought, Tool Name, and Tool Args, and receive a resulting Observation.
        
        Thought can reason about the current situation, and Tool Name can be the following types:
        
        (1) evaluate_math, whose description is <desc>Evaluates a mathematical expression.</desc>. It takes arguments {'expression': {'type': 'string', 'description': 'Mathematical expression to evaluate'}} in JSON format.
        (2) finish, whose description is <desc>Signals that the final outputs, i.e. `answer`, are now available and marks the task as complete.</desc>. It takes arguments {'kwargs': 'Any'} in JSON format.


User message:

[[ ## question ## ]]
What is 2 + 2?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
What is 2 + 2?

[[ ## tool_name_0 ## ]]
evaluate_math

[[ ## tool_args_0 ## ]]
{"expression": "2 + 2"}

[[ ## observation_0 ## ]]
4

Respond with the corresponding output fields, starting with the field `[[ ## next_thought ## ]]`, then `[[ ## next_tool_name ## ]]` (must be formatted as a valid Python Literal['evaluate_math', 'finish']), then `[[ ## next_tool_args ## ]]` (must be formatted as a valid Python dict[str, Any]), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## next_thought ## ]]
What is 4?

[[ ## next_tool_name ## ]]
evaluate_math

[[ ## next_tool_args ## ]]
{"expression": "2 + 3"}

[[ ## completed ## ]]





[2025-03-28T12:00:56.364382]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `next_thought` (str)
2. `next_tool_name` (Literal['evaluate_math', 'finish'])
3. `next_tool_args` (dict[str, Any])

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## next_thought ## ]]
{next_thought}

[[ ## next_tool_name ## ]]
{next_tool_name}        # note: the value you produce must exactly match (no extra characters) one of: evaluate_math; finish

[[ ## next_tool_args ## ]]
{next_tool_args}        # note: the value you produce must adhere to the JSON schema: {"type": "object"}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.
        
        You will be given `question` and your goal is to finish with `answer`.
        
        To do this, you will interleave Thought, Tool Name, and Tool Args, and receive a resulting Observation.
        
        Thought can reason about the current situation, and Tool Name can be the following types:
        
        (1) evaluate_math, whose description is <desc>Evaluates a mathematical expression.</desc>. It takes arguments {'expression': {'type': 'string', 'description': 'Mathematical expression to evaluate'}} in JSON format.
        (2) finish, whose description is <desc>Signals that the final outputs, i.e. `answer`, are now available and marks the task as complete.</desc>. It takes arguments {'kwargs': 'Any'} in JSON format.


User message:

[[ ## question ## ]]
What is 2 + 2?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
What is 2 + 2?

[[ ## tool_name_0 ## ]]
evaluate_math

[[ ## tool_args_0 ## ]]
{"expression": "2 + 2"}

[[ ## observation_0 ## ]]
4

[[ ## thought_1 ## ]]
What is 4?

[[ ## tool_name_1 ## ]]
evaluate_math

[[ ## tool_args_1 ## ]]
{"expression": "2 + 3"}

[[ ## observation_1 ## ]]
5

Respond with the corresponding output fields, starting with the field `[[ ## next_thought ## ]]`, then `[[ ## next_tool_name ## ]]` (must be formatted as a valid Python Literal['evaluate_math', 'finish']), then `[[ ## next_tool_args ## ]]` (must be formatted as a valid Python dict[str, Any]), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## next_thought ## ]]
What is 4?

[[ ## next_tool_name ## ]]
evaluate_math

[[ ## next_tool_args ## ]]
{"expression": "2 + 3"}

[[ ## completed ## ]]





[2025-03-28T12:00:56.372805]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `next_thought` (str)
2. `next_tool_name` (Literal['evaluate_math', 'finish'])
3. `next_tool_args` (dict[str, Any])

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## next_thought ## ]]
{next_thought}

[[ ## next_tool_name ## ]]
{next_tool_name}        # note: the value you produce must exactly match (no extra characters) one of: evaluate_math; finish

[[ ## next_tool_args ## ]]
{next_tool_args}        # note: the value you produce must adhere to the JSON schema: {"type": "object"}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.
        
        You will be given `question` and your goal is to finish with `answer`.
        
        To do this, you will interleave Thought, Tool Name, and Tool Args, and receive a resulting Observation.
        
        Thought can reason about the current situation, and Tool Name can be the following types:
        
        (1) evaluate_math, whose description is <desc>Evaluates a mathematical expression.</desc>. It takes arguments {'expression': {'type': 'string', 'description': 'Mathematical expression to evaluate'}} in JSON format.
        (2) finish, whose description is <desc>Signals that the final outputs, i.e. `answer`, are now available and marks the task as complete.</desc>. It takes arguments {'kwargs': 'Any'} in JSON format.


User message:

[[ ## question ## ]]
What is 2 + 2?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
What is 2 + 2?

[[ ## tool_name_0 ## ]]
evaluate_math

[[ ## tool_args_0 ## ]]
{"expression": "2 + 2"}

[[ ## observation_0 ## ]]
4

[[ ## thought_1 ## ]]
What is 4?

[[ ## tool_name_1 ## ]]
evaluate_math

[[ ## tool_args_1 ## ]]
{"expression": "2 + 3"}

[[ ## observation_1 ## ]]
5

[[ ## thought_2 ## ]]
What is 4?

[[ ## tool_name_2 ## ]]
evaluate_math

[[ ## tool_args_2 ## ]]
{"expression": "2 + 3"}

[[ ## observation_2 ## ]]
5

Respond with the corresponding output fields, starting with the field `[[ ## next_thought ## ]]`, then `[[ ## next_tool_name ## ]]` (must be formatted as a valid Python Literal['evaluate_math', 'finish']), then `[[ ## next_tool_args ## ]]` (must be formatted as a valid Python dict[str, Any]), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## next_thought ## ]]
What is 4?

[[ ## next_tool_name ## ]]
evaluate_math

[[ ## next_tool_args ## ]]
{"expression": "2 + 3"}

[[ ## completed ## ]]





[2025-03-28T12:00:56.383920]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `next_thought` (str)
2. `next_tool_name` (Literal['evaluate_math', 'finish'])
3. `next_tool_args` (dict[str, Any])

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## next_thought ## ]]
{next_thought}

[[ ## next_tool_name ## ]]
{next_tool_name}        # note: the value you produce must exactly match (no extra characters) one of: evaluate_math; finish

[[ ## next_tool_args ## ]]
{next_tool_args}        # note: the value you produce must adhere to the JSON schema: {"type": "object"}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.
        
        You will be given `question` and your goal is to finish with `answer`.
        
        To do this, you will interleave Thought, Tool Name, and Tool Args, and receive a resulting Observation.
        
        Thought can reason about the current situation, and Tool Name can be the following types:
        
        (1) evaluate_math, whose description is <desc>Evaluates a mathematical expression.</desc>. It takes arguments {'expression': {'type': 'string', 'description': 'Mathematical expression to evaluate'}} in JSON format.
        (2) finish, whose description is <desc>Signals that the final outputs, i.e. `answer`, are now available and marks the task as complete.</desc>. It takes arguments {'kwargs': 'Any'} in JSON format.


User message:

[[ ## question ## ]]
What is 2 + 2?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
What is 2 + 2?

[[ ## tool_name_0 ## ]]
evaluate_math

[[ ## tool_args_0 ## ]]
{"expression": "2 + 2"}

[[ ## observation_0 ## ]]
4

[[ ## thought_1 ## ]]
What is 4?

[[ ## tool_name_1 ## ]]
evaluate_math

[[ ## tool_args_1 ## ]]
{"expression": "2 + 3"}

[[ ## observation_1 ## ]]
5

[[ ## thought_2 ## ]]
What is 4?

[[ ## tool_name_2 ## ]]
evaluate_math

[[ ## tool_args_2 ## ]]
{"expression": "2 + 3"}

[[ ## observation_2 ## ]]
5

[[ ## thought_3 ## ]]
What is 4?

[[ ## tool_name_3 ## ]]
evaluate_math

[[ ## tool_args_3 ## ]]
{"expression": "2 + 3"}

[[ ## observation_3 ## ]]
5

Respond with the corresponding output fields, starting with the field `[[ ## next_thought ## ]]`, then `[[ ## next_tool_name ## ]]` (must be formatted as a valid Python Literal['evaluate_math', 'finish']), then `[[ ## next_tool_args ## ]]` (must be formatted as a valid Python dict[str, Any]), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## next_thought ## ]]
What is 4?

[[ ## next_tool_name ## ]]
evaluate_math

[[ ## next_tool_args ## ]]
{"expression": "2 + 3"}

[[ ## completed ## ]]





[2025-03-28T12:00:56.396033]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `reasoning` (str)
2. `answer` (float)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}        # note: the value you produce must be a single float value

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.


User message:

[[ ## question ## ]]
What is 2 + 2?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
What is 2 + 2?

[[ ## tool_name_0 ## ]]
evaluate_math

[[ ## tool_args_0 ## ]]
{"expression": "2 + 2"}

[[ ## observation_0 ## ]]
4

[[ ## thought_1 ## ]]
What is 4?

[[ ## tool_name_1 ## ]]
evaluate_math

[[ ## tool_args_1 ## ]]
{"expression": "2 + 3"}

[[ ## observation_1 ## ]]
5

[[ ## thought_2 ## ]]
What is 4?

[[ ## tool_name_2 ## ]]
evaluate_math

[[ ## tool_args_2 ## ]]
{"expression": "2 + 3"}

[[ ## observation_2 ## ]]
5

[[ ## thought_3 ## ]]
What is 4?

[[ ## tool_name_3 ## ]]
evaluate_math

[[ ## tool_args_3 ## ]]
{"expression": "2 + 3"}

[[ ## observation_3 ## ]]
5

[[ ## thought_4 ## ]]
What is 4?

[[ ## tool_name_4 ## ]]
evaluate_math

[[ ## tool_args_4 ## ]]
{"expression": "2 + 3"}

[[ ## observation_4 ## ]]
5

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]` (must be formatted as a valid Python float), and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## reasoning ## ]]
This is an example of the transitive property of equality, where if we know that 2 + 3 = 5, we can conclude that 4 = 5.

[[ ## answer ## ]]
5.0

[[ ## completed ## ]]


hxy9243 commented Apr 1, 2025

Looks like a common problem with the base model. Maybe try again with an instruct model?
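A minimal sketch of that suggestion, assuming an instruct-tagged build of the model exists in the local Ollama registry (the tag below is illustrative, not verified):

# Hypothetical: point dspy.LM at an instruct-tagged Ollama model instead
lm = dspy.LM('ollama_chat/llama3.2:3b-instruct-q4_K_M', api_base='http://localhost:11434')
dspy.configure(lm=lm)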


andrewfr commented Apr 4, 2025

> Looks like a common problem with the base model. Maybe try again with an instruct model?

I tried the query in the Ollama REPL with llama3.2:3b and it worked. When I have time, I can try different models and tool calling with different prompting techniques (ReAct, CoT). If that doesn't work, I'll dive into the code.
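As one example of those follow-ups, a minimal sketch using a plain ChainOfThought module with the same typed signature and the already-configured LM (no tool calling involved):

# Hypothetical follow-up: same signature, chain-of-thought only, no tools
cot = dspy.ChainOfThought("question -> answer: float")
print(cot(question="What is 2 + 2?").answer)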

Cheers,
Andrew
