Tree of Thoughts (ToT)

Author: Sunworl Kim
Peer Review:
Proofread : fastjw
This is a part of LangChain Open Tutorial

Overview

Tree of Thoughts (ToT) is a systematic problem-solving framework that enhances language models' reasoning capabilities by exploring and maintaining multiple paths of thought simultaneously.

The core concept of ToT breaks down problem-solving processes into multiple stages, where various "thoughts" are generated and evaluated at each stage, similar to how a chess player thinks several moves ahead.

Each thought represents an independent solution path that the system explores and evaluates in parallel.

In this tutorial, we run a "Make 15" game that combines four numbers to reach 15 using an LLM agent search algorithm with Tree of Thoughts (ToT).

Main steps

Expand: Creative generation phase using LLM to explore all possible equation combinations
Score: Validation phase that evaluates how accurately generated equations reach the target value of 15
Prune: Optimization phase that selects only the best candidates to determine the starting point for the next exploration

Key Components

Reverse-Polish Notation(RPN) Calculator for equation evaluation
LangChain-based solver
State-based graph management

[Note] This code is substantially derived from the LangGraph Tree of Thoughts (ToT) tutorial.

References

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Environment Setup

Setting up your environment is the first step. See the Environment Setup guide for more details.

[Note]

The langchain-opentutorial is a package of easy-to-use environment setup guidance, useful functions and utilities for tutorials.
Check out the langchain-opentutorial for more details.

%%capture --no-stderr
%pip install -U langchain-opentutorial

# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain",
        "langchain_openai",
        "langchain_core",
        "langgraph"
    ],
    verbose=False,
    upgrade=False,
)

You can set API keys in a .env file or set them manually.

[Note] If you’re not using the .env file, no worries! Just enter the keys directly in the cell below, and you’re good to go.

from dotenv import load_dotenv
from langchain_opentutorial import set_env

# Attempt to load environment variables from a .env file; if unsuccessful, set them manually.
if not load_dotenv():
    set_env(
        {
            "OPENAI_API_KEY": "",
            "LANGCHAIN_API_KEY": "",
            "LANGCHAIN_TRACING_V2": "true",
            "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
            "LANGCHAIN_PROJECT": "Tree of Thoughts (ToT)",
        }
    )

Environment variables have been set successfully.

Mathematical Equation Exploration and Evaluation Framework

Define a class that uilizes Pydantic to calculate RPN formulas.

Pydantic is a library that simplifies data validation.

import operator
from typing import List, Union, Literal, NamedTuple, Optional
from pydantic import BaseModel, Field, field_validator

# Define the available operators and token type
OperatorType = Literal["+", "-", "*", "/"]  
TokenType = Union[float, OperatorType]


class Equation(BaseModel):
    tokens: List[TokenType] = Field(
        description="The stack of tokens and operators in reverse-polish notation(RPN). Example: [2, 3, '+', 5, '*'] would evaluate to (2 + 3) * 5"
    )

    @field_validator('tokens')
    def clean_tokens(cls, v):    
        return [float(t) if isinstance(t, (int, float)) else t.strip() for t in v]

    # Validate operators 
    def validate_tokens(cls, v): 
        cleaned_tokens = []
        for token in v:
            if isinstance(token, str):
                token = token.strip()
                if token not in {'+', '-', '*', '/'}:
                    raise ValueError(f"Invalid operator: {token}")
            cleaned_tokens.append(token)
        return cleaned_tokens
    
    def compute(self) -> float:
        """Calculate RPN formula"""        
        op_funcs = {
            "+": operator.add,
            "-": operator.sub,
            "*": operator.mul,
            "/": operator.truediv,
        }
        stack = []
        for token in self.tokens:
            if isinstance(token, float):
                stack.append(token)
            else:
                b, a = stack.pop(), stack.pop()
                stack.append(op_funcs[token](a, b))

        return stack[0]

Define a class to organize and store the process of solving mathematical problems, including the generated equations.

class GuessEquations(BaseModel):
  
    reasoning: str = Field(
        description="Detailed explanation of how the formula was created and the process"
    )
    equations: List[Equation] = Field(  
        description="List of generated equations"
    )

This signifies a candidate equation that requires evaluation.

class Candidate(NamedTuple):
    candidate: Equation              # Equation to be evaluate
    score: Optional[float] = None    # Evaluation score of equation
    feedback: Optional[str] = None   # Evaluation opinion

    # Representation of Output
    def __str__(self):
        try:
            computed = self.candidate.compute()
            return f"Equation({self.candidate.tokens}) = {computed:.2f} (Score: {self.score:.2f if self.score is not None else 'N/A'})" 
        except Exception as e:
            return f"Invalid equation: {self.candidate.tokens}; Error: {str(e)}"

class ScoredCandidate(Candidate):
    candidate: Equation
    score: float
    feedback: str

This Defines the fucdamental structures that are responsible for managing and configuring the state of ToT.

from typing_extensions import Annotated, TypedDict
from langchain_core.runnables import RunnableConfig

def update_candidates(
    existing: Optional[list] = None,
    updates: Optional[Union[list, Literal["clear"]]] = None,
) -> List[str]:
    if existing is None:
        existing = []
    if updates is None:
        return existing
    if updates == "clear":
        return []
    # Concatenate the lists
    return existing + updates


class ToTState(TypedDict):
    
    problem: str
    candidates: Annotated[List[Candidate], update_candidates]
    scored_candidates: Annotated[List[ScoredCandidate], update_candidates]
    depth: Annotated[int, operator.add]

class Configuration(TypedDict, total=False):
    max_depth: int
    threshold: float
    k: int
    beam_size: int

def _ensure_configurable(config: RunnableConfig) -> Configuration:
    configurable = config.get("configurable", {})
    return {
        **configurable,
        "max_depth": configurable.get("max_depth", 10),
        "threshold": config.get("threshold", 1.0),      # success threshold
        "k": configurable.get("k", 5),                  # Number of candidates to generate
        "beam_size": configurable.get("beam_size", 3),  # Number of candidates to keep
    }

class ExpansionState(ToTState):
    seed: Optional[Candidate]

LLM Configuration for Game Solution Generation

Construct a solution generation pipeline to automate the "Make 15" game using LangChain and OpenAI.

ChatPromptTemplate : Generate system prompts that define the rules and objectives of the game.

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Define the Prompt Template
prompt = ChatPromptTemplate.from_messages(
    [
        ( # System prompt: Define the rules of the "Make 15" game
            "system",
            """You are playing Make 15. Create equations that evaluate to exactly 15 using the provided numbers.
            Rules:
            1. Use each number exactly once
            2. You can use +, -, *, / operators
            3. The result must be exactly 15
            4. Submit {k} different valid guesses        
            Evaluate your guesses carefully before submitting."""
        ),

        ( # User prompt: Includes the number to use and a history of previous attempts
            "user", 
            "Numbers to use: {problem}\nPrevious attempt (if any): {candidate}"
        ),
    ],
).partial(candidate="")
llm = ChatOpenAI(model="gpt-4o-mini")

bound_llm = llm.with_structured_output(GuessEquations)   # Output setting
solver = prompt | bound_llm   # Create pipeline

Game Solution Explorer

Implement a sophisticated search algorithm to solve the "Make 15" game.

A function that calculates the score of a candidate equation, evaluating how far it has reached the target value of 15.

from langchain_core.runnables import RunnableConfig
from langgraph.constants import Send
from typing import Dict, Any


def compute_score(problem: str, candidate: Candidate) -> ScoredCandidate:
    numbers = list(map(float, problem.split()))
    used_numbers = [
        token for token in candidate.candidate.tokens if isinstance(token, (int, float))
    ]
    # Check if every number is used exactly once
    if sorted(used_numbers) != sorted(numbers):
        return ScoredCandidate(
            candidate=candidate.candidate,
            score=0,
            feedback="The equation must use all 4 numbers exactly once."
        )
    # Score between 0 and 1 based on difference from 15    
    try:
        result = candidate.candidate.compute()
        score = max(0.0, 1.0 - abs(result - 15) / 15)
        feedback = f"Result: {result}"
    except Exception as e:
        return ScoredCandidate(
            candidate=candidate.candidate,
            score=0, 
            feedback=f"Invalid equation. Error: {repr(e)}"
        )
           
    return ScoredCandidate(
        candidate=candidate.candidate,
        score=score,
        feedback=feedback
    )

Core Algorithm Functions.

Calculate the scores for all generated candidate equations and select the candidate equations with the highest scores.

# Create new candidates using LLM
def expand(state: ExpansionState, *, config: RunnableConfig) -> Dict[str, List[str]]:
   configurable = _ensure_configurable(config)

   # Ensure problem is a string
   problem = state["problem"]
   if isinstance(problem, list):
       problem = problem[0]

   try:
        equation_submission = solver.invoke(
            {
                "problem": problem,
                "candidate": str(state.get("seed", "")),
                "k": configurable["k"],
            },
            config=config,
        )
        return {
            "candidates": [
                Candidate(candidate=eq) 
                for eq in equation_submission.equations
                if all(t in {'+', '-', '*', '/'} or isinstance(t, (int, float)) 
                    for t in eq.tokens)
            ]
        }
   except Exception as e:
        return {"candidates": []}


# Calculate the score for each candidate equation
def score(state: ToTState) -> Dict[str, List[ScoredCandidate]]:
    candidates = state.get("candidates", [])
    if not isinstance(candidates, list):
        return {"scored_candidates": [], "candidates": "clear"}        
    scored = [
        compute_score(state["problem"], candidate) 
        for candidate in candidates 
        if hasattr(candidate, 'candidate')
    ]
    return {"scored_candidates": scored, "candidates": "clear"}


# Choose the candidates with the best results
# Keep only top candidates as big as beam_size
def prune(state: ToTState, *, config: RunnableConfig) -> Dict[str, List[Dict[str, Any]]]:
    scored_candidates = state["scored_candidates"]
    beam_size = _ensure_configurable(config)["beam_size"]
    organized = sorted(
        scored_candidates, 
        key=lambda x: x.score if isinstance(x, ScoredCandidate) else -float('inf'),
        reverse=True
    )
    pruned = organized[:beam_size]
    return {
        "candidates": pruned, # Update the starting point for the next iteration
        "scored_candidates": "clear", # Clear the old memory
        "depth": 1, # Increment the depth by 1
    }

Check if the correct answer has been found or if maximum depth has been reached to determine the next step.

def should_terminate(state: ToTState, config: RunnableConfig):
    configurable = _ensure_configurable(config)
    if not state["candidates"]:
        return "__end__"
    
    best_candidate = state["candidates"][0]
    solved = isinstance(best_candidate, ScoredCandidate) and abs(best_candidate.score - 1.0) < 1e-6
    
    if solved or state["depth"] >= configurable["max_depth"]:
        return "__end__"
    
    return [Send("expand", {**state, "seed": candidate}) for candidate in state["candidates"]]

Create Graph

Construct a solution exploration graph for the "Make 15" game using LangGraph.

from langgraph.graph import StateGraph
from langgraph.checkpoint.memory import MemorySaver

# Create the graph
def create_graph(config: Configuration = None) -> StateGraph:
    if config is None:
        config = {
            "max_depth": 10,
            "threshold": 1.0,
            "k": 5,
            "beam_size": 3
        }

    builder = StateGraph(state_schema=ToTState, config_schema=Configuration)

    # Add nodes : expand, score, prune
    builder.add_node(expand)
    builder.add_node(score)
    builder.add_node(prune)

    # Add edges : Define the execution order between nodes
    builder.add_edge("expand", "score")
    builder.add_edge("score", "prune")
    builder.add_conditional_edges("prune", should_terminate, path_map=["expand", "__end__"])

    # Set entry point
    builder.add_edge("__start__", "expand")

    return builder.compile(checkpointer=MemorySaver())

Visualize the compiled graph structure.

from langchain_opentutorial.graphs import visualize_graph

visualize_graph(create_graph())

Run Graph

Generate the number puzzles that fit a range using the Number Combination Puzzle Generator.

import itertools
import random

def generate_make_puzzles(count=10, seed=None):
    if seed is not None:
        random.seed(seed)
    
    numbers = list(range(1, 10))
    all_combinations = list(itertools.combinations(numbers, 4))
    selected_combinations = random.sample(all_combinations, min(count, len(all_combinations)))
    
    return [" ".join(map(str, combo)) for combo in selected_combinations]

# Example of generated number puzzles
puzzles = generate_make_puzzles()
print(f"Example puzzles: {puzzles[:3]}")

Example puzzles: ['3 4 6 9', '1 4 7 8', '1 2 5 8']

Find the equation to reach the number 15 by combining the numbers in the given puzzle.

config = {
    "configurable": {
        "thread_id": "make15_test",
        "max_depth": 10,
        "threshold": 1.0,
        "k": 5,
        "beam_size": 3
    },
    "recursion_limit": 50
}

# Graph initialization
graph = create_graph()

# Create a puzzle for testing
puzzle = generate_make_puzzles(1, seed=42)[0]
print(f"New puzzle: {puzzle}")

# Run in a stream manner and monitor each step
for step in graph.stream({"problem": puzzle}, config):
    print(f"Step: {step}")

New puzzle: 2 5 6 7
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[6.0, 5.0, 2.0, 7.0, '*', '-', '+']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[7.0, 2.0, 5.0, '*', 6.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[2.0, 7.0, '*', 5.0, '-', 6.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, 6.0, 7.0, '-', 2.0, '*']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[6.0, 7.0, '+', 2.0, '*', '-']), score=None, feedback=None)]}}
    Step: {'score': {'scored_candidates': [ScoredCandidate(candidate=Equation(tokens=[6.0, 5.0, 2.0, 7.0, '*', '-', '+']), score=0.0, feedback='Result: -3.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, 5.0, '*', 6.0, '-']), score=0.4666666666666667, feedback='Result: 7.0'), ScoredCandidate(candidate=Equation(tokens=[2.0, 7.0, '*', 5.0, '-', 6.0]), score=0.6, feedback='Result: 9.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, 6.0, 7.0, '-', 2.0, '*']), score=0.33333333333333337, feedback='Result: 5.0'), ScoredCandidate(candidate=Equation(tokens=[6.0, 7.0, '+', 2.0, '*', '-']), score=0, feedback='The equation must use all 4 numbers exactly once.')], 'candidates': 'clear'}}
    Step: {'prune': {'candidates': [ScoredCandidate(candidate=Equation(tokens=[2.0, 7.0, '*', 5.0, '-', 6.0]), score=0.6, feedback='Result: 9.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, 5.0, '*', 6.0, '-']), score=0.4666666666666667, feedback='Result: 7.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, 6.0, 7.0, '-', 2.0, '*']), score=0.33333333333333337, feedback='Result: 5.0')], 'scored_candidates': 'clear', 'depth': 1}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[7.0, '-', 5.0, 2.0, '*', '+', '*', 6.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[2.0, '*', 5.0, '+', 6.0, '-', 7.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[6.0, '+', 5.0, '*', 2.0, '-', 7.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[2.0, '+', 7.0, '*', 5.0, '-', 6.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, '+', 6.0, '-', 2.0, '*', 7.0]), score=None, feedback=None)]}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[6.0, 2.0, 5.0, 7.0, '*', '+', '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, 6.0, 2.0, '*', 7.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[7.0, 5.0, 6.0, 2.0, '*', '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[2.0, 6.0, 5.0, 7.0, '*', '+']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[2.0, 5.0, 7.0, 6.0, '*', '-']), score=None, feedback=None)]}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[6.0, '*', 2.0, '+', 7.0, '*', '-', '-', 5.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, '*', 6.0, '-', 7.0, '+', 2.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[2.0, '*', 7.0, '+', 6.0, '-', 5.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 6.0, '-', 5.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[6.0, '+', 5.0, '+', '-', '-', 7.0, '+', 2.0]), score=None, feedback=None)]}}
    Step: {'score': {'scored_candidates': [ScoredCandidate(candidate=Equation(tokens=[2.0, 7.0, '*', 5.0, '-', 6.0]), score=0.6, feedback='Result: 9.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, 5.0, '*', 6.0, '-']), score=0.4666666666666667, feedback='Result: 7.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, 6.0, 7.0, '-', 2.0, '*']), score=0.33333333333333337, feedback='Result: 5.0'), ScoredCandidate(candidate=Equation(tokens=[6.0, '*', 2.0, '+', 7.0, '*', '-', '-', 5.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[5.0, '*', 6.0, '-', 7.0, '+', 2.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[2.0, '*', 7.0, '+', 6.0, '-', 5.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 6.0, '-', 5.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[6.0, '+', 5.0, '+', '-', '-', 7.0, '+', 2.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[6.0, 2.0, 5.0, 7.0, '*', '+', '-']), score=0.0, feedback='Result: -31.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, 6.0, 2.0, '*', 7.0, '-']), score=0.33333333333333337, feedback='Result: 5.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 5.0, 6.0, 2.0, '*', '-']), score=0.4666666666666667, feedback='Result: 7.0'), ScoredCandidate(candidate=Equation(tokens=[2.0, 6.0, 5.0, 7.0, '*', '+']), score=0.1333333333333333, feedback='Result: 2.0'), ScoredCandidate(candidate=Equation(tokens=[2.0, 5.0, 7.0, 6.0, '*', '-']), score=0.1333333333333333, feedback='Result: 2.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, '-', 5.0, 2.0, '*', '+', '*', 6.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[2.0, '*', 5.0, '+', 6.0, '-', 7.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[6.0, '+', 5.0, '*', 2.0, '-', 7.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[2.0, '+', 7.0, '*', 5.0, '-', 6.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[5.0, '+', 6.0, '-', 2.0, '*', 7.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')")], 'candidates': 'clear'}}
    Step: {'prune': {'candidates': [ScoredCandidate(candidate=Equation(tokens=[2.0, 7.0, '*', 5.0, '-', 6.0]), score=0.6, feedback='Result: 9.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, 5.0, '*', 6.0, '-']), score=0.4666666666666667, feedback='Result: 7.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 5.0, 6.0, 2.0, '*', '-']), score=0.4666666666666667, feedback='Result: 7.0')], 'scored_candidates': 'clear', 'depth': 1}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[6.0, '*', 2.0, '+', 5.0, '-', 7.0]), score=None, feedback=None)]}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[5.0, 7.0, '*', 2.0, '-', 6.0, '+']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[2.0, 6.0, '*', 5.0, '/', 7.0, '+']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[7.0, 2.0, '*', 6.0, '-', 5.0, '+']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[6.0, 7.0, '+', 5.0, '-', 2.0, '/']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[2.0, 5.0, '*', 7.0, '/', 6.0, '+']), score=None, feedback=None)]}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[6.0, '*', 2.0, '+', 7.0, '-', '+', 5.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, '+', 7.0, '*', 2.0, '-', '/', '+', 6.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 5.0, '-', 6.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[6.0, '*', 2.0, '+', 5.0, '/', 7.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, '*', 2.0, '+', 6.0, '-', 7.0]), score=None, feedback=None)]}}
    Step: {'score': {'scored_candidates': [ScoredCandidate(candidate=Equation(tokens=[2.0, 7.0, '*', 5.0, '-', 6.0]), score=0.6, feedback='Result: 9.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, 5.0, '*', 6.0, '-']), score=0.4666666666666667, feedback='Result: 7.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 5.0, 6.0, 2.0, '*', '-']), score=0.4666666666666667, feedback='Result: 7.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, 7.0, '*', 2.0, '-', 6.0, '+']), score=0.0, feedback='Result: 39.0'), ScoredCandidate(candidate=Equation(tokens=[2.0, 6.0, '*', 5.0, '/', 7.0, '+']), score=0.6266666666666667, feedback='Result: 9.4'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, '*', 6.0, '-', 5.0, '+']), score=0.8666666666666667, feedback='Result: 13.0'), ScoredCandidate(candidate=Equation(tokens=[6.0, 7.0, '+', 5.0, '-', 2.0, '/']), score=0.2666666666666667, feedback='Result: 4.0'), ScoredCandidate(candidate=Equation(tokens=[2.0, 5.0, '*', 7.0, '/', 6.0, '+']), score=0.49523809523809526, feedback='Result: 7.428571428571429'), ScoredCandidate(candidate=Equation(tokens=[6.0, '*', 2.0, '+', 5.0, '-', 7.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[6.0, '*', 2.0, '+', 7.0, '-', '+', 5.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[5.0, '+', 7.0, '*', 2.0, '-', '/', '+', 6.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 5.0, '-', 6.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[6.0, '*', 2.0, '+', 5.0, '/', 7.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[5.0, '*', 2.0, '+', 6.0, '-', 7.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')")], 'candidates': 'clear'}}
    Step: {'prune': {'candidates': [ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, '*', 6.0, '-', 5.0, '+']), score=0.8666666666666667, feedback='Result: 13.0'), ScoredCandidate(candidate=Equation(tokens=[2.0, 6.0, '*', 5.0, '/', 7.0, '+']), score=0.6266666666666667, feedback='Result: 9.4'), ScoredCandidate(candidate=Equation(tokens=[2.0, 7.0, '*', 5.0, '-', 6.0]), score=0.6, feedback='Result: 9.0')], 'scored_candidates': 'clear', 'depth': 1}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[7.0, '*', 2.0, '-', 6.0, '+', 5.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, '+', 6.0, '+', 7.0, '-', 2.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[6.0, '+', 2.0, '*', 5.0, '/', 7.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[2.0, '*', 5.0, '+', 7.0, '-', 6.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, '*', 2.0, '+', 6.0, '-', 7.0]), score=None, feedback=None)]}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[7.0, 2.0, '*', 6.0, '/', 5.0, '+']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, 6.0, '*', 2.0, '-', 7.0, '+']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[6.0, 2.0, '*', 7.0, '-', 5.0, '+']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, 7.0, '*', 2.0, '/', 6.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[7.0, 5.0, '-', 2.0, '+', 6.0]), score=None, feedback=None)]}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[7.0, 2.0, 5.0, '*', '+', 6.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, 6.0, '*', 2.0, '+', 7.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[6.0, 2.0, '*', 5.0, '-', 7.0, '+']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, 7.0, '*', 2.0, '/', 6.0, '+']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[2.0, 5.0, '*', 6.0, '/', 7.0, '+']), score=None, feedback=None)]}}
    Step: {'score': {'scored_candidates': [ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, '*', 6.0, '-', 5.0, '+']), score=0.8666666666666667, feedback='Result: 13.0'), ScoredCandidate(candidate=Equation(tokens=[2.0, 6.0, '*', 5.0, '/', 7.0, '+']), score=0.6266666666666667, feedback='Result: 9.4'), ScoredCandidate(candidate=Equation(tokens=[2.0, 7.0, '*', 5.0, '-', 6.0]), score=0.6, feedback='Result: 9.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, '*', 6.0, '/', 5.0, '+']), score=0.48888888888888893, feedback='Result: 7.333333333333334'), ScoredCandidate(candidate=Equation(tokens=[5.0, 6.0, '*', 2.0, '-', 7.0, '+']), score=0.0, feedback='Result: 35.0'), ScoredCandidate(candidate=Equation(tokens=[6.0, 2.0, '*', 7.0, '-', 5.0, '+']), score=0.6666666666666667, feedback='Result: 10.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, 7.0, '*', 2.0, '/', 6.0, '-']), score=0.7666666666666666, feedback='Result: 11.5'), ScoredCandidate(candidate=Equation(tokens=[7.0, 5.0, '-', 2.0, '+', 6.0]), score=0.2666666666666667, feedback='Result: 4.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, 5.0, '*', '+', 6.0, '-']), score=0.7333333333333334, feedback='Result: 11.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, 6.0, '*', 2.0, '+', 7.0, '-']), score=0.33333333333333337, feedback='Result: 25.0'), ScoredCandidate(candidate=Equation(tokens=[6.0, 2.0, '*', 5.0, '-', 7.0, '+']), score=0.9333333333333333, feedback='Result: 14.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, 7.0, '*', 2.0, '/', 6.0, '+']), score=0.43333333333333335, feedback='Result: 23.5'), ScoredCandidate(candidate=Equation(tokens=[2.0, 5.0, '*', 6.0, '/', 7.0, '+']), score=0.5777777777777777, feedback='Result: 8.666666666666666'), ScoredCandidate(candidate=Equation(tokens=[7.0, '*', 2.0, '-', 6.0, '+', 5.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[5.0, '+', 6.0, '+', 7.0, '-', 2.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[6.0, '+', 2.0, '*', 5.0, '/', 7.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[2.0, '*', 5.0, '+', 7.0, '-', 6.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[5.0, '*', 2.0, '+', 6.0, '-', 7.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')")], 'candidates': 'clear'}}
    Step: {'prune': {'candidates': [ScoredCandidate(candidate=Equation(tokens=[6.0, 2.0, '*', 5.0, '-', 7.0, '+']), score=0.9333333333333333, feedback='Result: 14.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, '*', 6.0, '-', 5.0, '+']), score=0.8666666666666667, feedback='Result: 13.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, 7.0, '*', 2.0, '/', 6.0, '-']), score=0.7666666666666666, feedback='Result: 11.5')], 'scored_candidates': 'clear', 'depth': 1}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 6.0, '-', '*', '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[6.0, 5.0, '*', 2.0, '+', 7.0, '/', '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 6.0, '*', '-', '/']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, 2.0, '*', 7.0, '+', 6.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, '*', 2.0, 2.0, '+', 7.0, '+']), score=None, feedback=None)]}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[6.0, '*', 2.0, '+', 5.0, '-', 7.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, '*', 2.0, '+', 6.0, '-', 7.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[7.0, '+', 6.0, '+', 2.0, '-', 5.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, '+', 7.0, '+', '*', 2.0, '-', '*', 6.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 5.0, '-', 6.0]), score=None, feedback=None)]}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[6.0, 5.0, '*', 7.0, '+', 2.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[2.0, 5.0, '*', 6.0, '+', 7.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, 2.0, '*', 6.0, '+', 7.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[7.0, 2.0, '*', 5.0, '+', 6.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, 7.0, '-', 6.0, '+', 2.0, '*']), score=None, feedback=None)]}}
    Step: {'score': {'scored_candidates': [ScoredCandidate(candidate=Equation(tokens=[6.0, 2.0, '*', 5.0, '-', 7.0, '+']), score=0.9333333333333333, feedback='Result: 14.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, '*', 6.0, '-', 5.0, '+']), score=0.8666666666666667, feedback='Result: 13.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, 7.0, '*', 2.0, '/', 6.0, '-']), score=0.7666666666666666, feedback='Result: 11.5'), ScoredCandidate(candidate=Equation(tokens=[6.0, 5.0, '*', 7.0, '+', 2.0, '-']), score=0.0, feedback='Result: 35.0'), ScoredCandidate(candidate=Equation(tokens=[2.0, 5.0, '*', 6.0, '+', 7.0, '-']), score=0.6, feedback='Result: 9.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, 2.0, '*', 6.0, '+', 7.0, '-']), score=0.6, feedback='Result: 9.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, '*', 5.0, '+', 6.0, '-']), score=0.8666666666666667, feedback='Result: 13.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, 7.0, '-', 6.0, '+', 2.0, '*']), score=0.5333333333333333, feedback='Result: 8.0'), ScoredCandidate(candidate=Equation(tokens=[6.0, '*', 2.0, '+', 5.0, '-', 7.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[5.0, '*', 2.0, '+', 6.0, '-', 7.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[7.0, '+', 6.0, '+', 2.0, '-', 5.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[5.0, '+', 7.0, '+', '*', 2.0, '-', '*', 6.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 5.0, '-', 6.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 6.0, '-', '*', '-']), score=0, feedback='The equation must use all 4 numbers exactly once.'), ScoredCandidate(candidate=Equation(tokens=[6.0, 5.0, '*', 2.0, '+', 7.0, '/', '-']), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 6.0, '*', '-', '/']), score=0, feedback='The equation must use all 4 numbers exactly once.'), ScoredCandidate(candidate=Equation(tokens=[5.0, 2.0, '*', 7.0, '+', 6.0, '-']), score=0.7333333333333334, feedback='Result: 11.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, '*', 2.0, 2.0, '+', 7.0, '+']), score=0, feedback='The equation must use all 4 numbers exactly once.')], 'candidates': 'clear'}}
    Step: {'prune': {'candidates': [ScoredCandidate(candidate=Equation(tokens=[6.0, 2.0, '*', 5.0, '-', 7.0, '+']), score=0.9333333333333333, feedback='Result: 14.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, '*', 6.0, '-', 5.0, '+']), score=0.8666666666666667, feedback='Result: 13.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, '*', 5.0, '+', 6.0, '-']), score=0.8666666666666667, feedback='Result: 13.0')], 'scored_candidates': 'clear', 'depth': 1}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 5.0, '-', 6.0]), score=None, feedback=None)]}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[5.0, 7.0, '+', 6.0, '*', 2.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[7.0, 5.0, '*', 6.0, '+', 2.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[2.0, 6.0, '*', 5.0, 7.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[6.0, 5.0, '+', 2.0, '*', 7.0, '-']), score=None, feedback=None), Candidate(candidate=Equation(tokens=[2.0, 7.0, '*', 5.0, '-', 6.0, '+']), score=None, feedback=None)]}}
    Step: {'expand': {'candidates': [Candidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 5.0, '-', '-', 6.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[6.0, '*', 2.0, '+', 7.0, '-', 5.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, '*', 2.0, '+', 6.0, '+', 7.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[6.0, '*', 5.0, '/', 2.0, '-', 7.0]), score=None, feedback=None), Candidate(candidate=Equation(tokens=[5.0, '+', 6.0, '+', 7.0, '-', 2.0]), score=None, feedback=None)]}}
    Step: {'score': {'scored_candidates': [ScoredCandidate(candidate=Equation(tokens=[6.0, 2.0, '*', 5.0, '-', 7.0, '+']), score=0.9333333333333333, feedback='Result: 14.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, '*', 6.0, '-', 5.0, '+']), score=0.8666666666666667, feedback='Result: 13.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 2.0, '*', 5.0, '+', 6.0, '-']), score=0.8666666666666667, feedback='Result: 13.0'), ScoredCandidate(candidate=Equation(tokens=[5.0, 7.0, '+', 6.0, '*', 2.0, '-']), score=0.0, feedback='Result: 70.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, 5.0, '*', 6.0, '+', 2.0, '-']), score=0.0, feedback='Result: 39.0'), ScoredCandidate(candidate=Equation(tokens=[2.0, 6.0, '*', 5.0, 7.0, '-']), score=0.8, feedback='Result: 12.0'), ScoredCandidate(candidate=Equation(tokens=[6.0, 5.0, '+', 2.0, '*', 7.0, '-']), score=1.0, feedback='Result: 15.0'), ScoredCandidate(candidate=Equation(tokens=[2.0, 7.0, '*', 5.0, '-', 6.0, '+']), score=1.0, feedback='Result: 15.0'), ScoredCandidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 5.0, '-', 6.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[7.0, '*', 2.0, '+', 5.0, '-', '-', 6.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[6.0, '*', 2.0, '+', 7.0, '-', 5.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[5.0, '*', 2.0, '+', 6.0, '+', 7.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[6.0, '*', 5.0, '/', 2.0, '-', 7.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')"), ScoredCandidate(candidate=Equation(tokens=[5.0, '+', 6.0, '+', 7.0, '-', 2.0]), score=0, feedback="Invalid equation. Error: IndexError('pop from empty list')")], 'candidates': 'clear'}}
    Step: {'prune': {'candidates': [ScoredCandidate(candidate=Equation(tokens=[6.0, 5.0, '+', 2.0, '*', 7.0, '-']), score=1.0, feedback='Result: 15.0'), ScoredCandidate(candidate=Equation(tokens=[2.0, 7.0, '*', 5.0, '-', 6.0, '+']), score=1.0, feedback='Result: 15.0'), ScoredCandidate(candidate=Equation(tokens=[6.0, 2.0, '*', 5.0, '-', 7.0, '+']), score=0.9333333333333333, feedback='Result: 14.0')], 'scored_candidates': 'clear', 'depth': 1}}

Show the final results for the top three scores.

def print_results(graph: StateGraph, config: dict):
    # Get final status
    final_state = graph.get_state(config)
    # Extract candidate solutions
    candidates = final_state.values["candidates"]
 
    print("\nFinal Results:")
    print("-" * 50)
    for i, candidate in enumerate(candidates, 1):
        print(f"Solution {i}: {candidate}")
    print("-" * 50)

final_state = graph.get_state(config)
best_solution = final_state.values["candidates"][0]  # Extract the highest score answer
search_depth = final_state.values["depth"]  # Check search depth


print_results(graph, config)

    Final Results:
    --------------------------------------------------
    Solution 1: [Equation(tokens=[6.0, 5.0, '+', 2.0, '*', 7.0, '-']), 1.0, 'Result: 15.0']
    Solution 2: [Equation(tokens=[2.0, 7.0, '*', 5.0, '-', 6.0, '+']), 1.0, 'Result: 15.0']
    Solution 3: [Equation(tokens=[6.0, 2.0, '*', 5.0, '-', 7.0, '+']), 0.9333333333333333, 'Result: 14.0']
    --------------------------------------------------

PreviousDeploy on LangGraph Cloud NextOllama Deep Researcher (Deepseek-R1)

Last updated 3 months ago