princeton-nlp/tree-of-thought-llm

Analysis updated 2026-07-03

★ 5,947 | Python | Audience · researcher | Complexity · 3/5 | License · MIT | Setup · moderate

Mindmap

mindmap
  root((tree-of-thought-llm))
    Core Idea
      Multiple reasoning paths
      Evaluate candidates
      BFS and DFS search
    Tech Stack
      Python
      OpenAI API
    Experiments
      Game of 24
      Creative writing
      Crossword puzzles
    Audience
      AI researchers
      LLM engineers


What do people build with it?

USE CASE 1

Run the Game of 24 math puzzle experiment to see how Tree of Thoughts improves GPT-4 accuracy over standard prompting.

USE CASE 2

Define a new hard reasoning task and plug it into the framework by writing two small Python files.

USE CASE 3

Reproduce the NeurIPS 2023 paper results using the included saved logs without spending API credits.
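Use case 2 above mentions that a new task needs two small Python files: one for the task and one for its prompts. The sketch below shows roughly what that split could look like. It is a hypothetical illustration only: `ToyTask`, `Puzzle`, the method names, and the prompt templates are assumptions made for this example and are not taken from the tree-of-thought-llm package, so check the README for the actual base class and method names.

```python
from dataclasses import dataclass

# --- "Prompts file" (hypothetical): templates the model is asked to complete ---
PROPOSE_PROMPT = (
    "Input: {input}\n"
    "Steps so far:\n{partial}\n"
    "Propose possible next steps:"
)
VALUE_PROMPT = (
    "Input: {input}\n"
    "Steps so far:\n{partial}\n"
    "Can these steps still reach a correct answer? "
    "Answer sure, likely, or impossible:"
)


# --- "Task file" (hypothetical): the data plus a way to check answers ---
@dataclass
class Puzzle:
    question: str
    answer: str


class ToyTask:
    """Illustrative task interface; names are assumptions, not the library's API."""

    def __init__(self, puzzles):
        self.puzzles = list(puzzles)

    def __len__(self):
        return len(self.puzzles)

    def get_input(self, idx):
        return self.puzzles[idx].question

    def test_output(self, idx, output):
        # Treat the last non-empty line of the model output as the final answer.
        lines = [line.strip() for line in output.splitlines() if line.strip()]
        return bool(lines) and lines[-1] == self.puzzles[idx].answer

    def propose_prompt(self, idx, partial=""):
        return PROPOSE_PROMPT.format(input=self.get_input(idx), partial=partial)

    def value_prompt(self, idx, partial):
        return VALUE_PROMPT.format(input=self.get_input(idx), partial=partial)


task = ToyTask([Puzzle("Reverse 'abc'", "cba")])
print(task.propose_prompt(0))
```

Keeping prompts separate from the task logic means the wording shown to the model can be tuned without touching the code that loads data and scores answers.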

What is it built with?

Python, OpenAI API

How does it compare?

	princeton-nlp/tree-of-thought-llm	coleifer/huey	om-ai-lab/vlm-r1
Stars	5,947	5,952	5,956
Language	Python	Python	Python
Setup difficulty	moderate	easy	hard
Complexity	3/5	2/5	5/5
Audience	researcher	developer	researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate | Time to first run · 30 min

Requires a paid OpenAI API key. GPT-4 calls for the full benchmark experiments can be expensive.

License: MIT. Use freely for any purpose, including commercial use, as long as you keep the original copyright notice.

In plain English

This is the official code repository for a research paper called "Tree of Thoughts," published at NeurIPS 2023 by researchers at Princeton. The paper introduces a technique for getting AI language models to solve hard problems more reliably by having them explore multiple reasoning paths at once, rather than committing to one answer in a single pass.

The core idea is inspired by how humans think through difficult problems: instead of picking the first plausible answer, you might sketch several approaches, evaluate which looks most promising, and continue down that branch while discarding less promising ones.

The code implements this using large language models (in practice, GPT-4 via the OpenAI API) as both the idea generator and the evaluator. The model generates candidate "thoughts" (partial solutions or reasoning steps), evaluates how good each one is, and uses a search strategy (breadth-first or depth-first search) to find a complete solution.

The repository includes experiments on three specific tasks from the paper: the Game of 24 (a math puzzle using arithmetic), creative writing, and crossword puzzle solving. Shell scripts and a Jupyter notebook reproduce each experiment. Saved logs from the original paper runs are included so you can inspect the model's step-by-step reasoning without re-running anything.

Adding a new task to the framework is documented in the README and involves writing two small Python files: one defining the task and one defining the prompts the model should use. The library is also available as a pip package.

The code requires an OpenAI API key and Python 3.7 or later. It is licensed under MIT.
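The generate, evaluate, and search loop described above can be sketched on the Game of 24 without any API calls. This is a minimal illustration, not the repository's code: `propose`, `evaluate`, and `tot_bfs` are names invented for this example, the thought generator enumerates arithmetic steps where the real method would ask the model, and the evaluator uses an exhaustive check as a stand-in for the model's judgment of each partial solution.

```python
from fractions import Fraction
from itertools import combinations

TARGET = Fraction(24)


def propose(state):
    """Thought generator: merge any two remaining numbers with one operation.

    A state is a tuple of (value, expression) pairs. Enumeration stands in
    for an LLM proposing next steps.
    """
    children = []
    for i, j in combinations(range(len(state)), 2):
        (a, ea), (b, eb) = state[i], state[j]
        rest = [item for k, item in enumerate(state) if k not in (i, j)]
        options = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        if b != 0:
            options.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            options.append((b / a, f"({eb} / {ea})"))
        for item in options:
            children.append(tuple(rest + [item]))
    return children


def reachable(state):
    """True if some sequence of operations turns the state into 24."""
    if len(state) == 1:
        return state[0][0] == TARGET
    return any(reachable(child) for child in propose(state))


def evaluate(state):
    """State evaluator: 1.0 for promising states, 0.0 for dead ends.

    This exact check is an assumption made for the sketch; an LLM evaluator
    would be noisy, which is why more than one candidate is kept per level.
    """
    return 1.0 if reachable(state) else 0.0


def tot_bfs(numbers, beam_width=5):
    """Breadth-first search: expand, score, keep the best few, repeat."""
    if not numbers:
        return None
    frontier = [tuple((Fraction(n), str(n)) for n in numbers)]
    while frontier and len(frontier[0]) > 1:
        candidates = [child for state in frontier for child in propose(state)]
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]
    for state in frontier:
        value, expression = state[0]
        if value == TARGET:
            return expression
    return None


print(tot_bfs([4, 5, 6, 10]))
```

The `beam_width` parameter controls how many partial solutions survive each level. A depth-first variant would instead follow one candidate to completion and backtrack when the evaluator marks a state as a dead end.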

Copy-paste prompts

Prompt 1

Using the tree-of-thought-llm library, show me how to run the Game of 24 experiment with GPT-4 using breadth-first search.

Prompt 2

How do I define a new custom task in tree-of-thought-llm? Show me what code to put in the task file and the prompts file.

Prompt 3

I want to compare breadth-first and depth-first search in tree-of-thought-llm for a logic puzzle. How do I switch between them?

Prompt 4

How do I install tree-of-thought-llm as a pip package and write a minimal script to solve a custom problem with GPT-4?

Frequently asked questions

What is tree-of-thought-llm?

The official implementation of the Tree of Thoughts technique from the NeurIPS 2023 paper. It improves language-model problem-solving by generating multiple reasoning paths, evaluating them, and continuing with the most promising ones. The experiments use GPT-4.

What language is tree-of-thought-llm written in?

Python. It calls the OpenAI API to run the language model.

What license does tree-of-thought-llm use?

MIT. Use freely for any purpose, including commercial use, as long as you keep the original copyright notice.

How hard is tree-of-thought-llm to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is tree-of-thought-llm for?

Mainly AI researchers, along with LLM engineers.


Verify against the repo before relying on details.