Analysis updated 2026-07-03
Run the Game of 24 math puzzle experiment to see how Tree of Thoughts improves GPT-4 accuracy over standard prompting.
Define a new hard reasoning task and plug it into the framework by writing two small Python files.
Reproduce the NeurIPS 2023 paper results using the included saved logs without spending API credits.
| | princeton-nlp/tree-of-thought-llm | coleifer/huey | om-ai-lab/vlm-r1 |
|---|---|---|---|
| Stars | 5,947 | 5,952 | 5,956 |
| Language | Python | Python | Python |
| Setup difficulty | moderate | easy | hard |
| Complexity | 3/5 | 2/5 | 5/5 |
| Audience | researcher | developer | researcher |
Figures from each repo's GitHub metadata at analysis time.
Requires a paid OpenAI API key. GPT-4 calls for the full benchmark experiments can be expensive.
This is the official code repository for a research paper called "Tree of Thoughts," published at NeurIPS 2023 by researchers at Princeton. The paper introduces a technique for getting AI language models to solve hard problems more reliably by having them explore multiple reasoning paths at once, rather than committing to one answer in a single pass.

The core idea is inspired by how humans think through difficult problems: instead of picking the first plausible answer, you might sketch several approaches, evaluate which looks most promising, and continue down that branch while discarding less promising ones. The code implements this using large language models (in practice, GPT-4 via the OpenAI API) as both the idea generator and the evaluator. The model generates candidate "thoughts" (partial solutions or reasoning steps), evaluates how good each one is, and uses a search strategy (breadth-first or depth-first search) to find a complete solution.

The repository includes experiments on three specific tasks from the paper: the Game of 24 (a math puzzle using arithmetic), creative writing, and crossword puzzle solving. Shell scripts and a Jupyter notebook reproduce each experiment. Saved logs from the original paper runs are included so you can inspect the model's step-by-step reasoning without re-running anything.

Adding a new task to the framework is documented in the README and involves writing two small Python files: one defining the task and one defining the prompts the model should use. The library is also available as a pip package. The code requires an OpenAI API key and Python 3.7 or later. It is licensed under MIT.
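The generate, evaluate, and search loop described above can be illustrated with a small offline sketch on a Game of 24 style puzzle. This is not the repository's code and makes no API calls: the names `generate_thoughts`, `evaluate_thought`, and `tree_of_thoughts_bfs` are hypothetical, the proposer simply enumerates every arithmetic step, and the evaluator is a brute-force lookahead standing in for the language model's judgement. Only the breadth-first structure (propose, score, keep the best few) is meant to mirror the technique.

```python
from fractions import Fraction
from itertools import combinations

TARGET = 24


def one_step_results(a, b):
    """Every value obtainable by combining a and b with one arithmetic operation."""
    results = [(a + b, f"{a} + {b}"), (a - b, f"{a} - {b}"),
               (b - a, f"{b} - {a}"), (a * b, f"{a} * {b}")]
    if b != 0:
        results.append((a / b, f"{a} / {b}"))
    if a != 0:
        results.append((b / a, f"{b} / {a}"))
    return results


def generate_thoughts(state):
    """Assumed stand-in for the LLM proposer: list every possible next step."""
    numbers, steps = state
    children = []
    for i, j in combinations(range(len(numbers)), 2):
        rest = tuple(n for k, n in enumerate(numbers) if k not in (i, j))
        for value, expr in one_step_results(numbers[i], numbers[j]):
            children.append((rest + (value,), steps + (f"{expr} = {value}",)))
    return children


def evaluate_thought(state):
    """Assumed stand-in for the LLM evaluator: 1.0 if TARGET is still reachable, else 0.0."""
    numbers, _ = state
    if len(numbers) == 1:
        return 1.0 if numbers[0] == TARGET else 0.0
    return max(evaluate_thought(child) for child in generate_thoughts(state))


def tree_of_thoughts_bfs(puzzle, beam_width=5):
    """Breadth-first search that keeps only the best-scoring partial solutions per level."""
    frontier = [(tuple(Fraction(n) for n in puzzle), ())]
    for _ in range(len(puzzle) - 1):
        # Expand every kept state, score each candidate, then prune to the beam.
        candidates = [c for s in frontier for c in generate_thoughts(s)]
        candidates.sort(key=evaluate_thought, reverse=True)
        frontier = candidates[:beam_width]
    solved = [s for s in frontier if evaluate_thought(s) == 1.0]
    return list(solved[0][1]) if solved else None


if __name__ == "__main__":
    for line in tree_of_thoughts_bfs([4, 9, 10, 13]) or ["no solution found"]:
        print(line)
```

Because the toy evaluator here is exact, even a narrow beam finds a solution whenever one exists. With a language model as the evaluator the scores are noisy, so keeping several candidates per level is what lets the search recover from a misjudged step.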
An implementation of the Tree of Thoughts technique from NeurIPS 2023 that improves AI problem-solving by exploring multiple reasoning paths and picking the best one, using GPT-4.
Mainly Python. The stack also includes the OpenAI API.
Use freely for any purpose, including commercial use, as long as you keep the original copyright notice.
Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.
Mainly researchers.
Verify against the repo before relying on details.