ToolBench is a research project from OpenBMB that teaches large language models to use real-world software tools and APIs. An API is an interface through which one piece of software calls another, for example a weather service, a translation tool, or a payment system. The project gathers over 16,000 such APIs from the RapidAPI platform and builds a large dataset of tasks that require a model to pick the right tools and call them in the right order.

The core of the project is a dataset of about 126,000 examples, each showing a model working through a task step by step: which tools it called, what results they returned, and how it reasoned about what to do next. To build this dataset automatically, the team created a method called DFSDT (depth-first search-based decision tree), which lets a model explore different sequences of tool calls and back out of failed ones until it finds a sequence that works. The examples were generated with ChatGPT and then filtered by the researchers.

On top of the dataset, the project ships ToolLLaMA, an open-source language model fine-tuned on the ToolBench data. ToolLLaMA-2-7b-v2 is the current recommended version, and the authors report tool-use performance comparable to ChatGPT. There is also an evaluation framework called ToolEval for measuring how well any model uses tools, and a companion project called StableToolBench that replaces live API calls with simulated responses so that tests are more reproducible.

ToolBench is primarily a research artifact; the accompanying paper was accepted at ICLR 2024. Running it requires downloading the dataset, setting up the Python environment with the provided scripts, and in some cases obtaining a ToolBench API key to access the backend RapidAPI service. It is released under the Apache 2.0 license for research and educational use.
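
The depth-first exploration behind DFSDT can be illustrated with a toy sketch. This is not the project's implementation: in ToolBench a language model proposes and judges each step, whereas here a fixed table of two hypothetical tools (`add_three`, `double`) and a simple equality check stand in for the model. The function name `dfs_tool_search` and its parameters are likewise assumptions made for illustration.

```python
from typing import Callable, Optional

# Hypothetical toy "tools": each maps an integer state to a new state.
# In the real project these would be API calls chosen by a language model.
TOOLS: dict[str, Callable[[int], int]] = {
    "add_three": lambda x: x + 3,
    "double": lambda x: x * 2,
}


def dfs_tool_search(
    state: int,
    goal: int,
    max_depth: int,
    path: Optional[list[str]] = None,
) -> Optional[list[str]]:
    """Depth-first search for a sequence of tool calls that reaches `goal`.

    Returns the list of tool names in call order, or None if no sequence
    of at most `max_depth` calls succeeds.
    """
    path = [] if path is None else path
    if state == goal:
        return path  # this branch solved the task
    if max_depth == 0:
        return None  # dead end: give up on this branch and backtrack
    for name, tool in TOOLS.items():
        # Go deeper along this tool call before trying its siblings.
        result = dfs_tool_search(tool(state), goal, max_depth - 1, path + [name])
        if result is not None:
            return result
    return None  # every child failed, so the caller tries another branch


if __name__ == "__main__":
    print(dfs_tool_search(1, 8, 3))  # ['add_three', 'double']
    print(dfs_tool_search(1, 3, 2))  # None
```

In the first call the search goes down the `add_three` branch as far as the depth limit allows, fails, and backs up one level to try `double`. That willingness to abandon a failed branch and try a sibling is the property that separates a tree search from a single linear chain of tool calls.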
Verify against the repo before relying on details.