Fine-tune open-source models like LLaMA on your own conversation data to create custom chatbots.
Self-host multiple language models behind an OpenAI-compatible API so existing tools can use them without code changes.
Run systematic evaluations and comparisons of different models using MT-Bench or Chatbot Arena voting.
Build and operate a large-scale model evaluation platform with human preference feedback.
Requires GPU/CUDA setup, model downloads (10GB+), and multiple service components (training, serving, evaluation backend).
FastChat is an open platform for training, serving, and evaluating large language model chatbots. It was created by the LMSYS organization and is the release repository for Vicuna, an open-source chatbot trained by fine-tuning Meta's LLaMA model on conversation data, and for Chatbot Arena, a popular benchmark where users vote on which AI responses they prefer in blind side-by-side comparisons. The platform has three main capabilities. First, it provides training code and recipes for fine-tuning foundation models like LLaMA on instruction-following data, which is how Vicuna was created. Second, it includes a distributed serving system that can load multiple large language models and expose them through a web chat interface or through an OpenAI-compatible REST API, meaning existing software that calls the OpenAI API can be pointed at FastChat instead to use open-source models. Third, it contains evaluation frameworks including MT-Bench, a multi-turn benchmark designed to measure how well chatbots handle complex, multi-step conversations beyond simple one-shot questions. You would use FastChat if you are a researcher studying how to train better open-source chatbots, an engineer who wants to self-host language models behind an API that existing tools already know how to call, or someone building infrastructure to evaluate and compare multiple models systematically. The Chatbot Arena component has powered over 10 million chat requests across 70 or more models and collected over 1.5 million human preference votes, producing one of the most widely cited LLM leaderboards in the research community. The tech stack is Python throughout. It uses the Hugging Face Transformers library for model loading, supports single and multi-GPU inference, CPU inference, and Apple Silicon via the Metal backend, and can be installed with a single pip command.
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.