torchchat is a project from the PyTorch team that demonstrates how to run large language models (the kind of AI that powers chat assistants) in several environments: on a laptop or server using Python, embedded in a C or C++ application, and on mobile devices running iOS or Android. The goal is to show that these models do not have to live only in the cloud and can run directly on your own hardware.

The project is no longer under active development, as noted in the repository, but the code remains available as a reference. While it was maintained, it supported a range of well-known open models, including various sizes of Meta's Llama family, Mistral, IBM Granite, and DeepSeek R1, as well as small toy models for testing. Some of those models were marked as mobile-friendly, meaning they were small enough to run on a phone.

From the command line, you can hold a back-and-forth conversation with a model, ask it to generate text from a prompt, or open a browser-based chat interface. There is also a server mode. For mobile and desktop use without Python, the project supports exporting models to optimized formats that run faster in C++ applications or can be packaged into iOS and Android apps.

The project supports several ways to speed up inference, including compiling models ahead of time and using reduced numerical precision (numbers stored in fewer bits, which take less memory and compute). It works on Linux, Macs with Apple Silicon, Android devices, and iPhones with enough memory. Installation requires Python 3.10 and uses a virtual environment to keep dependencies isolated.
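To make the point about reduced numerical precision concrete, here is a rough back-of-the-envelope sketch of how much memory a model's weights need at different precisions. It is not taken from torchchat: the function name `weight_memory_gib`, the precision labels, and the 8-billion-parameter figure are illustrative assumptions, and the estimate covers weights only (no activations, caches, or quantization metadata).

```python
# Illustrative estimate only; not part of torchchat.
# Bytes needed to store one weight at each precision (weights only;
# activations, caches, and quantization metadata are ignored).
BYTES_PER_WEIGHT = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: int, precision: str) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_WEIGHT[precision] / 2**30


if __name__ == "__main__":
    params = 8_000_000_000  # hypothetical 8-billion-parameter model
    for precision in BYTES_PER_WEIGHT:
        print(f"{precision}: {weight_memory_gib(params, precision):.1f} GiB")
```

Halving the number of bits per weight halves the weight storage, which is one reason smaller or lower-precision models are the ones that fit on a phone.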