SmolLM is a family of small AI models from Hugging Face built to run efficiently on ordinary devices rather than requiring large cloud servers. The repository covers two main model types: SmolLM for text generation, and SmolVLM for understanding both images and text together. The latest text model, SmolLM3, has 3 billion parameters and was trained on 11 trillion tokens of text. The README says it outperforms other models of similar size and stays competitive with some models that are larger. It supports six languages (English, French, Spanish, German, Italian, and Portuguese), can handle long conversations up to 128,000 tokens, and includes a reasoning mode that lets it show its thinking before giving a final answer. The full training process, datasets used, and configuration details are publicly available, making this a fully transparent release. SmolVLM is the vision version of the family. It takes images and text as input together and can answer questions about images, describe what is in a picture, or handle conversations that include multiple images. Both models are available through the Hugging Face transformers library, which means loading and running them requires only a few lines of Python. The repository also includes tools for running inference locally, which aligns with the project's stated goal of making capable models that work on-device without depending on an internet connection. The repository organizes code into separate folders for the text models, vision models, and shared utilities. Training datasets used to build these models are also published separately on the Hugging Face platform and are linked from the README.
← huggingface on gitmyhub — every repo by this author, as a profile.
Verify against the repo before relying on details.