Convert a Llama, Mistral, or Gemma model and run text generation using up to 4x less memory than standard 32-bit weights.
Build a fast machine translation pipeline that runs entirely on CPU, with no GPU required.
Speed up text summarization by swapping your standard inference framework for CTranslate2.
Run multilingual models on cheap cloud instances using 8-bit quantization to cut memory and cost.
Models must be converted to CTranslate2 format using the included converter before inference can begin.
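The memory savings in the use cases above come down to bytes per weight. The sketch below computes weight storage for a hypothetical 7-billion-parameter model in 32-bit, 16-bit, and 8-bit formats. The parameter count and the function name `weight_memory_gb` are illustrative assumptions, not part of CTranslate2.

```python
# Bytes needed to store one weight in each numeric format.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_WEIGHT[dtype] / 10**9


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical 7-billion-parameter model
    for dtype in ("float32", "float16", "int8"):
        print(f"{dtype:>8}: {weight_memory_gb(params, dtype):.1f} GB")
```

This prints 28.0, 14.0, and 7.0 GB. The 4x figure is the float32-to-int8 ratio for weights alone; activations, caches, and any tensors kept at higher precision also consume memory, which is one reason the savings are stated as "up to" 4x.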
CTranslate2 is a C++ library, also available as a Python package, for running trained language models faster and with less memory than general-purpose training frameworks do. It does not train models. Instead, it takes an already-trained model, converts it to an optimized format, and runs it at high speed for tasks such as translation, text summarization, and text generation.
The library supports a wide range of model architectures, including those used in many translation systems and text summarizers, as well as open-weight language models such as Llama, Mistral, and Gemma. Compatible models must first be converted with the provided tools. Converters are included for several popular training frameworks, so most users can bring existing models over without writing custom conversion code.
Speed comes from several techniques applied automatically during inference: fusing certain computation steps, removing padding from inputs, reordering batches to minimize wasted computation, and using reduced numerical precision. Weights can be stored and computed in 16-bit or 8-bit formats instead of the standard 32-bit, which shrinks the model on disk by up to 4x and often speeds up computation with minimal accuracy loss.
The library runs on both CPU and GPU and automatically selects the best backend for the current hardware. Supported CPU architectures include x86-64 and ARM64, with integrations for several math acceleration libraries. Python users can install it with pip and start translating or generating text in a few lines of code. Documentation is available on the project site, and the project is maintained with backward compatibility in mind.
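Merging computation steps (operator fusion) means doing several element-wise operations in one pass over the data instead of writing and re-reading an intermediate buffer. The toy sketch below contrasts an unfused and a fused "add bias, then ReLU" step. It only illustrates the general idea; the source does not say which operations CTranslate2 fuses, and its real kernels are written in C++.

```python
from typing import List


def bias_relu_unfused(xs: List[float], bias: List[float]) -> List[float]:
    # Step 1 writes an intermediate buffer...
    shifted = [x + b for x, b in zip(xs, bias)]
    # ...and step 2 reads it back: two passes over memory.
    return [max(0.0, s) for s in shifted]


def bias_relu_fused(xs: List[float], bias: List[float]) -> List[float]:
    # Both steps happen in a single pass, with no intermediate buffer.
    return [max(0.0, x + b) for x, b in zip(xs, bias)]


if __name__ == "__main__":
    xs, bias = [-1.0, 0.5, 2.0], [0.5, 0.25, -3.0]
    print(bias_relu_unfused(xs, bias), bias_relu_fused(xs, bias))
```

Both functions return the same values; the fused version simply avoids the extra memory traffic, which is where the speedup comes from on real hardware.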
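Padding removal and batch reordering both target wasted work on padding tokens. The sketch below counts the token positions computed when each batch is padded to its longest input, compares that with the real token count (the floor that padding removal approaches), and shows how sorting inputs by length before batching, then restoring the original order, reduces the waste. The input lengths, the batch size, and all function names are hypothetical; this models the idea, not CTranslate2's implementation.

```python
from typing import List


def padded_slots(lengths: List[int], batch_size: int) -> int:
    """Token positions computed when every batch is padded to its longest input."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch) * len(batch)
    return total


def length_sorted_order(lengths: List[int]) -> List[int]:
    """Indices of the inputs, shortest first (the sort is stable, so ties keep their order)."""
    return sorted(range(len(lengths)), key=lambda i: lengths[i])


def restore_order(results: List[str], order: List[int]) -> List[str]:
    """Map results produced in sorted order back to the caller's original order."""
    restored = [""] * len(order)
    for pos, original_index in enumerate(order):
        restored[original_index] = results[pos]
    return restored


if __name__ == "__main__":
    lengths = [3, 40, 5, 38, 4, 41]  # hypothetical token counts for six inputs
    order = length_sorted_order(lengths)
    print("real tokens:", sum(lengths))
    print("padded, input order:", padded_slots(lengths, batch_size=3))
    print("padded, sorted order:", padded_slots([lengths[i] for i in order], batch_size=3))
```

Here the real token count is 131; batching in input order computes 243 positions, while sorting first brings that down to 138. Restoring the order afterwards matters because callers expect outputs to line up with their inputs.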
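To make "8-bit weights with minimal accuracy loss" concrete, here is a minimal sketch of one common scheme: symmetric int8 quantization with a single scale per weight row, where the scale maps the row's largest absolute value to 127. This scheme and the function names are assumptions for illustration and are not taken from CTranslate2's source.

```python
from typing import List, Tuple


def quantize_row_int8(row: List[float]) -> Tuple[List[int], float]:
    """Quantize one weight row to int8 using a symmetric absmax scale."""
    max_abs = max(abs(x) for x in row)
    # Map the largest magnitude to 127; use 1.0 for an all-zero row to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale


def dequantize_row(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 values and the row scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    row = [0.42, -1.27, 0.003, 0.9]
    q, scale = quantize_row_int8(row)
    approx = dequantize_row(q, scale)
    print(q, scale)
    print([a - b for a, b in zip(row, approx)])
```

Each weight now takes 1 byte instead of 4, plus one float scale per row, and rounding keeps every reconstructed weight within half a scale step of the original, which is why the accuracy loss is usually small.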
Verify against the repo before relying on details.