Train a custom translation model between two languages on your own dataset.
Fine-tune a Llama or Mistral large language model on a single 24 GB GPU using 4-bit quantization and LoRA adapters.
Build a REST API server that accepts text input and returns translations from a trained model.
Export a trained model to CTranslate2 for fast production inference.
Requires PyTorch 2.0+ and CUDA. Fine-tuning 7B-13B models needs a 24 GB GPU even with 4-bit quantization.
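
The 24 GB figure can be sanity-checked with rough arithmetic. The sketch below is not part of OpenNMT-py: the function names are hypothetical, and the 4096 layer width and rank 8 are illustrative values chosen only for the example. It estimates the memory taken by the weights alone at different precisions, and the number of trainable parameters a LoRA adapter adds to one weight matrix. Activations, optimizer state, and framework overhead come on top and are not modeled here.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA adapter: matrices of shape (rank, d_in) and (d_out, rank)."""
    return rank * (d_in + d_out)


# Weights-only footprint for the two model sizes mentioned above.
for name, num_params in [("7B", 7e9), ("13B", 13e9)]:
    for bits in (16, 8, 4):
        gb = weight_memory_gb(num_params, bits)
        print(f"{name} at {bits}-bit: {gb:.1f} GB")

# Illustrative assumption: a square 4096 x 4096 projection with LoRA rank 8.
full = 4096 * 4096
adapter = lora_trainable_params(4096, 4096, rank=8)
print(f"full matrix: {full} params, LoRA adapter: {adapter} params")
```

At 4 bits each parameter takes half a byte, so the weights of a 13B model need about 6.5 GB instead of 26 GB at 16-bit, which leaves room on a 24 GB card for everything else that training needs.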
OpenNMT-py is a Python framework for neural machine translation and language model training, built on top of PyTorch. Neural machine translation means using a neural network to translate text from one language to another, and this project provides the code and tools to train, fine-tune, and run such models. It was designed to be researcher-friendly, with flexibility to experiment with different model configurations and training setups.
An important note from the README: this project is no longer actively maintained. The team has started a successor project called Eole, which covers the same functionality with a redesigned codebase. If you are starting fresh, the README recommends switching to Eole instead. OpenNMT-py remains available and functional but will not receive ongoing development.
Beyond translation, later versions of OpenNMT-py added support for large language models. You can load, fine-tune, and run inference with models in the Llama and Mistral family, along with several other open-weight models. The project supports 4-bit and 8-bit quantization, which means you can run or fine-tune large models on consumer-grade GPUs with limited memory. For example, the README notes that a 7 billion or 13 billion parameter model can be fine-tuned on a single 24 GB GPU using 4-bit quantization. It also supports LoRA adapters, which are a way of fine-tuning a model without updating all of its weights.
For training at scale, the project supports tensor parallelism, which splits a model across multiple GPUs when it is too large to fit in one. For faster inference after training, the README suggests exporting models to CTranslate2, a separate inference engine optimized for speed.
Installation is through pip or from source, and requires Python 3.8 or later and PyTorch 2.0 or later. Docker images with CUDA support are available for reproducible setups. Optional performance improvements come from installing Apex (an NVIDIA library for mixed-precision training) and Flash Attention 2.
The project includes tutorials for common tasks, including fine-tuning translation models, replicating a conversational model based on Llama, and setting up a simple REST server for serving translations.
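
To make the REST-server use case concrete, here is a minimal sketch of the request and response shape such a service might have. It uses only the Python standard library and is not the server shipped with OpenNMT-py. The `/translate` path, the JSON field names, the port, and the `translate` function are all assumptions for illustration, and `translate` is a stand-in that just upper-cases its input where a real deployment would call a trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def translate(text: str) -> str:
    """Hypothetical stand-in for a trained model: returns the input in upper case."""
    return text.upper()


class TranslateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only one endpoint is served in this sketch.
        if self.path != "/translate":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            text = payload["text"]
            if not isinstance(text, str):
                raise TypeError("text must be a string")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON object with a text field")
            return
        body = json.dumps({"translation": translate(text)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet; remove to see per-request logs


def make_server(host: str = "127.0.0.1", port: int = 5000) -> HTTPServer:
    """Build the server without starting it, so callers decide when to block."""
    return HTTPServer((host, port), TranslateHandler)
```

Calling `make_server().serve_forever()` starts the service, after which a POST to `/translate` with a body such as `{"text": "hello"}` returns a JSON object with a `translation` field. For real use, follow the project's own REST server tutorial.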
Verify against the repo before relying on details.