opennmt/opennmt-py

★ 7,003 | Python | Audience · researcher | Complexity · 4/5 | Setup · hard

Mindmap

mindmap
  root((OpenNMT-py))
    Core Features
      Neural translation
      LLM fine-tuning
      LoRA adapters
    Supported Models
      Llama and Mistral
      Translation models
      Quantized models
    Deployment
      REST server
      CTranslate2 export
      Docker images
    Tech Stack
      Python
      PyTorch
      CUDA


Things people build with this

USE CASE 1

Train a custom translation model between two languages on your own dataset.

USE CASE 2

Fine-tune a Llama or Mistral large language model on a single 24 GB GPU using 4-bit quantization and LoRA adapters.

USE CASE 3

Build a REST API server that accepts text input and returns translations from a trained model.

USE CASE 4

Export a trained model to CTranslate2 for fast production inference.
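As a rough illustration of use case 4, the sketch below builds the conversion command and then runs inference with the CTranslate2 Python package. It assumes CTranslate2 is installed and that its `ct2-opennmt-py-converter` command-line tool is available in the installed version. The paths `model.pt` and `ct2_model` are hypothetical placeholders, and the input must already be tokenized with the same tokenizer used in training.

```python
import shlex
from typing import List


def build_converter_command(model_path: str, output_dir: str) -> List[str]:
    """Return the argv for converting an OpenNMT-py checkpoint to CTranslate2.

    The tool name and flags follow the CTranslate2 documentation; check them
    against the version you have installed.
    """
    return [
        "ct2-opennmt-py-converter",
        "--model_path", model_path,
        "--output_dir", output_dir,
    ]


def translate_tokens(model_dir: str, tokens: List[str], device: str = "cpu") -> List[str]:
    """Translate one pre-tokenized sentence with a converted model."""
    # Imported lazily so the command builder works without CTranslate2 installed.
    import ctranslate2

    translator = ctranslate2.Translator(model_dir, device=device)
    results = translator.translate_batch([tokens])
    # Each result holds one or more hypotheses; take the best one.
    return results[0].hypotheses[0]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    command = build_converter_command("model.pt", "ct2_model")
    print(shlex.join(command))
```

Running the printed command in a shell produces the converted model directory, which can then be passed to `translate_tokens`.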

Tech stack

Python · PyTorch · Docker · CUDA · CTranslate2

Getting it running

Difficulty · hard Time to first run · 1h+

Requires PyTorch 2.0+ and CUDA. Fine-tuning 7B-13B models needs a 24 GB GPU even with 4-bit quantization.
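A back-of-the-envelope calculation shows why quantization matters for the 24 GB figure above. This sketch counts only the memory for the weights themselves. It ignores activations, optimizer state, adapter weights, and quantization metadata, so real usage is higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


for name, params in [("7B", 7e9), ("13B", 13e9)]:
    half_precision = weight_memory_gib(params, 16)
    four_bit = weight_memory_gib(params, 4)
    print(f"{name}: 16-bit ~ {half_precision:.1f} GiB, 4-bit ~ {four_bit:.1f} GiB")
```

Under these assumptions, a 13B model's weights at 16-bit precision already exceed 24 GiB on their own, while the 4-bit version leaves most of the card free for everything else that training needs.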

In plain English

OpenNMT-py is a Python framework for neural machine translation and language model training, built on top of PyTorch. Neural machine translation means using a neural network to translate text from one language to another, and this project provides the code and tools to train, fine-tune, and run such models. It was designed to be researcher-friendly, with flexibility to experiment with different model configurations and training setups.

An important note from the README: this project is no longer actively maintained. The team has started a successor project called Eole, which covers the same functionality with a redesigned codebase. If you are starting fresh, the README recommends switching to Eole instead. OpenNMT-py remains available and functional but will not receive ongoing development.

Beyond translation, later versions of OpenNMT-py added support for large language models. You can load, fine-tune, and run inference with models in the Llama and Mistral families, along with several other open-weight models. The project supports 4-bit and 8-bit quantization, which means you can run or fine-tune large models on consumer-grade GPUs with limited memory. For example, the README notes that a 7 billion or 13 billion parameter model can be fine-tuned on a single 24 GB GPU using 4-bit quantization. It also supports LoRA adapters, which fine-tune a model by training a small set of added weights instead of updating all of the original ones.

For training at scale, the project supports tensor parallelism, which splits a model across multiple GPUs when it is too large to fit in one. For faster inference after training, the README suggests exporting models to CTranslate2, a separate inference engine optimized for speed.

Installation is through pip or from source, and requires Python 3.8 or later and PyTorch 2.0 or later. Docker images with CUDA support are available for reproducible setups. Optional performance improvements come from installing Apex (an NVIDIA library for mixed-precision training) and Flash Attention 2. The project includes tutorials for common tasks, including fine-tuning translation models, replicating a conversational model based on Llama, and setting up a simple REST server for serving translations.
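To make the LoRA idea mentioned above concrete, here is a minimal pure-Python sketch of the general technique. It is not OpenNMT-py's implementation. The function names, shapes, and the `alpha` scaling value are illustrative assumptions. The frozen weight `W` stays untouched, and only the two small matrices `A` and `B` would be trained.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_counts(d_in, d_out, rank):
    """Return (full fine-tuning params, LoRA params) for one weight matrix."""
    return d_in * d_out, rank * (d_in + d_out)


def lora_forward(x, W, A, B, alpha, rank):
    """Compute x @ W + (alpha / rank) * x @ A @ B.

    Shapes: x (n, d_in), W (d_in, d_out), A (d_in, rank), B (rank, d_out).
    """
    base = matmul(x, W)
    update = matmul(matmul(x, A), B)
    scale = alpha / rank
    return [[b + scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]


random.seed(0)
d_in, d_out, rank = 4, 3, 2
W = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]  # frozen
A = [[random.gauss(0, 1) for _ in range(rank)] for _ in range(d_in)]   # trainable
B = [[0.0] * d_out for _ in range(rank)]                               # trainable, starts at zero
x = [[1.0, 2.0, 3.0, 4.0]]

# With B initialized to zero, the adapter leaves the model's output unchanged.
assert lora_forward(x, W, A, B, alpha=16, rank=rank) == matmul(x, W)

full, lora = lora_param_counts(4096, 4096, 8)
print(f"full: {full:,} params, LoRA rank 8: {lora:,} params")
```

The parameter count is the point of the technique: for a hypothetical 4096 by 4096 layer, a rank-8 adapter trains well under one percent of the weights that full fine-tuning would update.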

Copy-paste prompts

Prompt 1

I want to fine-tune a Llama 7B model using OpenNMT-py with 4-bit quantization on a single GPU. Walk me through the training config and command.

Prompt 2

How do I train a translation model from scratch in OpenNMT-py? Give me the data prep, training, and evaluation commands.

Prompt 3

Set up a REST server with OpenNMT-py that accepts text and returns translations. What's the minimal config to get it running?

Prompt 4

Export my trained OpenNMT-py model to CTranslate2 format for faster inference. What are the steps?


Verify against the repo before relying on details.