Speed up a PyTorch or TensorFlow model for real-time inference by exporting it to ONNX and compiling it with TensorRT on an NVIDIA GPU.
Write custom TensorRT plugins to add GPU compute operations for model architectures that TensorRT does not natively support.
Serve large language models at scale on NVIDIA hardware with lower latency than running the raw model directly.
Requires an NVIDIA GPU with CUDA installed. Docker is strongly recommended to avoid managing system dependencies manually.
TensorRT is NVIDIA's toolkit for running AI models as fast as possible on NVIDIA GPUs. When you train an AI model, the result is a large file describing a network of mathematical operations. TensorRT takes that file and optimizes it for the specific GPU it will run on, producing a version that typically runs with lower latency and less memory than the original model.
This repository contains the open-source portions of TensorRT, which are a subset of the full product. The open-source components include plugins (modular pieces of custom compute logic that extend what TensorRT can handle), an ONNX parser (ONNX is a standard file format for AI models, and the parser lets TensorRT read models saved in that format), and a collection of sample applications showing how to use the toolkit.
The easiest way to use TensorRT from Python is to install it with pip, which provides prebuilt binaries instead of requiring a source build. Building from source is more involved and requires a compatible NVIDIA GPU, CUDA libraries, CMake, and several other system dependencies. The repository provides Docker container setups to make this process more consistent across machines.
TensorRT is widely used in production environments where inference speed matters, such as real-time video processing, autonomous vehicles, and serving large language models at scale. It supports models from frameworks like PyTorch and TensorFlow, which are first exported to the ONNX format and then compiled with TensorRT.
A major new version, TensorRT 11.0, is planned for mid-2026 and will remove several older APIs while introducing a cleaner interface. The README lists the specific older features that will be dropped and points to their replacements for developers who need to migrate existing code.
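The export-then-compile workflow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's own script: the helper names (`export_to_onnx`, `build_trtexec_command`, `compile_onnx`) and the file names are hypothetical, and it assumes PyTorch is installed for the export step and that `trtexec`, the command-line tool shipped with TensorRT, is on the PATH for the compile step.

```python
import shutil
import subprocess


def export_to_onnx(model, example_input, onnx_path):
    """Export a PyTorch model to an ONNX file (assumes PyTorch is installed)."""
    import torch  # imported lazily so the other helpers work without PyTorch

    model.eval()
    torch.onnx.export(model, (example_input,), str(onnx_path))


def build_trtexec_command(onnx_path, engine_path):
    """Return the argument list that asks trtexec to compile an ONNX file."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",          # input model in ONNX format
        f"--saveEngine={engine_path}",  # where to write the compiled engine
    ]


def compile_onnx(onnx_path, engine_path):
    """Run trtexec if it is available; return False when it is not installed."""
    if shutil.which("trtexec") is None:
        return False
    subprocess.run(build_trtexec_command(onnx_path, engine_path), check=True)
    return True


if __name__ == "__main__":
    # Placeholder file names; replace them with a real exported model.
    print(" ".join(build_trtexec_command("model.onnx", "model.engine")))
```

The compiled engine is specific to the GPU and TensorRT version it was built with, so under this approach the compile step would normally run on the machine that will serve the model.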
Verify against the repo before relying on details.