nvidia/tensorrt

★ 12,978 | C++ | Audience · researcher | Complexity · 5/5 | Setup · hard

Mindmap

mindmap
  root((TensorRT))
    What it does
      Optimize AI models
      Faster GPU inference
      Lower memory use
    Tech Stack
      CUDA
      ONNX parser
      C plus plus
      Python
    Use Cases
      Video processing
      LLM serving
      Autonomous vehicles
    Setup
      Docker recommended
      NVIDIA GPU required
      CUDA libraries


Things people build with this

USE CASE 1

Speed up a PyTorch or TensorFlow model for real-time inference by exporting it to ONNX and compiling it with TensorRT on an NVIDIA GPU.

USE CASE 2

Write custom TensorRT plugins to add GPU compute operations for model architectures that TensorRT does not natively support.

USE CASE 3

Serve large language models at scale on NVIDIA hardware with lower latency than running the raw model directly.

Tech stack

C++, Python, CUDA, ONNX, CMake, Docker

Getting it running

Difficulty · hard | Time to first run · 1h+

Requires an NVIDIA GPU with CUDA installed. Docker is strongly recommended to avoid managing system dependencies manually.
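
Before attempting a build, it can help to check which prerequisites are already on the machine. The following is a minimal sketch, not something taken from the repository: the tool list is an assumption chosen for illustration (nvidia-smi for the GPU driver, nvcc for the CUDA toolkit, cmake, and docker), and the function names are hypothetical.

```python
import shutil

# Assumed checklist for illustration only; the repository's own build
# instructions are the authority on what is actually required.
REQUIRED_TOOLS = ("nvidia-smi", "nvcc", "cmake", "docker")


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


def missing_tools(found):
    """Return the names of tools that were not found, in sorted order."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name:12s} {path or 'NOT FOUND'}")
    missing = missing_tools(found)
    if missing:
        print("Missing:", ", ".join(missing))
```

Finding a tool on PATH only shows that it is installed. It does not confirm that the versions are compatible with each other or with the TensorRT release being built.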

In plain English

TensorRT is NVIDIA's toolkit for running AI models as fast as possible on NVIDIA GPUs. A trained AI model is a large file describing a network of mathematical operations. TensorRT takes that file and optimizes it for the specific GPU it will run on, producing a version that uses less memory and runs with lower latency than the original model.

This repository contains only the open-source portions of TensorRT, which are a subset of the full product. The open-source components include plugins (modular pieces of custom compute logic that extend what TensorRT can handle), an ONNX parser (ONNX is a standard file format for AI models, and the parser lets TensorRT read models saved in that format), and a collection of sample applications showing how to use the toolkit.

The easiest way to use TensorRT from Python is a pip install of the prebuilt package, which avoids a source build. Building from source is more involved and requires a compatible NVIDIA GPU, CUDA libraries, CMake, and several other system dependencies. The repository provides Docker container setups to make the build more consistent across machines.

TensorRT is widely used in production environments where inference speed matters, such as real-time video processing, autonomous vehicles, and serving large language models at scale. Models from frameworks like PyTorch and TensorFlow are supported by first exporting them to the ONNX format and then compiling them with TensorRT.

A major new version, TensorRT 11.0, is planned for mid-2026 and will remove several older APIs while introducing a cleaner interface. The README lists the older features that will be dropped and points to their replacements for developers who need to migrate existing code.
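
The export-to-ONNX-then-compile workflow described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the repository's own code: it assumes PyTorch is installed for the export step and that the trtexec command-line tool shipped with TensorRT is on PATH for the compile step. The file names, tensor names, and function names are hypothetical.

```python
import shlex
import shutil
import subprocess


def export_to_onnx(model, example_input, onnx_path):
    """Export a PyTorch module to an ONNX file. Requires PyTorch."""
    import torch  # imported lazily so the rest of the sketch works without it

    model.eval()
    torch.onnx.export(
        model,
        (example_input,),
        str(onnx_path),
        input_names=["input"],    # hypothetical tensor names
        output_names=["output"],
    )


def trtexec_command(onnx_path, engine_path, fp16=False):
    """Build the argument list that compiles an ONNX file into an engine."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
    ]
    if fp16:
        cmd.append("--fp16")  # allow reduced-precision kernels
    return cmd


def compile_engine(onnx_path, engine_path, fp16=False):
    """Run trtexec if it is installed; return False if it is not on PATH."""
    if shutil.which("trtexec") is None:
        return False
    # check=True raises CalledProcessError if the build fails.
    subprocess.run(trtexec_command(onnx_path, engine_path, fp16), check=True)
    return True


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe without a GPU.
    print(shlex.join(trtexec_command("model.onnx", "model.engine", fp16=True)))
```

The engine file produced by the compile step is tied to the GPU and TensorRT version it was built with, so it is usually rebuilt on the target machine rather than copied between different hardware.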

Copy-paste prompts

Prompt 1

I have a PyTorch model I want to run faster on an NVIDIA GPU. Walk me through exporting it to ONNX and compiling it with TensorRT step by step.

Prompt 2

I need to write a custom TensorRT plugin for a non-standard attention layer in my model. Show me the plugin API and a minimal working example.

Prompt 3

My TensorRT inference is slower than I expected. Here is my build config and model architecture. Help me identify what is limiting performance.

Prompt 4

How do I set up the TensorRT Docker container on Ubuntu to build and run the included sample applications?

Prompt 5

I am migrating from TensorRT 8 to TensorRT 10 and some APIs were removed. Help me update this inference code to use the new interface.


Verify against the repo before relying on details.