explaingit

xuehaipan/nvitop

6,895 · Python · Audience: data · Complexity: 2/5 · Setup: easy

TLDR

An interactive terminal dashboard for monitoring NVIDIA GPUs in real time, showing memory, temperature, and running processes with sorting, filtering, and signal controls, plus a Python API and Grafana exporter for custom monitoring.

Mindmap

mindmap
  root((nvitop))
    What it does
      GPU monitoring
      Process management
      Real-time dashboard
    Interactive features
      Sort and filter
      Process signals
      History graphs
    Companion tools
      nvisel GPU picker
      Python API
      Grafana exporter
    Install
      pip or conda
      Linux and Windows

Things people build with this

USE CASE 1

Watch GPU memory and temperature update live during a machine learning training run instead of running nvidia-smi repeatedly

USE CASE 2

Find and kill a runaway GPU process that is consuming all VRAM on a shared research server

USE CASE 3

Use nvisel to automatically pick the GPU with the most free memory before launching a training job

USE CASE 4

Export GPU metrics to a Grafana dashboard to track historical utilization across a team's GPU cluster
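The GPU-picking idea in use case 3 can be pictured with a small sketch. This is not nvisel's implementation or its command-line interface. `GpuInfo`, `pick_gpu`, and the sample numbers are hypothetical names and values chosen for illustration. It only shows the selection rule "most free memory, subject to a minimum".

```python
import os
from typing import NamedTuple, Optional, Sequence


class GpuInfo(NamedTuple):
    """Hypothetical record: one GPU and its free memory in MiB."""
    index: int
    free_mib: int


def pick_gpu(gpus: Sequence[GpuInfo], min_free_mib: int = 0) -> Optional[int]:
    """Return the index of the GPU with the most free memory.

    GPUs with less than `min_free_mib` free are skipped. Returns None
    when no GPU qualifies.
    """
    candidates = [g for g in gpus if g.free_mib >= min_free_mib]
    if not candidates:
        return None
    return max(candidates, key=lambda g: g.free_mib).index


# Made-up readings; a real script would query the driver instead.
sample = [GpuInfo(0, 2048), GpuInfo(1, 20480), GpuInfo(2, 8192)]
chosen = pick_gpu(sample, min_free_mib=4096)
if chosen is not None:
    # Frameworks such as PyTorch read this variable at startup.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(chosen)
```

Setting `CUDA_VISIBLE_DEVICES` before the training process starts is what restricts it to the chosen GPU. Check the repo for how nvisel itself is invoked.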

Tech stack

PythonNVIDIA CUDA

Getting it running

Difficulty · easy
Time to first run · 5 min

Requires an NVIDIA GPU with drivers installed. AMD and Intel GPUs are not supported.

In plain English

nvitop is a terminal-based tool for monitoring NVIDIA graphics cards and the processes running on them. If you work with machine learning or any software that uses a GPU, you have probably used the built-in nvidia-smi command, which gives you a static snapshot of what is happening on your GPU. nvitop goes further by providing a continuously updating, colorful interface that shows GPU memory usage, temperature, running processes, and more, in a format that is much easier to read at a glance.

There are two main ways to use the tool. The first is a quick status check that prints the current GPU and process state to the terminal. The second is a full monitor mode that runs interactively, similar to how htop works for CPU processes. In monitor mode you can sort and filter the list of GPU processes, view the process tree to see which parent programs launched each GPU task, inspect environment variables, send signals to stop or pause processes, and navigate using either the keyboard or the mouse. Bar charts and history graphs show how resource usage has changed over time.

Beyond the interactive display, nvitop ships a companion tool called nvisel that helps deep learning researchers choose which GPU to use before starting a training job, based on available memory or other criteria. The package also exposes a Python API so developers can query GPU and process information from within their own scripts or applications. This API supports collecting snapshots of metrics and building custom monitoring dashboards. There is also an exporter component that feeds data into Grafana, a popular tool for displaying metrics on dashboards.

Installation is straightforward via pip or conda, and the tool works on both Linux and Windows. It queries the GPU directly through NVIDIA's own library (NVML) rather than by parsing nvidia-smi output, which makes it faster and more accurate.

The full README is longer than what was shown.
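As a rough illustration of the Python API described above, here is a minimal sketch of a memory-threshold check. `memory_warnings` and `read_with_nvitop` are hypothetical helper names, not part of nvitop. The calls inside `read_with_nvitop` (`Device.all()`, `memory_used()`, `memory_total()`) reflect the API as I understand it and assume both methods return integer byte counts, so verify them against the repo. The demo at the bottom runs on made-up numbers and needs no GPU.

```python
from typing import Iterable, List, Tuple

# (GPU index, used bytes, total bytes)
Reading = Tuple[int, int, int]


def memory_warnings(readings: Iterable[Reading],
                    limit_percent: float = 90.0) -> List[str]:
    """Return one warning line per GPU whose memory use exceeds the limit."""
    warnings = []
    for index, used, total in readings:
        if total <= 0:
            # Skip devices that report no usable total.
            continue
        percent = 100.0 * used / total
        if percent > limit_percent:
            warnings.append(f"GPU {index}: memory at {percent:.1f}%")
    return warnings


def read_with_nvitop() -> List[Reading]:
    """Collect readings via nvitop (assumed API; needs an NVIDIA driver).

    Not called in the demo below, so the sketch runs without a GPU.
    """
    from nvitop import Device  # imported lazily on purpose

    return [(d.index, d.memory_used(), d.memory_total())
            for d in Device.all()]


# Demo with made-up readings: 23 of 24 GiB used, and 4 of 24 GiB used.
GIB = 1024 ** 3
sample = [(0, 23 * GIB, 24 * GIB), (1, 4 * GIB, 24 * GIB)]
for line in memory_warnings(sample):
    print(line)
```

On a machine with an NVIDIA driver, passing `read_with_nvitop()` instead of `sample` and wrapping the call in a loop with `time.sleep(30)` would give a periodic check.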

Copy-paste prompts

Prompt 1
Run nvitop in monitor mode on a Linux server and show me how to filter the GPU process list to only show jobs using more than 4GB of VRAM.
Prompt 2
Using nvitop's Python API, write a script that checks GPU memory every 30 seconds and prints a warning to the terminal if any GPU's memory usage exceeds 90%.
Prompt 3
Show me how to use nvisel to automatically select the least-loaded GPU and pass it as the CUDA_VISIBLE_DEVICES variable when starting a PyTorch training script.
Prompt 4
Set up the nvitop Grafana exporter on a Linux server with multiple GPUs and display per-GPU temperature, memory, and utilization in a shared team dashboard.

Verify against the repo before relying on details.