explaingit

megvii-basedetection/yolox

10,457
Python
Audience · researcher
Complexity · 3/5
Setup · moderate

TLDR

YOLOX is a Python library that automatically draws labeled boxes around objects in images and video in real time. It needs no pre-defined anchor boxes and comes in model sizes from the phone-friendly Nano to the GPU-powered Extra-Large.

Mindmap

mindmap
  root((YOLOX))
    What it does
      Object detection
      Real-time video
      Anchor-free design
    Model sizes
      Nano and Tiny
      Small and Medium
      Large and XLarge
    Deployment formats
      ONNX cross-platform
      TensorRT Nvidia
      ncnn mobile
      OpenVINO Intel
    Training
      COCO benchmark
      Custom datasets
      PyTorch backbone

Things people build with this

USE CASE 1

Run real-time object detection on video footage to automatically identify and draw boxes around cars, people, and other objects.
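
As a rough sketch of what running the demo on a video might look like, the snippet below only assembles the command and prints it; nothing is executed. The script path `tools/demo.py` and its flags are written as recalled from the YOLOX README and should be checked against the repo. The helper name `build_demo_command` and the file names `street.mp4` and `yolox_s.pth` are hypothetical.

```python
import shlex


def build_demo_command(video_path, checkpoint_path, model_name="yolox-s", device="cpu"):
    """Assemble an argument list for YOLOX's demo script (assumed flags; not executed here)."""
    return [
        "python", "tools/demo.py", "video",  # "video" selects video mode (assumed from the README)
        "-n", model_name,                    # model name, e.g. yolox-s
        "-c", checkpoint_path,               # path to downloaded or trained weights
        "--path", video_path,                # input video file
        "--conf", "0.25",                    # confidence threshold
        "--nms", "0.45",                     # non-maximum suppression threshold
        "--tsize", "640",                    # test image size
        "--save_result",                     # write the annotated output to disk
        "--device", device,                  # "cpu" or "gpu"
    ]


# Hypothetical file names, for illustration only.
cmd = build_demo_command("street.mp4", "yolox_s.pth")
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` from the root of a YOLOX checkout with the package installed.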

USE CASE 2

Train a custom object detection model on your own labeled image dataset using the YOLOX training pipeline.

USE CASE 3

Deploy a YOLOX Nano model on a mobile or embedded device for lightweight on-device object detection.

USE CASE 4

Export a trained YOLOX model to TensorRT for fast inference on Nvidia GPU hardware in production.

Tech stack

Python, PyTorch, ONNX, TensorRT, OpenVINO, CUDA

Getting it running

Difficulty · moderate
Time to first run · 30 min

Training requires a CUDA-capable Nvidia GPU. CPU inference is possible but significantly slower.
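
One way to check which situation a machine is in before starting is to ask PyTorch whether it can see a CUDA device. This is a minimal sketch: the helper name `pick_device` and the returned labels `"gpu"` and `"cpu"` are illustrative choices, and the function falls back to `"cpu"` when PyTorch is not installed.

```python
def pick_device():
    """Return "gpu" if PyTorch is installed and sees a CUDA device, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check still works without PyTorch
    except ImportError:
        return "cpu"
    return "gpu" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints `cpu` on a machine that has an Nvidia GPU, the installed PyTorch build or the CUDA driver is a likely place to look first.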

In plain English

YOLOX is a Python library for object detection: it takes an image or video and automatically draws boxes around the objects in it, labeling each one (a car, a person, a dog, and so on). It belongs to the YOLO family of detectors, which have been popular for years because they are fast enough to process video in real time. YOLOX was released by Megvii in 2021 and presented improvements over earlier YOLO versions (v3 through v5).

The key design change in YOLOX is the removal of anchors. Older YOLO models used a set of pre-defined box shapes to help them guess where objects might be. YOLOX predicts boxes directly without that pre-defined set, which simplifies training and makes the model easier to adapt to new tasks. The README states that this approach achieves higher accuracy than the anchor-based versions it replaces.

The library ships several model sizes: Nano and Tiny for devices with very limited computing power (like a phone or a small embedded board), and Small, Medium, Large, and Extra-Large for servers with GPUs. A table in the README lists speed and accuracy numbers for each size, measured on COCO, the benchmark dataset researchers commonly use to compare object detectors.

Once you install YOLOX from source using pip, you can run a demo on a single image or a video file with one command. Training on the COCO dataset or on a custom dataset is also supported. A trained model can be exported to several formats for deployment in different environments: ONNX for cross-platform compatibility, TensorRT for fast inference on Nvidia hardware, ncnn for mobile devices, and OpenVINO for Intel hardware.

The codebase is written in PyTorch, a framework widely used in AI research. A separate version using MegEngine (Megvii's own framework) exists in a different repository.
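
To make the anchor-free idea concrete, here is a simplified sketch of how a single grid cell's raw prediction could be turned into a pixel-space box. It illustrates the general technique and is not code from the repository. The function name `decode_cell` and the `(dx, dy, log_w, log_h)` layout are assumptions made for this example. The point to notice is that width and height are scaled only by the cell's stride, with no pre-defined anchor shape involved.

```python
import math


def decode_cell(raw, grid_x, grid_y, stride):
    """Turn one grid cell's raw output (dx, dy, log_w, log_h) into a box (x1, y1, x2, y2)."""
    dx, dy, log_w, log_h = raw
    # Center: the cell's grid position plus the predicted offset, scaled to pixels.
    cx = (grid_x + dx) * stride
    cy = (grid_y + dy) * stride
    # Size: exponentiate the raw value and scale by stride; no anchor width or height is used.
    w = math.exp(log_w) * stride
    h = math.exp(log_h) * stride
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


box = decode_cell((0.5, 0.5, 0.0, 0.0), grid_x=2, grid_y=3, stride=8)
print(box)  # (16.0, 24.0, 24.0, 32.0)
```

An anchor-based decoder would typically multiply the exponentiated size by a pre-defined anchor width and height chosen for the dataset; dropping that term is what removes the need to pick anchor shapes up front.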

Copy-paste prompts

Prompt 1
I want to run YOLOX on a video file to detect cars and pedestrians. Walk me through installing YOLOX from source and running the demo command.
Prompt 2
Help me train a custom YOLOX Small model on my own dataset in COCO format, from data preparation to evaluating the final model.
Prompt 3
I want to export my trained YOLOX model to ONNX format for deployment on a different platform. What command do I run and what are the common gotchas?
Prompt 4
Which YOLOX model size should I choose for real-time detection on an edge device versus a desktop GPU? Explain the speed-accuracy trade-offs from the benchmark table.
Prompt 5
How do I evaluate my YOLOX model on the COCO benchmark and interpret the mAP score to know if it's performing well?

Verify against the repo before relying on details.