apache/tvm

★ 13,360 | Python | Audience · researcher | Complexity · 5/5 | License · Apache 2.0 | Setup · hard

Mindmap

mindmap
  root((Apache TVM))
    What it does
      Model compilation
      Hardware targeting
      Speed optimization
    Architecture
      TensorIR
      Relax graphs
      Python API
    Hardware targets
      CPU
      GPU
      Mobile
      WebAssembly
    Use cases
      Model deployment
      ML research
      Edge devices


Things people build with this

USE CASE 1

Compile a trained PyTorch or TensorFlow model into an optimized binary that runs faster on a specific CPU or GPU

USE CASE 2

Deploy the same AI model to multiple hardware targets from a single codebase without rewriting the model

USE CASE 3

Customize the compilation pipeline in Python to experiment with new optimization passes for ML research

USE CASE 4

Target a JavaScript or WebAssembly environment to run a compiled AI model directly in the browser

Tech stack

Python · C++ · LLVM · CUDA

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires hardware drivers that match your target (CUDA, ROCm, or Metal), plus significant build time for targets that the prebuilt Python wheels do not cover.
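Before committing to a long build, it can help to check which of the requirements above are already present on the machine. The sketch below is a rough heuristic using only the Python standard library, not an official TVM check. It assumes that `nvcc` on the PATH indicates a CUDA toolkit, that `rocminfo` on the PATH indicates a ROCm install, and that Metal is only an option on macOS. The function name `preflight` is hypothetical.

```python
import importlib.util
import platform
import shutil


def preflight() -> dict:
    """Collect rough yes/no signals about the local toolchain (heuristics only)."""
    return {
        # True if a `tvm` Python package can be found, e.g. from a wheel or a source build.
        "tvm_importable": importlib.util.find_spec("tvm") is not None,
        # Assumption: the CUDA compiler driver `nvcc` on PATH implies a CUDA toolkit.
        "cuda_nvcc_on_path": shutil.which("nvcc") is not None,
        # Assumption: the `rocminfo` utility on PATH implies a ROCm install.
        "rocm_rocminfo_on_path": shutil.which("rocminfo") is not None,
        # Metal is an Apple API, so it is only a candidate on macOS.
        "metal_possible": platform.system() == "Darwin",
    }


report = preflight()
for name, ok in report.items():
    print(f"{name}: {'yes' if ok else 'no'}")
```

A "no" here does not prove a target is unusable, and a "yes" does not prove driver versions match. It only narrows down what to look at first.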

Use freely for any purpose, including commercial use, as long as you include the Apache 2.0 license and copyright notice.

In plain English

Apache TVM is an open-source compiler framework for machine learning models. A compiler in this context is a tool that takes a trained AI model and translates it into optimized code that runs efficiently on specific hardware, whether that is a laptop CPU, a phone GPU, or a specialized chip. The goal is to make models run as fast and as leanly as possible on whatever device they are deployed to.

The project started as academic research into deep learning compilation and has gone through several design overhauls since then. The current version focuses on Python-first development, meaning that the people who use and customize TVM can do most of their work in Python rather than lower-level languages. This makes it easier to experiment with and adapt the compilation pipeline for different needs.

TVM supports a wide range of hardware targets: standard CPUs, GPUs from different vendors, mobile devices, and even JavaScript environments. Its ability to target so many different platforms from a single framework is one of its main appeals for teams that need to deploy the same model in multiple places.

The internal architecture uses two main representations: TensorIR for describing individual math operations at a low level, and Relax for describing the full computation graph of a model. Both layers can be customized and optimized through Python, and they work together to squeeze out performance across the whole model rather than just individual pieces.

TVM is part of the Apache Software Foundation and is licensed under Apache 2.0. Documentation and tutorials are hosted separately at tvm.apache.org.
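The two-level split described above (a loop-level view of each operation, a graph-level view of the whole model) can be illustrated with a toy in plain Python. This is a conceptual sketch only: none of the names below (`Call`, `run_graph`, `add_kernel`, `relu_kernel`, `fused_add_relu`) are TVM APIs, and real TensorIR and Relax programs look different. It shows why optimizing across both levels can beat optimizing each operation alone: fusing two kernels changes both the loop code and the graph that calls it.

```python
# Toy illustration only: none of these names are TVM APIs.
from dataclasses import dataclass
from typing import Callable, Dict, List


# Loop level (the role TensorIR plays): one operation written as explicit loops.
def add_kernel(a: List[float], b: List[float]) -> List[float]:
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out


def relu_kernel(a: List[float]) -> List[float]:
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] if a[i] > 0.0 else 0.0
    return out


# Graph level (the role Relax plays): which kernels run, in what order, on which values.
@dataclass
class Call:
    kernel: Callable[..., List[float]]
    inputs: List[str]
    output: str


def run_graph(calls: List[Call], env: Dict[str, List[float]]) -> Dict[str, List[float]]:
    values = dict(env)
    for call in calls:
        args = [values[name] for name in call.inputs]
        values[call.output] = call.kernel(*args)
    return values


# Unoptimized model: two kernels and an intermediate buffer "t".
graph = [
    Call(add_kernel, ["x", "bias"], "t"),
    Call(relu_kernel, ["t"], "y"),
]


# Cross-level optimization: merge both loops into one kernel (loop-level change)
# and rewrite the graph to call it (graph-level change). No intermediate buffer.
def fused_add_relu(a: List[float], b: List[float]) -> List[float]:
    out = [0.0] * len(a)
    for i in range(len(a)):
        s = a[i] + b[i]
        out[i] = s if s > 0.0 else 0.0
    return out


fused_graph = [Call(fused_add_relu, ["x", "bias"], "y")]

inputs = {"x": [1.0, -2.0, 3.0], "bias": [0.5, 0.5, -4.0]}
result = run_graph(graph, inputs)["y"]
fused_result = run_graph(fused_graph, inputs)["y"]
print(result, fused_result)
```

Both graphs compute the same output, but the fused version makes one pass over the data instead of two. A compiler that sees only individual kernels, or only the graph, could not make that rewrite on its own.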

Copy-paste prompts

Prompt 1

Show me how to compile a PyTorch ResNet model with Apache TVM for CPU inference, including the auto-tuning step

Prompt 2

How do I use TVM's Relax frontend to import an ONNX model and compile it for a mobile GPU target?

Prompt 3

Write a Python script that benchmarks a TVM-compiled model's inference speed against the original PyTorch model

Prompt 4

How do I write a custom TensorIR schedule in TVM to optimize a matrix multiplication for my specific hardware?

Prompt 5

Walk me through targeting a WebAssembly backend with Apache TVM to run a small language model in the browser

Verify against the repo before relying on details.