alibaba/mnn

★ 15,162 · C++ · Audience · developer · Complexity · 5/5 · Active · License · Setup · hard

Mindmap

mindmap
  root((MNN))
    Inputs
      Trained models
      ONNX TFLite
      Quantized weights
    Outputs
      On-device inference
      LLM tokens
      Generated images
    Use Cases
      Run an LLM on phone
      Stable diffusion on device
      Mobile vision app
    Tech Stack
      C++
      Android
      iOS
      ARM
      CUDA


Things people build with this

USE CASE 1

Run a Qwen or LLaMA LLM locally on an Android or iOS device with MNN-LLM.

USE CASE 2

Generate images with a Stable Diffusion model entirely on a phone using MNN-Diffusion.

USE CASE 3

Embed a small classifier or detector in a mobile app with the 800KB Android core library.

USE CASE 4

Quantize a model to FP16 or INT8 to shrink it by 50-70% before shipping to devices.

Tech stack

C++ · Android · iOS · ARM · CUDA

Getting it running

Difficulty · hard · Time to first run · 1 day+

Requires native build toolchains for Android NDK or iOS Xcode, model conversion, and tuning for the target device.

Apache 2.0 licensed: use freely in personal and commercial projects, including modified versions, as long as you keep the license notice.

In plain English

MNN is a deep learning framework built by Alibaba whose job is to run machine learning models on individual devices, such as phones, PCs, and small embedded computers, rather than on a server in the cloud. The technical term for this is on-device inference. Once a model has been trained somewhere else, MNN takes that trained model and executes it efficiently on whatever hardware the user has in their hand or on their desk. The README says MNN also supports training, but inference is the headline use case.

The README backs up its claims with deployment numbers from inside Alibaba. MNN is integrated into more than 30 Alibaba apps, including Taobao, Tmall, Youku, DingTalk, and Xianyu, across over 70 distinct usage scenarios such as live broadcasts, short-video capture, search and recommendation, visual product search, interactive marketing, and risk control. MNN is also the basic compute module of a system Alibaba calls Walle, described in a paper at the OSDI 2022 systems conference as the first large-scale production system for device-cloud collaborative machine learning. The README provides the BibTeX entry for citing that paper.

Two sub-projects sit on top of MNN. MNN-LLM is a runtime for large language models that aims to run LLMs locally on phones, PCs, and Internet-of-Things devices. It claims support for several open-weight model families including Qianwen, Baichuan, Zhipu, and LLaMA. MNN-Diffusion is the same idea for Stable Diffusion image-generation models.

The repository also ships several reference applications: an Android chat app that bundles text, image, audio, and image-generation models, an iOS multimodal chat app, a 3D-avatar app called TaoAvatar that talks back using on-device speech recognition, language modelling, and text-to-speech, and a cartoon-style photo editor named Sana.

The key features section emphasises small size and broad hardware support. On iOS the static library, with all options enabled for armv7 and arm64, comes to about 12 megabytes, and on Android the core shared library is around 800 kilobytes. A reduced build option called MNN_BUILD_MINI shrinks the package by about a further 25 percent, at the cost of requiring fixed model input sizes. MNN also supports half-precision (FP16) and 8-bit integer (INT8) quantization, which the README says can cut model size by 50 to 70 percent.

The news section at the top tracks recent additions: support for the Qwen 3.5 series, the Qwen3-VL vision-language series, DeepSeek R1 1.5B, and Qwen 2.5 Omni 3B and 7B. The full README is longer than what was shown.
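The size figures above are simple to sanity-check with arithmetic. The sketch below is not part of MNN: the function names are hypothetical, and it assumes every weight starts as 32-bit float and is quantized uniformly, with no allowance for graph metadata, scales, or tensors left at higher precision. Under those assumptions FP16 saves 50 percent and INT8 saves 75 percent of the weight bytes, so the 75 percent figure is best read as an upper bound. Real model files carry data this sketch ignores, which would fit with the README quoting a range that tops out at 70 percent.

```cpp
#include <cstdint>

// Hypothetical helpers for back-of-envelope size estimates; none of these
// names come from the MNN API.

// Bytes needed to store `param_count` weights at `bits_per_weight` bits each.
// Assumption: weights only, no metadata, scales, or zero points.
constexpr std::uint64_t weight_bytes(std::uint64_t param_count,
                                     unsigned bits_per_weight) {
    return (param_count * bits_per_weight + 7) / 8;  // round up to whole bytes
}

// Percentage of weight storage saved when going from `from_bits` to `to_bits`.
constexpr double percent_saved(unsigned from_bits, unsigned to_bits) {
    return 100.0 * (1.0 - static_cast<double>(to_bits) /
                              static_cast<double>(from_bits));
}

// Size in kilobytes after shrinking `base_kb` by `percent` percent,
// e.g. an ~800 KB library reduced by ~25% with a mini build.
constexpr double shrink_kb(double base_kb, double percent) {
    return base_kb * (1.0 - percent / 100.0);
}
```

For example, `weight_bytes(1000000, 32)` gives 4,000,000 bytes for a one-million-parameter FP32 model, `weight_bytes(1000000, 8)` gives 1,000,000 bytes for the same weights in INT8, and `shrink_kb(800.0, 25.0)` puts the roughly 800 KB Android core library at about 600 KB with MNN_BUILD_MINI.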

Copy-paste prompts

Prompt 1

Show me how to convert a PyTorch model to MNN format and run it on an Android phone in C++.

Prompt 2

Build the MNN-LLM Android demo app and load a Qwen 2.5 7B quantized model on a Snapdragon device.

Prompt 3

Run Stable Diffusion image generation on iOS using MNN-Diffusion with FP16 weights.

Prompt 4

Quantize my ONNX model to INT8 with MNN's tools and compare the on-device latency vs FP32.

Prompt 5

Use the MNN_BUILD_MINI option to shrink the binary for a fixed-input-size image classifier on an embedded board.


Verify against the repo before relying on details.