explaingit

facebookresearch/mae

8,315 · Python
TLDR

This repository contains a PyTorch implementation of Masked Autoencoders (MAE), a self-supervised pre-training technique for image recognition models, developed by researchers at Facebook AI Research (FAIR).

In plain English

The core idea behind Masked Autoencoders (MAE) is to teach a model to understand images by hiding a large, random subset of an image's patches and asking the model to reconstruct the missing pixels. Because the reconstruction target comes from the image itself, this self-supervised process lets the model learn rich visual features without labeled data.

Training happens in two phases. First, the model is pre-trained on a large collection of unlabeled images using the masking objective. Then the pre-trained model is fine-tuned on a labeled dataset for a specific task, such as classifying the object in a photo. The researchers found that this approach produces models that generalize well: models fine-tuned from the same pre-trained weights perform strongly across a variety of image recognition benchmarks, including tests built on sketches, corrupted images, and naturally occurring images that commonly fool classifiers.

Pre-trained weights are available for three model sizes: ViT-Base, ViT-Large, and ViT-Huge. These names refer to the Vision Transformer (ViT) architecture, a type of neural network that splits an image into fixed-size patches and treats each patch much as a language model treats a word. The largest model (ViT-Huge with 448-pixel input) reached 87.8% top-1 accuracy on ImageNet-1K, the best result reported at the time of publication among methods trained only on ImageNet-1K data.

The repository includes code for the visualization demo, for fine-tuning the pre-trained checkpoints, and for running pre-training from scratch. A Colab notebook lets anyone try the visualization without a local GPU. The project is released under the CC-BY-NC 4.0 license, which permits non-commercial use with attribution.
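
The masking objective described above can be sketched in plain Python. This is a simplified, hypothetical illustration rather than the repository's code (the repo implements MAE in PyTorch), and the function names `patchify`, `random_masking`, and `masked_mse` are chosen here for illustration only. It assumes a 224x224 RGB image, 16x16 patches as used by ViT-Base and ViT-Large, and the 75% masking ratio reported in the MAE paper. The encoder and decoder networks are omitted entirely; `masked_mse` stands in for the loss that would be applied to the decoder's per-patch predictions.

```python
import random

PATCH = 16          # assumed patch side length (ViT-Base/16, ViT-Large/16)
MASK_RATIO = 0.75   # assumed fraction of patches hidden, as in the MAE paper


def patchify(img, p=PATCH):
    """Split an H x W x C image (nested lists) into flattened p x p patches, row-major."""
    h, w = len(img), len(img[0])
    assert h % p == 0 and w % p == 0, "image size must be divisible by the patch size"
    patches = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            flat = []
            for y in range(top, top + p):
                for x in range(left, left + p):
                    flat.extend(img[y][x])  # append this pixel's C channel values
            patches.append(flat)
    return patches


def random_masking(num_patches, mask_ratio=MASK_RATIO, rng=random):
    """Pick visible patches by sorting random noise; return kept indices and a 0/1 mask."""
    len_keep = int(num_patches * (1 - mask_ratio))
    noise = [rng.random() for _ in range(num_patches)]
    ids_shuffle = sorted(range(num_patches), key=lambda i: noise[i])
    ids_keep = sorted(ids_shuffle[:len_keep])  # sorted only for readability
    mask = [1] * num_patches                   # 1 = masked (hidden from the encoder)
    for i in ids_keep:
        mask[i] = 0                            # 0 = visible (fed to the encoder)
    return ids_keep, mask


def masked_mse(pred, target, mask):
    """Per-patch mean squared error, averaged over masked patches only."""
    per_patch = [
        sum((a - b) ** 2 for a, b in zip(pp, tp)) / len(tp)
        for pp, tp in zip(pred, target)
    ]
    return sum(loss * m for loss, m in zip(per_patch, mask)) / sum(mask)


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[[rng.random() for _ in range(3)] for _ in range(224)] for _ in range(224)]
    patches = patchify(img)
    print(len(patches), len(patches[0]))  # 196 768 (14 x 14 patches of 16*16*3 values)
    ids_keep, mask = random_masking(len(patches), rng=rng)
    print(len(ids_keep), sum(mask))       # 49 147 (25% visible, 75% masked)
    visible = [patches[i] for i in ids_keep]  # only these would reach the encoder
```

Two design points from MAE show up here: the encoder would see only the 49 visible patches out of 196, which makes pre-training cheaper, and the loss ignores the visible patches, so the model is scored only on what it had to infer.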

Verify against the repo before relying on details.