explaingit

google/magika

16,993 stars · Python · Audience: developer · Complexity: 2/5 · Setup: easy

TLDR

Magika is a Google tool that identifies a file's true type by inspecting its contents with a small AI model. It detects over 200 file types in about 5 milliseconds per file on a standard CPU.

Mindmap

mindmap
  root((Magika))
    What it does
      File type detection
      Content-based scanning
      AI model inference
    How it works
      Inspects file slice
      Deep learning model
      Confidence modes
    Outputs
      MIME type
      Plain label
      JSON or JSONL
    Tech
      Python
      Rust CLI
      JavaScript package


Things people build with this

USE CASE 1

Identify the true type of user-uploaded files in a security pipeline to route them to the correct scanner.

USE CASE 2

Detect disguised or misnamed malicious files during malware analysis workflows.

USE CASE 3

Build a file processing app that behaves differently depending on whether an upload is a PDF, image, script, or archive.

USE CASE 4

Replace extension-based file type guessing in a batch processing pipeline with accurate content-based detection.
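Use cases 1, 3 and 4 share the same core step: dispatch on the type detected from a file's contents, not on its extension. The minimal sketch below assumes a detector such as Magika has already produced a content-type label for each file. The label strings, the handler functions and the `ROUTES` table are hypothetical placeholders, not Magika's API. Anything the table does not recognize goes to a safe default.

```python
from typing import Callable, Dict

# Hypothetical handlers: each stands in for a real scanner or processing step.
def scan_pdf(path: str) -> str:
    return f"pdf-scanner:{path}"

def scan_archive(path: str) -> str:
    return f"archive-scanner:{path}"

def scan_script(path: str) -> str:
    return f"script-scanner:{path}"

def quarantine(path: str) -> str:
    return f"quarantine:{path}"

# Hypothetical routing table keyed by detected content-type label.
ROUTES: Dict[str, Callable[[str], str]] = {
    "pdf": scan_pdf,
    "zip": scan_archive,
    "python": scan_script,
    "shell": scan_script,
}

def route(path: str, detected_label: str) -> str:
    """Dispatch on the content-detected label; the filename is never consulted."""
    handler = ROUTES.get(detected_label, quarantine)
    return handler(path)

# "invoice.pdf" whose contents were detected as a shell script goes to the
# script scanner, not the PDF scanner.
print(route("invoice.pdf", "shell"))  # → script-scanner:invoice.pdf
print(route("data.bin", "unknown"))   # → quarantine:data.bin
```

Using a dictionary with a default handler means that adding a new type is a one-line change, and unrecognized content always fails closed.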

Tech stack

Python · Rust · JavaScript · TypeScript · Go · deep learning

Getting it running

Difficulty: easy · Time to first run: 5 min
License information is not mentioned in the repository description.

In plain English

Magika is a tool from Google that figures out what kind of file something actually is (Python source, a Word document, a PNG image, a Dockerfile, and so on) by looking at its contents rather than trusting its extension. Knowing the real type matters for security, because a wrong assumption about a file's type is how malicious files sneak past scanners.

What makes Magika unusual is that it does the job with a small deep learning model instead of hand-written rules. The model is only a few megabytes, runs on a single CPU, and identifies a file in about five milliseconds. It was trained on roughly 100 million samples spanning more than 200 binary and textual content types. On its test set it reaches around 99 percent average precision and recall. Inference time stays nearly constant regardless of file size because Magika inspects only a limited slice of each file.

Magika offers three prediction modes: high-confidence, medium-confidence, and best-guess. When it is unsure, it falls back to a generic label such as "Generic text document" or "Unknown binary data".

You can run Magika against a single file, many files, or a whole directory recursively. It prints the detected type for each file, and can optionally output MIME types, plain labels, JSON, or JSONL. The command-line tool is written in Rust. There is also a Python package, a JavaScript/TypeScript package that powers an in-browser demo, and Go bindings that are still in progress.

Google itself uses Magika at scale to route files in Gmail, Drive, and Safe Browsing to the right scanners. It is also integrated with VirusTotal and abuse.ch. Reach for it when you need fast, accurate file-type identification in security pipelines, malware analysis, or any code that has to behave differently per file type.
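Why does inspecting "a limited slice" keep inference time nearly constant? The sketch below shows the idea. It assumes the slice is taken from the start and end of the file, and it uses a `SLICE` size of 1024 bytes chosen purely for illustration. Neither detail is taken from Magika's implementation, and `read_bounded_slice` is a hypothetical helper. The point is that the bytes handed to a model are capped, so a 5 MB file costs the model no more than a 10 KB one.

```python
import os
import tempfile

# Illustrative slice size only; Magika's real slice sizes, and exactly which
# parts of a file it reads, are assumptions here.
SLICE = 1024


def read_bounded_slice(path: str, n: int = SLICE) -> bytes:
    """Return at most 2 * n bytes: the first n and the last n bytes of a file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size <= 2 * n:
            return f.read()  # small file: it fits entirely within the budget
        head = f.read(n)
        f.seek(size - n)  # jump straight to the tail; the middle is never read
        tail = f.read(n)
    return head + tail


# Whether the file is 10 KB or 5 MB, the bytes handed to a model stay bounded.
for size in (10_000, 5_000_000):
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * size)
    print(size, len(read_bounded_slice(tmp.name)))  # length is 2048 both times
    os.remove(tmp.name)
```

Reading uses `seek` rather than scanning the whole file. As a result, even the I/O cost stays bounded, not just the model's input size.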

Copy-paste prompts

Prompt 1
I have a directory of user-uploaded files with potentially wrong extensions. Write a Python script using Magika to scan them all and print the detected type and MIME type for each.
Prompt 2
Integrate Magika into a Flask file upload handler to type-check every uploaded file and reject ones that do not match the expected type.
Prompt 3
How do I use the Magika Rust CLI to recursively scan a directory and output results as JSONL for downstream processing?
Prompt 4
I am building a malware analysis tool. Show me how to use Magika's Python API in high-confidence mode and fall back to best-guess mode when confidence is low.
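Prompt 4 asks for a specific pattern: try high-confidence mode, and fall back to best-guess when confidence is low. The logic behind that pattern can be sketched without relying on Magika's actual Python API. In this sketch, the numeric thresholds in `Mode`, the `is_text` flag, and the functions `final_label` and `high_then_best_guess` are all hypothetical. Only the generic fallback labels "Generic text document" and "Unknown binary data" come from the description above. Check Magika's own documentation for how it really exposes prediction modes and scores.

```python
from enum import Enum
from typing import Tuple


class Mode(Enum):
    # Hypothetical score thresholds per prediction mode; Magika's real
    # thresholds (and whether they vary per content type) are not shown here.
    HIGH_CONFIDENCE = 0.9
    MEDIUM_CONFIDENCE = 0.5
    BEST_GUESS = 0.0


def final_label(label: str, score: float, is_text: bool, mode: Mode) -> str:
    """Keep the model's label if its score clears the mode's threshold,
    otherwise fall back to a generic text or binary label."""
    if score >= mode.value:
        return label
    return "Generic text document" if is_text else "Unknown binary data"


def high_then_best_guess(label: str, score: float, is_text: bool) -> Tuple[str, str]:
    """Prefer a high-confidence answer; if that falls back to a generic label,
    return the best guess instead, flagged as low confidence."""
    strict = final_label(label, score, is_text, Mode.HIGH_CONFIDENCE)
    if strict == label:
        return strict, "high"
    return final_label(label, score, is_text, Mode.BEST_GUESS), "low"


print(final_label("python", 0.97, True, Mode.HIGH_CONFIDENCE))    # → python
print(final_label("python", 0.62, True, Mode.HIGH_CONFIDENCE))    # → Generic text document
print(final_label("python", 0.62, True, Mode.MEDIUM_CONFIDENCE))  # → python
print(high_then_best_guess("pdf", 0.3, False))                    # → ('pdf', 'low')
```

Returning the confidence tag alongside the label lets a malware-analysis tool treat low-confidence verdicts differently, for example by sending them for manual review.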


Verify against the repo before relying on details.