explaingit

eriklindernoren/ml-from-scratch

31,419
Python
Audience · researcher
Complexity · 2/5
Setup · easy

TLDR

A Python project that rebuilds popular machine learning algorithms step by step using only NumPy, so you can see exactly how each algorithm works under the hood instead of treating a library like scikit-learn or PyTorch as a black box.

Mindmap

mindmap
  root((ml-from-scratch))
    Goal
      Education not production
      Math made visible
    Algorithms covered
      Supervised learning
      Unsupervised learning
      Reinforcement learning
      Deep learning layers
    Tech stack
      Python
      NumPy only
      Matplotlib visuals
    Use cases
      Study ML internals
      Interview prep
      Run visual examples
    How to use
      Clone and run locally
      Read alongside theory

Things people build with this

USE CASE 1

Study how neural networks, decision trees, and support vector machines work by reading and running clean Python implementations

USE CASE 2

Prepare for technical machine learning interviews by implementing classic algorithms from scratch with no shortcuts

USE CASE 3

Visualize how a GAN learns to generate handwritten digits or how a regression model fits data through included runnable example scripts

USE CASE 4

Understand deep learning building blocks like convolutional layers, batch normalization, and attention mechanisms without any black-box library hiding the math
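As a taste of what "no black-box library hiding the math" can look like, here is a minimal sketch of a batch normalization forward pass in plain NumPy. It is not taken from the repository: the function name `batch_norm_forward` and the parameters `gamma`, `beta`, and `eps` are illustrative assumptions, and the sketch covers only the training-mode forward computation (no running statistics, no backward pass).

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm for a 2-D input of shape (batch, features).

    Hypothetical helper for illustration; not the repository's API.
    """
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature (biased) variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize to ~zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale and shift

# Toy input: 64 samples, 4 features, deliberately off-center and spread out.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```

With `gamma` set to ones and `beta` to zeros, each column of `out` should have a mean close to 0 and a standard deviation close to 1, which is an easy property to check by hand.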

Tech stack

Python · NumPy · scikit-learn · Matplotlib

Getting it running

Difficulty · easy
Time to first run · 5min

In plain English

ML From Scratch is a collection of Python implementations of machine learning algorithms written from first principles using only NumPy, the fundamental numerical computing library. Its goal is education: rather than providing optimized, production-ready code, it prioritizes showing exactly how each algorithm works step by step, making the underlying math and logic visible and approachable.

The project covers a broad range of machine learning techniques organized into four categories. Supervised learning includes algorithms like linear regression, decision trees, support vector machines, and neural networks. Unsupervised learning includes clustering methods like k-means and DBSCAN, dimensionality reduction methods like PCA, and generative models like variational autoencoders and generative adversarial networks. Reinforcement learning includes deep Q-networks. The deep learning section covers building neural network layers from scratch, including convolutional layers, recurrent layers, batch normalization, and attention mechanisms.

Each implementation is accompanied by runnable example scripts that produce visualizations, such as an animated GIF of a GAN learning to generate handwritten digits or a graph of a regression model fitting temperature data. This makes abstract concepts concrete by letting learners run and observe the algorithms directly.

You would use this repository when studying machine learning and wanting to understand what is actually happening inside a model, rather than just using a high-level library like scikit-learn or PyTorch as a black box. It is also useful for preparing for technical interviews where implementation knowledge matters.

The tech stack is Python with NumPy as the only significant dependency. Some examples also use scikit-learn for datasets and Matplotlib for plotting. The project is designed to be read and run locally rather than deployed.
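To make "from first principles using only NumPy" concrete, the sketch below fits a linear regression with batch gradient descent on the mean squared error. It illustrates the general style and is not the repository's code: the function name `fit_linear_regression`, the hyperparameters `lr` and `n_iter`, and the synthetic data are all assumptions chosen for the example.

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.1, n_iter=2000):
    """Fit y ~ X @ w + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; not the repository's API.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        error = X @ w + b - y                        # residuals, shape (n_samples,)
        grad_w = (2.0 / n_samples) * (X.T @ error)   # dMSE/dw
        grad_b = 2.0 * error.mean()                  # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data: true slope 3, true intercept 2, small Gaussian noise.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=200)

w, b = fit_linear_regression(X, y)
print(w, b)  # should land near the true slope and intercept
```

Every step of the update is visible here: the residuals, both gradients, and the parameter update. That visibility is the trade-off the project makes against the speed and robustness of a library solver.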

Copy-paste prompts

Prompt 1
Using ml-from-scratch, walk me through the neural network implementation step by step and explain what each layer's forward and backward pass is doing in plain terms
Prompt 2
Run the GAN example from ml-from-scratch on the MNIST dataset and explain what is happening at each training step that produces the animated digit generation
Prompt 3
Using ml-from-scratch's k-means implementation as a reference, write a new version that supports k-means++ initialization for better starting centroids
Prompt 4
Compare ml-from-scratch's PCA implementation with scikit-learn's PCA on the iris dataset and verify they produce the same transformed output
Prompt 5
Using ml-from-scratch's decision tree as a starting point, add feature importance scoring that ranks which input columns influence predictions the most
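
For a sense of what Prompt 3 is asking for, here is a minimal sketch of k-means++ seeding in NumPy. It is written independently of the repository: the function name `kmeans_pp_init` and its signature are assumptions, and wiring it into the project's own k-means class is left to the prompt. The sketch assumes the data contains at least `k` distinct points.

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """Pick k initial centroids from the rows of X using k-means++ seeding.

    Hypothetical helper for illustration; not the repository's API.
    Assumes X has at least k distinct rows.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_samples = X.shape[0]

    # First centroid: a data point chosen uniformly at random.
    centroids = [X[rng.integers(n_samples)]]
    # Squared distance from each point to its nearest chosen centroid.
    closest_sq = np.sum((X - centroids[0]) ** 2, axis=1)

    for _ in range(1, k):
        # Sample the next centroid with probability proportional to that
        # squared distance, so far-away points are favored.
        probs = closest_sq / closest_sq.sum()
        idx = rng.choice(n_samples, p=probs)
        centroids.append(X[idx])
        new_sq = np.sum((X - X[idx]) ** 2, axis=1)
        closest_sq = np.minimum(closest_sq, new_sq)

    return np.array(centroids)

# Toy data: three tight, well-separated 2-D blobs.
rng = np.random.default_rng(0)
blobs = np.vstack([
    rng.normal(loc=center, scale=0.1, size=(50, 2))
    for center in ([0, 0], [10, 10], [-10, 10])
])
centroids = kmeans_pp_init(blobs, k=3, rng=rng)
```

The returned array can replace uniformly random starting centroids. The remaining k-means loop (assign points, recompute means) stays unchanged.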

Verify against the repo before relying on details.