eriklindernoren/ml-from-scratch

★ 31,419 | Python | Audience · researcher | Complexity · 2/5 | Setup · easy

Mindmap

mindmap
  root((ml-from-scratch))
    Goal
      Education not production
      Math made visible
    Algorithms covered
      Supervised learning
      Unsupervised learning
      Reinforcement learning
      Deep learning layers
    Tech stack
      Python
      NumPy only
      Matplotlib visuals
    Use cases
      Study ML internals
      Interview prep
      Run visual examples
    How to use
      Clone and run locally
      Read alongside theory

Things people build with this

USE CASE 1

Study how neural networks, decision trees, and support vector machines work by reading and running clean Python implementations

USE CASE 2

Prepare for technical machine learning interviews by implementing classic algorithms from scratch with no shortcuts

USE CASE 3

Visualize how a GAN learns to generate handwritten digits or how a regression model fits data through included runnable example scripts

USE CASE 4

Understand deep learning building blocks like convolutional layers, batch normalization, and attention mechanisms without any black-box library hiding the math
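As an illustration of the kind of building block mentioned in use case 4, here is a minimal sketch of a batch normalization forward pass in NumPy. It is not taken from the repository: the function name `batch_norm_forward` and its parameters are hypothetical, and it covers only the training-mode computation (no running statistics, no backward pass).

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Hypothetical helper: normalize each feature over the batch, then scale and shift."""
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature (biased) variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # eps guards against division by zero
    return gamma * x_hat + beta              # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # batch of 64 samples, 4 features
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```

With `gamma` set to ones and `beta` to zeros, each output column should have a mean close to 0 and a standard deviation close to 1, which is an easy property to check by hand.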

Tech stack

Python · NumPy · scikit-learn · Matplotlib

Getting it running

Difficulty · easy | Time to first run · 5 min

In plain English

ML From Scratch is a collection of Python implementations of machine learning algorithms written from first principles, built primarily on NumPy, the fundamental numerical computing library. Its goal is education: rather than providing optimized, production-ready code, it shows step by step how each algorithm works, so the underlying math and logic stay visible and approachable.

The project covers a broad range of machine learning techniques organized into four categories. Supervised learning includes algorithms like linear regression, decision trees, support vector machines, and neural networks. Unsupervised learning includes clustering methods like k-means and DBSCAN, dimensionality reduction methods like PCA, and generative models like variational autoencoders and generative adversarial networks. Reinforcement learning includes deep Q-networks. The deep learning section covers building neural network layers from scratch, including convolutional layers, recurrent layers, batch normalization, and attention mechanisms.

Each implementation is accompanied by runnable example scripts that produce visualizations, such as an animated GIF of a GAN learning to generate handwritten digits or a graph of a regression model fitting temperature data. Running the scripts and watching the algorithms work makes abstract concepts concrete.

You would use this repository when studying machine learning and wanting to understand what actually happens inside a model, rather than treating a high-level library like scikit-learn or PyTorch as a black box. It is also useful for preparing for technical interviews where implementation knowledge matters.

The tech stack is Python with NumPy as the only significant dependency. Some examples also use scikit-learn for datasets and Matplotlib for plotting. The project is designed to be read and run locally rather than deployed.
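To give a feel for what "from first principles" means here, the following is a minimal sketch of linear regression fitted by batch gradient descent using only NumPy. It is an illustration written for this page, not the repository's code: the function name `fit_linear_regression`, the learning rate, the iteration count, and the synthetic data are all assumptions.

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.1, n_iters=2000):
    """Hypothetical helper: least-squares fit via batch gradient descent."""
    X_b = np.column_stack([np.ones(len(X)), X])  # prepend a bias column of ones
    w = np.zeros(X_b.shape[1])                   # weights: [intercept, slope, ...]
    for _ in range(n_iters):
        residual = X_b @ w - y                   # prediction error per sample
        grad = 2.0 / len(y) * X_b.T @ residual   # gradient of the mean squared error
        w -= lr * grad                           # step against the gradient
    return w

# Synthetic data (an assumption for the demo): y = 3x + 1 plus small noise
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 1.0 + rng.normal(scale=0.05, size=200)

w = fit_linear_regression(X, y)  # w[0] is the intercept, w[1] the slope
```

Every step (prediction, residual, gradient, update) is written out explicitly, which is the style of code the description above is pointing at. The recovered weights should land near the intercept of 1 and slope of 3 used to generate the data.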

Copy-paste prompts

Prompt 1

Using ml-from-scratch, walk me through the neural network implementation step by step and explain what each layer's forward and backward pass is doing in plain terms

Prompt 2

Run the GAN example from ml-from-scratch on the MNIST dataset and explain what is happening at each training step that produces the animated digit generation

Prompt 3

Using ml-from-scratch's k-means implementation as a reference, write a new version that supports k-means++ initialization for better starting centroids

Prompt 4

Compare ml-from-scratch's PCA implementation with scikit-learn's PCA on the iris dataset and verify they produce the same transformed output

Prompt 5

Using ml-from-scratch's decision tree as a starting point, add feature importance scoring that ranks which input columns influence predictions the most
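For orientation on Prompt 3, this is a rough sketch of what k-means++ initialization looks like in NumPy. It is independent of the repository's k-means class: the function name `kmeans_pp_init` and its signature are hypothetical, and it assumes the data contains at least two distinct points.

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """Hypothetical helper: pick k starting centroids with k-means++ seeding."""
    rng = np.random.default_rng() if rng is None else rng
    n_samples = X.shape[0]
    # First centroid: a data point chosen uniformly at random
    centroids = [X[rng.integers(n_samples)]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen centroid
        diffs = X[:, None, :] - np.asarray(centroids)[None, :, :]
        sq_dists = (diffs ** 2).sum(axis=2).min(axis=1)
        # Sample the next centroid with probability proportional to that distance
        probs = sq_dists / sq_dists.sum()
        centroids.append(X[rng.choice(n_samples, p=probs)])
    return np.asarray(centroids)

# Three tight, well-separated blobs (synthetic data, an assumption for the demo)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
X = np.vstack([c + rng.normal(scale=0.01, size=(50, 2)) for c in centers])

init = kmeans_pp_init(X, k=3, rng=rng)
```

Because far-away points are much more likely to be sampled, the starting centroids tend to spread across clusters instead of bunching up, which is the motivation for asking for this variant.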

Verify against the repo before relying on details.