explaingit

probml/pyprobml

7,069 | Jupyter Notebook | Audience · researcher | Complexity · 3/5 | Setup · easy

TLDR

Python notebooks that reproduce every figure and code example from Kevin Murphy's two probabilistic machine learning textbooks, organized by chapter and runnable in Google Colab.

Mindmap

mindmap
  root((pyprobml))
    Books covered
      Introduction book
      Advanced Topics book
    Topics
      Bayesian inference
      Uncertainty quantification
      Deep learning methods
    Tech stack
      Python
      JAX
      NumPy
      scikit-learn
    Running options
      Google Colab
      Local install

Things people build with this

USE CASE 1

Follow along with Kevin Murphy's probabilistic machine learning textbooks using runnable code examples in Colab

USE CASE 2

Reproduce academic figures and experiments from probabilistic ML research with working Python notebooks

USE CASE 3

Learn Bayesian inference and uncertainty quantification with concrete JAX or scikit-learn implementations

USE CASE 4

Run probabilistic ML experiments on free Google Colab GPUs without setting up a local environment

Tech stack

Python, Jupyter Notebook, JAX, NumPy, scikit-learn, TensorFlow, PyTorch

Getting it running

Difficulty · easy | Time to first run · 5 min

Best run in Google Colab, where most libraries are pre-installed. Some advanced notebooks require JAX with GPU access for reasonable speed.
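A notebook hosted on GitHub can usually be opened in Colab by rewriting its URL. The sketch below assumes the common Colab convention of prefixing the GitHub path with `https://colab.research.google.com/github`. The helper name `github_to_colab`, the branch name, and the notebook path in the example are hypothetical and only illustrate the pattern; check the repository README for the exact form it recommends.

```python
from urllib.parse import urlparse

# Assumed Colab convention: this prefix followed by /<org>/<repo>/blob/<branch>/<path>.
COLAB_PREFIX = "https://colab.research.google.com/github"


def github_to_colab(url: str) -> str:
    """Rewrite a github.com notebook URL into the matching Colab URL."""
    parsed = urlparse(url)
    if parsed.netloc not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {url}")
    if not parsed.path.endswith(".ipynb"):
        raise ValueError(f"not a notebook path: {url}")
    # parsed.path already starts with "/", e.g. "/org/repo/blob/branch/file.ipynb".
    return COLAB_PREFIX + parsed.path


# Hypothetical branch and path, for illustration only; substitute a real notebook.
example = "https://github.com/probml/pyprobml/blob/master/notebooks/example.ipynb"
print(github_to_colab(example))
```

The function only rewrites the address; it does not check that the notebook exists or that its dependencies are installed in the Colab runtime.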

In plain English

pyprobml is a collection of Python notebooks that reproduce the figures and code examples from two textbooks by Kevin Murphy: "Probabilistic Machine Learning: An Introduction" and "Probabilistic Machine Learning: Advanced Topics." The notebooks are organized by book and chapter so readers can follow along with the textbook and run the examples themselves.

Probabilistic machine learning is a branch of the field that treats predictions as probability distributions rather than single fixed answers. This lets models express uncertainty, which is useful in areas like medical diagnosis or scientific research, where knowing how confident a prediction is matters as much as the prediction itself. The books cover a wide range of topics in this area, and the code in this repository shows working implementations of the methods they describe.

Most of the code uses standard Python scientific computing libraries such as NumPy, SciPy, Matplotlib, and scikit-learn. Some notebooks, especially those from the advanced topics book, also use JAX, a Google library for numerical computing on GPUs and TPUs. A few notebooks from the introduction book use TensorFlow and PyTorch.

The simplest way to run the notebooks is through Google Colab, a free browser-based environment that has most of the required libraries already installed and provides access to GPUs. The README explains how to open any notebook in Colab by modifying its GitHub URL. For those who want to run the code locally, a requirements file is provided for installing the necessary packages.

As of September 2022, the repository is in maintenance mode, meaning active development has stopped. The notebooks are still available and runnable, and contributions are still accepted via the contribution guide.
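To make the idea of a prediction as a distribution concrete, here is a minimal sketch that is not taken from the repository. It assumes a Beta-Bernoulli model, chosen only because its posterior has a closed form, and the helper name `beta_posterior` and the uniform Beta(1, 1) prior are illustrative choices. It contrasts a single point estimate with a posterior mean and standard deviation.

```python
def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Posterior over a success probability under a Beta(prior_a, prior_b) prior.

    Returns the posterior mean and standard deviation of Beta(a, b),
    where a = prior_a + successes and b = prior_b + failures.
    """
    a = prior_a + successes
    b = prior_b + failures
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance ** 0.5


# Two datasets with the same point estimate (all trials succeeded).
point_small = 3 / 3
point_large = 30 / 30

mean_small, std_small = beta_posterior(successes=3, failures=0)
mean_large, std_large = beta_posterior(successes=30, failures=0)

print(f"3 trials:  point={point_small:.2f}  posterior mean={mean_small:.3f}  std={std_small:.3f}")
print(f"30 trials: point={point_large:.2f}  posterior mean={mean_large:.3f}  std={std_large:.3f}")
```

Both datasets give the same point estimate of 1.0, but the posterior standard deviation is larger for the 3-trial case, which is the kind of uncertainty a single fixed answer hides.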

Copy-paste prompts

Prompt 1
I am reading Kevin Murphy's Probabilistic Machine Learning Introduction book. Help me open and run the Chapter 4 notebook from pyprobml in Google Colab.
Prompt 2
Using the pyprobml code, help me implement a Gaussian mixture model from the Introduction textbook and visualize the cluster assignments.
Prompt 3
Show me how to run the JAX-based advanced topics notebooks from pyprobml on Google Colab, including how to enable GPU acceleration.
Prompt 4
I want to understand Bayesian linear regression. Which pyprobml notebooks cover it and how do I adapt the code for my own dataset?
Prompt 5
Help me adapt a pyprobml example notebook to use my own CSV data instead of the synthetic data the textbook example generates.

Verify against the repo before relying on details.