jakevdp/pythondatasciencehandbook

Analysis updated 2026-06-20

★ 47,914 | Jupyter Notebook | Audience · data | Complexity · 1/5 | License | Setup · easy

Mindmap

mindmap
  root((repo))
    What it does
      Free data science textbook
      Runnable Jupyter chapters
      Google Colab compatible
    Libraries covered
      NumPy arrays
      Pandas DataFrames
      Matplotlib charts
      Scikit-Learn models
    Learning path
      Numerical computation
      Data manipulation
      Visualization
      Machine learning
    Audience
      Data science learners
      Python beginners
      Students and analysts


What do people build with it?

USE CASE 1

Learn to load, filter, and group tabular data using Pandas by following hands-on, runnable notebook examples.

USE CASE 2

Build your first machine learning classifier using Scikit-Learn by working through the supervised and unsupervised learning chapters.

USE CASE 3

Visualize datasets using Matplotlib by running the visualization chapter notebooks directly in Google Colab with no setup.

What is it built with?

Python, Jupyter Notebook, NumPy, Pandas, Matplotlib, Scikit-Learn

How does it compare?

	jakevdp/pythondatasciencehandbook	gokumohandas/made-with-ml	microsoft/ai-for-beginners
Stars	47,914	47,507	47,250
Language	Jupyter Notebook	Jupyter Notebook	Jupyter Notebook
Setup difficulty	easy	moderate	moderate
Complexity	1/5	4/5	3/5
Audience	data	data	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy | Time to first run · 5 min

Code samples are MIT-licensed and free to reuse; the written text is under a Creative Commons license that does not permit commercial use.

In plain English

The Python Data Science Handbook is a comprehensive, freely available textbook that teaches the essential tools for doing data science with Python. It covers the entire workflow, from loading and cleaning data to visualizing results and building machine learning models, all using the most widely adopted Python libraries in the field.

The book is organized around five core tools and libraries. IPython and Jupyter Notebooks provide an interactive environment for experimenting with code and presenting results alongside prose and charts. NumPy introduces efficient numerical computation, particularly operations on large arrays of numbers that run far faster than the same work on plain Python lists. Pandas adds a higher-level table structure called a DataFrame for loading, filtering, grouping, and transforming datasets. Matplotlib handles data visualization, from simple line charts to complex multi-panel figures. Scikit-Learn covers machine learning: building models that classify, cluster, predict, or reduce the dimensionality of data.

The repository contains the full text of the book as Jupyter Notebooks, interactive documents that mix runnable code, output, explanations, and charts. You can read it on a website, run it locally by cloning the repository, or open it in Google Colab or Binder without installing anything.

You would use this resource when learning data science with Python for the first time, refreshing your knowledge of a particular library, or following along with practical examples that run directly in your browser. It assumes you already know basic Python; if you don't, the same author provides a separate free tutorial as a companion resource.

The code samples are licensed under MIT (meaning you can freely reuse them), while the text is under Creative Commons with non-commercial restrictions. The repository's content is primarily Jupyter Notebooks, and the libraries covered are all Python-based.

Copy-paste prompts

Prompt 1

Using the Python Data Science Handbook as context, show me how to use Pandas to load a CSV, filter rows where a column exceeds a threshold, and group by another column to compute the mean.
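For orientation, an answer to this prompt might resemble the minimal sketch below. It is not taken from the handbook: the inline CSV, the column names (`city`, `sales`), and the helper `mean_sales_by_city` are all hypothetical and exist only to make the example self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV content, inlined so the sketch runs without a file on disk.
CSV_TEXT = """city,year,sales
Austin,2021,120
Austin,2022,180
Boston,2021,90
Boston,2022,210
Denver,2022,60
"""


def mean_sales_by_city(csv_source, threshold):
    """Load a CSV, keep rows whose sales exceed threshold, average per city."""
    df = pd.read_csv(csv_source)              # load
    filtered = df[df["sales"] > threshold]    # filter with a boolean mask
    return filtered.groupby("city")["sales"].mean()  # group and aggregate


result = mean_sales_by_city(io.StringIO(CSV_TEXT), threshold=100)
print(result)
```

With a real file, the `io.StringIO` wrapper would be replaced by a path passed to `pd.read_csv`.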

Prompt 2

Based on the Python Data Science Handbook Scikit-Learn chapter, walk me through training a random forest classifier and evaluating it with cross-validation on a tabular dataset.
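A response to this prompt could look roughly like the sketch below. The choice of the bundled iris dataset, 100 trees, and 5 folds are assumptions made here for illustration, not settings confirmed from the handbook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Assumption: the small iris dataset stands in for "a tabular dataset".
X, y = load_iris(return_X_y=True)

# Fixing random_state makes the forest reproducible between runs.
model = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validation: fit on four folds, score accuracy on the fifth,
# and repeat so that every fold is held out once.
scores = cross_val_score(model, X, y, cv=5)

print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```

Reporting the per-fold scores alongside the mean gives a sense of how much the estimate varies from split to split.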

Prompt 3

Using NumPy as taught in the Python Data Science Handbook, show me how to replace a slow Python loop that computes pairwise distances between rows in an array with a vectorized operation.
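One way such a vectorization might look is sketched below, using NumPy broadcasting to replace the double loop. The function names and the random test data are hypothetical; the handbook may present the technique differently.

```python
import numpy as np


def pairwise_distances_loop(X):
    """Reference version: Euclidean distance between every pair of rows."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = np.sqrt(np.sum((X[i] - X[j]) ** 2))
    return D


def pairwise_distances_vectorized(X):
    """Same result via broadcasting, with no explicit Python loop."""
    # (n, 1, d) - (1, n, d) broadcasts to (n, n, d): all row differences.
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
X = rng.random((5, 3))  # 5 points in 3 dimensions

D_loop = pairwise_distances_loop(X)
D_vec = pairwise_distances_vectorized(X)
print(np.allclose(D_loop, D_vec))
```

The broadcast version allocates an intermediate array of shape (n, n, d), so it trades memory for speed; for very large inputs that trade-off may need revisiting.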

Prompt 4

Show me how to create a multi-panel Matplotlib figure with subplots, a histogram, and a scatter plot, following the style used in the Python Data Science Handbook visualization chapter.
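As a rough illustration of what this prompt asks for, the sketch below builds a two-panel figure with a histogram and a scatter plot. The synthetic data, the titles, and the non-interactive `Agg` backend are assumptions chosen so the example runs anywhere; it does not claim to reproduce the handbook's styling.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: y depends linearly on x, plus noise.
rng = np.random.default_rng(42)
x = rng.normal(size=500)
y = 2 * x + rng.normal(size=500)

# One row, two columns of subplots sharing a single figure.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

ax_hist.hist(x, bins=30)
ax_hist.set_title("Distribution of x")
ax_hist.set_xlabel("x")
ax_hist.set_ylabel("count")

ax_scatter.scatter(x, y, s=10, alpha=0.5)
ax_scatter.set_title("y versus x")
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")

fig.tight_layout()

# Write the PNG to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In a notebook, the backend line and the buffer can be dropped, since the figure is displayed inline.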

Frequently asked questions

What is pythondatasciencehandbook?

A free, complete data science textbook for Python, covering NumPy, Pandas, Matplotlib, and Scikit-Learn, with every chapter as a runnable Jupyter Notebook you can open in Google Colab with no installation required.

What language is pythondatasciencehandbook written in?

Mainly Jupyter Notebook. The stack also includes Python, NumPy, Pandas, Matplotlib, and Scikit-Learn.

What license does pythondatasciencehandbook use?

Code samples are MIT-licensed and free to reuse; the written text is under a Creative Commons license that does not permit commercial use.

How hard is pythondatasciencehandbook to set up?

Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.

Who is pythondatasciencehandbook for?

Mainly data-focused users: data science learners, Python beginners, students, and analysts.


Verify against the repo before relying on details.