explaingit

jvns/pandas-cookbook

7,069 · Jupyter Notebook · Audience: data · Complexity: 2/5 · Setup: easy

TLDR

Nine interactive Jupyter notebooks teaching pandas data analysis using real-world datasets, from reading a CSV file to grouping, cleaning messy data, and working with dates and SQL.

Mindmap

mindmap
  root((pandas-cookbook))
    Chapters covered
      CSV basics
      Grouping data
      Combining datasets
      Text and dates
      SQL loading
    Real datasets
      NYC 311 calls
      Montreal bike paths
      Weather data 2012
    Tech stack
      Python
      pandas
      Jupyter Notebook
    Running options
      Jupyter Lite browser
      Local install
      Docker


Things people build with this

USE CASE 1

Learn pandas from scratch using real messy datasets in interactive browser notebooks without installing anything

USE CASE 2

Filter, sort, group, and clean tabular data from CSV files using pandas
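As a rough illustration of filtering, sorting, and grouping, here is a minimal sketch. It uses a tiny made-up CSV held in memory, so the column names (`complaint`, `borough`, `hours_open`) and values are hypothetical stand-ins. They are not the real NYC 311 columns, and this is not the cookbook's own code.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file
raw = io.StringIO(
    "complaint,borough,hours_open\n"
    "Noise,BROOKLYN,2\n"
    "Noise,MANHATTAN,5\n"
    "Heating,BROOKLYN,30\n"
    "Noise,BROOKLYN,1\n"
    "Heating,QUEENS,12\n"
)
df = pd.read_csv(raw)

# Filter: a boolean mask keeps only the noise complaints
noise = df[df["complaint"] == "Noise"]

# Sort: longest-open requests first
by_duration = df.sort_values("hours_open", ascending=False)

# Group: count rows per (complaint, borough) pair
counts = df.groupby(["complaint", "borough"]).size()

# Reshape the counts into a complaint-by-borough table, with 0 for missing pairs
table = counts.unstack(fill_value=0)
print(table)
```

The `groupby(...).size()` plus `unstack` pattern is one common way to get "count per category per borough"; `pd.crosstab(df["complaint"], df["borough"])` would produce a similar table.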

USE CASE 3

Combine multiple datasets by merging on a shared column, then analyze the result
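A minimal sketch of merging on a shared column with pandas `merge`, assuming two small hypothetical tables that share a `customer_id` column. The cookbook's own chapter may combine its datasets differently. `indicator=True` is a real `merge` option that records which side each row came from, which makes "rows in only one table" easy to find.

```python
import pandas as pd

# Two small hypothetical tables sharing a "customer_id" column
orders = pd.DataFrame({"customer_id": [1, 2, 3], "total": [9.5, 20.0, 4.25]})
customers = pd.DataFrame({"customer_id": [2, 3, 4], "name": ["Ana", "Ben", "Cy"]})

# Inner merge: only IDs present in both tables
both = orders.merge(customers, on="customer_id", how="inner")

# Outer merge with indicator=True adds a "_merge" column:
# "left_only", "right_only", or "both"
everything = orders.merge(customers, on="customer_id", how="outer", indicator=True)

# Rows that appear in only one of the two tables
only_one = everything[everything["_merge"] != "both"]
print(only_one)
```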

USE CASE 4

Extract patterns from timestamped data and load tables from a SQL database into a pandas DataFrame
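To make the dates-and-SQL workflow concrete, here is a minimal sketch. It builds a throwaway in-memory SQLite database with a hypothetical `events` table, loads it with `pd.read_sql_query`, and extracts date parts through the `.dt` accessor. The table, column names, and reference date are all assumptions for illustration, not the cookbook's dataset.

```python
import sqlite3

import pandas as pd

# Throwaway in-memory database with a hypothetical "events" table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, happened_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2012-03-01 08:15:00"), (2, "2012-03-20 17:40:00"), (3, "2012-04-02 12:00:00")],
)

# Load the table into a DataFrame, parsing the text column as datetimes
df = pd.read_sql_query("SELECT * FROM events", conn, parse_dates=["happened_at"])
conn.close()

# Extract date parts with the .dt accessor
df["month"] = df["happened_at"].dt.month
df["day"] = df["happened_at"].dt.day
df["weekday"] = df["happened_at"].dt.day_name()

# Keep rows from the 30 days before a fixed reference time
# (swap in pd.Timestamp.now() to filter relative to today)
ref = pd.Timestamp("2012-04-02 23:59:59")
recent = df[df["happened_at"] >= ref - pd.Timedelta(days=30)]
print(recent)
```

A fixed reference time keeps the example reproducible. With real data you would normally compare against the current time instead.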

Tech stack

Python · pandas · Jupyter Notebook

Getting it running

Difficulty · easy · Time to first run · 5 min

It can run entirely in the browser via Jupyter Lite with no installation required, and all three datasets are included in the repo.

Free to share and adapt, but you must give credit and share any derivative work under the same Creative Commons terms.

In plain English

Pandas is a Python library for working with structured data like spreadsheets and CSV files. It is widely used in data analysis because it makes it fast to filter, sort, group, and combine large datasets. This cookbook is a collection of worked examples intended to help beginners get started with pandas using real datasets rather than toy examples.

The cookbook is organized as nine chapters, each in its own Jupyter Notebook file. Jupyter Notebooks are interactive documents where code and explanatory text are combined, so you can run each example step by step in your browser or on your own machine. The chapters start with the basics, like reading a CSV file and selecting rows or columns, and progress through more involved tasks: grouping data to find patterns, combining multiple datasets, extracting information from text, cleaning up messy data, working with dates and timestamps, and loading data from a SQL database.

All three real-world datasets used in the cookbook are included in the repository, so you can run every example immediately without hunting for data. The datasets are 311 service calls in New York City, bicycle path counts in Montreal, and hourly Montreal weather data for 2012.

You can try the cookbook in your browser via Jupyter Lite without installing anything. To run it locally, you clone the repository, install the dependencies with pip, and start Jupyter. A Docker option is also described for those who prefer containers.

The cookbook was written by Julia Evans, who notes in the README that the official pandas documentation is thorough but that many people find it hard to get started without concrete examples that show real-world messiness. The license is Creative Commons Attribution-ShareAlike 4.0. A Chinese translation of the repository exists separately.
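The first step the chapters build on, reading a CSV file and selecting rows or columns, looks roughly like this. This is a minimal sketch, not the cookbook's code: it reads a small made-up semicolon-separated CSV from memory, and the column names `path_a` and `path_b` and their counts are hypothetical stand-ins for a real file such as the bike path counts.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated sample standing in for a CSV file on disk
raw = io.StringIO(
    "Date;path_a;path_b\n"
    "2012-01-01;35;38\n"
    "2012-01-02;83;68\n"
    "2012-01-03;135;104\n"
)

# sep sets the delimiter; parse_dates + index_col make the dates the row index
df = pd.read_csv(raw, sep=";", parse_dates=["Date"], index_col="Date")

path_a = df["path_a"]                 # one column, returned as a Series
first_two = df.iloc[:2]               # first two rows, selected by position
both_paths = df[["path_a", "path_b"]]  # several columns, returned as a DataFrame
print(first_two)
```

With a real file you would pass its path to `pd.read_csv` instead of the in-memory `StringIO` object.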

Copy-paste prompts

Prompt 1
Using pandas, help me group the NYC 311 service call dataset from the pandas cookbook by complaint type and count occurrences per borough.
Prompt 2
I am following the pandas cookbook chapter on combining datasets. Help me merge two CSV files that share a customer ID column and find rows that appear in only one of them.
Prompt 3
Show me how to extract the month and day from a messy date string column in a pandas DataFrame, the way the Montreal weather data example does it.
Prompt 4
Using the pandas cookbook approach, clean a CSV file with missing values, inconsistent date formats, and duplicate rows. Here is a sample: [paste data here].
Prompt 5
Help me follow the pandas cookbook SQL chapter to load a table from a SQLite database into a DataFrame and filter rows from the last 30 days.
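For the kind of cleanup Prompt 4 describes (missing values, inconsistent date formats, duplicate rows), a minimal sketch might look like the following. The sample rows are made up, and the two date formats (`%Y-%m-%d` and `%d/%m/%Y`) are assumptions about what the messy data contains. Parsing each format separately with `errors="coerce"` is one technique among several, not necessarily what the cookbook does.

```python
import io

import pandas as pd

# Hypothetical messy sample: a duplicated row, a missing value, two date formats
raw = io.StringIO(
    "id,date,temp_c\n"
    "1,2012-01-03,-5.0\n"
    "2,04/01/2012,\n"
    "2,04/01/2012,\n"
    "3,2012-01-05,-2.5\n"
)
df = pd.read_csv(raw)

# 1. Drop exact duplicate rows (NaNs in the same position count as equal)
df = df.drop_duplicates()

# 2. Parse dates: try ISO first, then day/month/year; unparseable values become NaT
iso = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")
dmy = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")
df["date"] = iso.fillna(dmy)

# 3. Fill the missing temperature by linear interpolation, in date order
df = df.sort_values("date")
df["temp_c"] = df["temp_c"].interpolate()
print(df)
```

Whether to interpolate, fill with a constant, or drop rows with missing values depends on the data. Interpolation is just a reasonable default for a time series like hourly weather.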


Verify against the repo before relying on details.