fivethirtyeight/data

Analysis updated 2026-06-24

★ 17,359 | Jupyter Notebook | Audience · data | Complexity · 1/5 | License | Setup · easy

Mindmap

mindmap
  root((data))
    Inputs
      Raw CSVs
      Notebooks
      Index file
    Outputs
      Datasets per story
      Reproducible analyses
    Use Cases
      Reproduce a FiveThirtyEight study
      Teach data journalism
      Practice data wrangling
    Tech Stack
      Jupyter Notebook
      Python
      R
    License
      CC BY 4.0 data
      MIT code

What do people build with it?

USE CASE 1

Reproduce a FiveThirtyEight politics or sports analysis from raw data

USE CASE 2

Use a CSV as a teaching dataset for an intro stats or pandas class

USE CASE 3

Bootstrap a side project with vetted real-world journalism data

What is it built with?

Jupyter, Python, R, CSV

How does it compare?

	fivethirtyeight/data	stefan-jansen/machine-learning-for-trading	ufund-me/qbot
Stars	17,359	17,322	17,322
Language	Jupyter Notebook	Jupyter Notebook	Jupyter Notebook
Setup difficulty	easy	hard	hard
Complexity	1/5	4/5	4/5
Audience	data	data	data

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy | Time to first run · 5 min

Clone the repository and open the CSVs in pandas or a spreadsheet. The data itself needs no installation or build step.
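A minimal sketch of that first run, assuming the repository is already cloned and the chosen file is UTF-8 encoded. It uses only Python's standard library, so nothing needs installing; with pandas, `pandas.read_csv` on the same path is the usual one-line equivalent. The path shown and the helper names `load_rows` and `preview` are hypothetical placeholders, not names taken from the repository.

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    # Assumption: the file is UTF-8; adjust `encoding` if a file differs.
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def preview(rows, limit=5):
    """Return the column names and the first `limit` rows for a quick look."""
    columns = list(rows[0].keys()) if rows else []
    return columns, rows[:limit]


if __name__ == "__main__":
    # Hypothetical path: substitute a real folder and file from your clone.
    rows = load_rows("data/some-story/some-file.csv")
    columns, head = preview(rows)
    print(columns)
    for row in head:
        print(row)
```

Every value comes back as a string, so numeric columns need converting before any arithmetic.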

Data is free to use and share with attribution under CC BY 4.0, and the code is free to use under MIT.

In plain English

This repository is the public archive of data and code that powered the articles and charts published by FiveThirtyEight, a data journalism outlet. Each folder in the repository corresponds to a story or analysis, containing the raw data files and any code used to process or visualize them. The datasets cover topics FiveThirtyEight wrote about, including sports, politics, economics, and culture. An index file lists all available datasets alongside links to the accompanying articles.

The data is released under the Creative Commons Attribution 4.0 license, meaning anyone can freely use and share it with attribution. The accompanying code is under the MIT License. Sports predictions and forecasts in the repository are no longer being updated as of June 2023. The rest of the data archive remains available as a historical record.

You would use this if you are a student, journalist, or data analyst who wants to explore real-world datasets from published journalism, reproduce a FiveThirtyEight analysis, or use their data as a starting point for your own work.
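Because each folder corresponds to one story, a local clone can be surveyed by grouping CSV files under their top-level folder. The sketch below assumes only that layout; the function name `csvs_by_story` and the clone path are hypothetical, and the index file's own format is not assumed here.

```python
from collections import defaultdict
from pathlib import Path


def csvs_by_story(repo_root):
    """Map each top-level folder name to the sorted CSV paths inside it."""
    root = Path(repo_root)
    grouped = defaultdict(list)
    for csv_path in root.rglob("*.csv"):
        relative = csv_path.relative_to(root)
        if len(relative.parts) < 2:
            continue  # skip CSVs sitting directly in the repo root
        if relative.parts[0].startswith("."):
            continue  # skip hidden folders such as .git
        grouped[relative.parts[0]].append(relative.as_posix())
    return {story: sorted(paths) for story, paths in sorted(grouped.items())}


if __name__ == "__main__":
    # Hypothetical location of the clone; change it to match your machine.
    for story, paths in csvs_by_story("data").items():
        print(f"{story}: {len(paths)} CSV file(s)")
```

Paths are returned relative to the clone root, so they can be passed straight to a CSV reader after joining with that root.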

Copy-paste prompts

Prompt 1

Pick a recent dataset from fivethirtyeight/data and walk me through loading it in pandas with a chart

Prompt 2

Help me reproduce the FiveThirtyEight NBA Elo ratings using the CSVs in this repo

Prompt 3

Find me 3 small datasets in fivethirtyeight/data good for a beginner pandas tutorial

Prompt 4

Compare the polling data folder in fivethirtyeight/data to what a 2024 election model would need

Frequently asked questions

What is fivethirtyeight/data?

A public archive of the datasets and code behind FiveThirtyEight articles, covering politics, sports, economics, and culture. Each folder maps to one published story.

What language is fivethirtyeight/data written in?

Mainly Jupyter Notebook. The stack also includes Python and R.

What license does fivethirtyeight/data use?

Data is free to use and share with attribution under CC BY 4.0, and the code is free to use under MIT.

How hard is fivethirtyeight/data to set up?

Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.

Who is fivethirtyeight/data for?

Mainly data practitioners: students, journalists, and data analysts who want to explore real-world datasets from published journalism.

Verify against the repo before relying on details.